OWASP / CheatSheetSeries

The OWASP Cheat Sheet Series was created to provide a concise collection of high value information on specific application security topics.
https://cheatsheetseries.owasp.org
Creative Commons Attribution Share Alike 4.0 International
27.97k stars 3.91k forks source link

Update: XSS Cheatsheet Confuses reader due to interchangeable use of escape/encode #438

Closed cdwijayarathna closed 4 years ago

cdwijayarathna commented 4 years ago

We identified this issue in a usability evaluation we conducted for OWASP ESAPI [1]. In this study, programmers used the ESAPI to fix some XSS vulnerabilities of an application and reported usability issues that they encounter.

Out of 10 participants, 1 participant mentioned that he was confused by the interchangeable use of words escape and encode. We had a discussion with Kevin Wall from ESAPI team as well and it was suggested that some improvement is required to address this issue in the cheatsheet, so it will be less confusing for a user who is new to XSS.

Furthermore, it is important to notice that while some users are familiar with escape wording, some are available with encode wording. So, it is required to facilitate both these groups as well.

[1]. https://arxiv.org/abs/1810.01017

mackowski commented 4 years ago

@cdwijayarathna thank you for your contribution. Encoding and escaping are two similar but at the end different techniques. I like the definition from Proactive controls: https://owasp.org/www-project-proactive-controls/v3/en/c4-encode-escape-data. I read XSS Cheatsheet and I saw a that it is not always 100% strict with definitions. Do you want to spot the places in the cheatsheet where these terms are used incorrectly?

cdwijayarathna commented 4 years ago

While referring to the definition you shared made me feel that 'encoding' is the most suitable word to be used in most places, our results don't say they have been used 'incorrectly'. It is rather a usability related issue since the use of two words has been inconsistent. It confuses the reader, especially those who have little knowledge on XSS.

jmanico commented 4 years ago

Encoding and Escaping are essentially synonyms in the industry of security engineering.  For XSS defense, technically you need to use HTML Entity /encoding/ for the HTML locations (converting < to <) of and for JavaScript and CSS contexts the technique is actually hex /escaping/ with slashes (in JS you convert a ' to a \'). But the terms are often used interchangeably.

So for XSS defense, one needs to do a combination of escaping and encoding and the two terms essentially do the same thing, neutralize content so it will not execute.

So the general term is output encoding, even when you use JavaScript escaping.

So please be careful about fixing these terms. Technically we should be "escaping or encoding specific to each context" and the "Output encoding" technique is really "Output encoding OR escaping based on the content".

And to be pedantic, anyone saying that we only need output encoding for XSS defense is flat out wrong for several reasons:

  1. It's actually output encoding and escaping
  2. Even URL's put in HREF attributes that are encoded properly still execute JS with javascript: links
  3. Other techniques like HTML sanitization are needed for a complete XSS defense strategy

-- Jim Manico Cheatsheet Project Lead/Founder

On 7/3/20 2:47 AM, mackowski wrote:

@cdwijayarathna https://github.com/cdwijayarathna thank you for your contribution. Encoding and escaping are two similar but at the end different techniques. I like the definition from Proactive controls: https://owasp.org/www-project-proactive-controls/v3/en/c4-encode-escape-data. I read XSS Cheatsheet and I saw a that it is not always 100% strict with definitions. Do you want to spot the places in the cheatsheet where these terms are used incorrectly?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/OWASP/CheatSheetSeries/issues/438#issuecomment-653531977, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEBYCLTSEB6FAPVMU2MLE3RZXHPJANCNFSM4OPYRXPA.

jmanico commented 4 years ago

Careful here.

Technically you need a combo of escaping and encoding. Both do the same thing: neutralize content to stop injection in a web page.

Output encoding is not the proper blanket term even if it is used in popular security engineering discourse.

"Output encoding or escaping" is the more accurate term. "Output encoding" as a singular term to describe XSS defense is not complete or fully accurate.

mackowski commented 4 years ago

I fully agree with Jim.

cdwijayarathna commented 4 years ago

I think the use of "Output encoding" instead of "Output encoding or escaping" might be the reason for the confusion of the users that we observed in our experiment. I am trying to highlight the perspective of an amateur user who is new to terms such as XSS, encoding and escaping.

An example is, in most rules, it is advised to 'escape' untrusted data. However, in an example in Rule 3.1, it says "Do NOT do this without encoding the data ". These are the kind of confusion that we observed.

You may have already taken these into account when designing the this already. Since, I don't have any solution to the confusion we observed, it may introduce more issues than solving issues if any changes are done.

So now since I have informed you about our observations, I'll let you decide whether any improvements are required or not.

ThunderSon commented 4 years ago

I will update my comment later. I felt like I added to the confusion. Once I have a much precise answer that properly answers the above, I'll post it.

jmanico commented 4 years ago

I agree with Thunderson. You identified a real problem with the cheatsheet and our sloppy use of these terms.

I hope you decide to help us fix it somehow! The eyes of a new reader are especially valuable to those of us who have read it many times. So thank you for your contribution so far. :)

Jim Manico @Manicode

On Jul 4, 2020, at 12:33 AM, ThunderSon notifications@github.com wrote:

 You may have already taken these into account when designing the this already.

Refrain from such thoughts. Yes the team (and all previous contributors) take the proper time to investigate and write things out, it doesn't mean they're golden words. You are presenting a very important piece. Please keep pushing it to the end without thinking that whoever wrote this is all-knowing. Everyone makes mistakes :)

Escaping is a sub-usage of encoding. You are escaping content let's say in a SQL query. It's what the language is expecting as charset, which is back to encoding on the SQL level. That's why escaping differs between languages, in the end we're encoding to a certain Context. One could say "Let's encode this query so that it uses the allowed charset, where we can use \ to have a larger charset when needed", but that's just me 😄

@cdwijayarathna Would you be interested to look into this on the CS level and we'll help the contribution forward? You don't need to worry if it feels daunting. That's how contributions start :) No pressure if you don't feel like doing it. The team can pick it up later, but we love having new faces.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

ThunderSon commented 4 years ago

You may have already taken these into account when designing the this already.

Refrain from such thoughts. Yes the team (and all previous contributors) take the proper time to investigate and write things out, it doesn't mean they're golden words. You are presenting a very important piece. Please keep pushing it to the end without thinking that whoever wrote this is all-knowing. Everyone makes mistakes :)

Encoding is more to how you're going to represent the characters. In other words, what's the character set in play here? Escaping is more to creating an alternative interpretation to certain metacharacters in that character set. It relies on the context it's being used in. A context might look at escaping in a different way than another context. Character sets are character sets. Agreed upon.

Example: "Some philosopher said "Something fancy here", and I agree with it" HTML Encoding: "Some philosopher said &quot;Something fancy here&quot;, and I agree with it" -> & is used as an escape character, yet this is HTML encoding. All characters (<, >, ", ', etc.) will have to be encoded in the context. It's what HTML use. That's the format being used. String Escaping: "Some philosopher said \"Something fancy here\", and I agree with it" -> I want to escape from the HTML string context. Only " will be escaped because it's in the " " context.

As you see, they're heavily intertwined.

Does this make sense, or does this feel off? Please feel free to oppose what I am saying, or add on it.

@cdwijayarathna Would you be interested to look into this on the CS level and we'll help the contribution forward? You don't need to worry if it feels daunting. That's how contributions start :) No pressure if you don't feel like doing it. The team can pick it up later, but we love having new faces.

cdwijayarathna commented 4 years ago

I think now I have a better idea of the difference between encoding and escaping. I think the first step for approaching this would be to forget the usability and confusion, and verify if there are usages of escape/encode words that are inconsistent with these definitions. I will carefully go through the cheatsheet and update you on what I come up with.

I agree with @jmanico on his concern about changing these words. While the current usage of words may confuse novice readers, changing it would confuse readers who are familiar with the existing wording.

elarlang commented 4 years ago

Just pointing out, that you may use encoding for JavaScript as well.

Example: "Some philosopher said "Something fancy here", and I agree with it" Encoded: "Some philosopher said \u0022Something fancy here\u0022, and I agree with it"

Useful, when someone decides to put userinput to some HTML on* attribute (like onlick, onmouseover etc), because there you can not handle with only converting for HTML (leaves JavaScript injection vector) or escaping for JavaScript (leavas HTML injection vector).

jmanico commented 4 years ago

Please note that in this example JavaScript escaping and JavaScript encoding can be used interchangeably for the exact same defensive benefit. This is why the two terms are often used interchangeably.

On 7/9/20 6:10 AM, Elar Lang wrote:

Just pointing out, that you may use encoding for JavaScript as well.

Example: |"Some philosopher said "Something fancy here", and I agree with it"| Encoded: |"Some philosopher said \u0022Something fancy here\u0022, and I agree with it"|

Useful, when someone decides to put userinput to some HTML on* attribute (like onlick, onmouseover etc), because there you can not handle with only converting for HTML (leaves JavaScript injection vector) or escaping for JavaScript (leavas HTML injection vector).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/OWASP/CheatSheetSeries/issues/438#issuecomment-656217251, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEBYCPPWFJ2LIR5O2ILGSTR2XTWJANCNFSM4OPYRXPA.

-- Jim Manico Manicode Security https://www.manicode.com

elarlang commented 4 years ago

for the exact same defensive benefit.

This part is not correct (for every context) - they don't provide exactly the same benefit.

Example:

Input: demo"><svg onload=alert(1)>

Example 1 - JavaScript escaping

Input after escaping: demo\"><svg onload=alert(1)>

Output: <element onclick="sendsomewhere('demo\"><svg onload=alert(1)>');">

Result: vulnerable, even " is escaped \", it finishes "onclick" attribute value for <elemenet> and <svg> is interpreted as new element

Example 2 - JavaScript encoding (probably not correct name for encoding)

Input after encoding: demo\u0022\u003e\u003csvg onload=alert(1)\u003e

Output: <element onclick="sendsomewhere('demo\u0022\u003e\u003csvg onload=alert(1)\u003e');">

Resut: everything works like expected.

cdwijayarathna commented 4 years ago

I started reviewing the cheatsheet. I need some feedback on the following phrase under Rule 1.

"Escape the following characters with HTML entity encoding"

I feel that the use of word escape here is inconsistent with the definitions presented above. At the same time, I don't feel very confident about it. What do you think?

rbsec commented 4 years ago

@cdwijayarathna that should definitely be "encode" rather than "escape".

jmanico commented 4 years ago

Agreed!

-- Jim Manico @Manicode

On Jul 9, 2020, at 7:54 AM, rbsec notifications@github.com wrote:

 @cdwijayarathna that should definitely be "encode" rather than "escape".

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

jmanico commented 4 years ago

This is not the correct escaping, you are missing my point.

What I’m saying is that ••when escaping is available•• it provides the same defense as encoding.

In your example escaping double quotes is not available in JS. In other contexts it is.

-- Jim Manico @Manicode

On Jul 9, 2020, at 6:33 AM, Elar Lang notifications@github.com wrote:

 for the exact same defensive benefit.

This part is not correct - they don't provide exactly the same benefit.

Example:

Application: Input: demo">

Example 1 - JavaScript escaping

Output: ');">

Result: vulnerable, even " is escaped \", it finishes "onclick" attribute value for and is interpreted as new element

Example 2 - JavaScript encoding (probably not correct name for encoding)

Output:

Resut: everything works like expected.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

jmanico commented 4 years ago

Elar,

Reading this again, I think you are right when it comes to web contexts, the point of this cheatsheet.

Here is a great read on the topic.

https://www.onwebsecurity.com/security/properly-encoding-and-escaping-for-the-web.html

So I think an easy fix for the cheatsheet is to just replace all uses of escaping with encoding for the XSS cheatsheets.

On 7/9/20 6:33 AM, Elar Lang wrote:

for the exact same defensive benefit.

This part is not correct - they don't provide exactly the same benefit.

Example:

  • Application: ||

Input: |demo">|

Example 1 - JavaScript escaping

Output: |<svg onload=alert(1)>');">|

Result: vulnerable, even |"| is escaped |\"|, it finishes "onclick" attribute value for || and || is interpreted as new element

Example 2 - JavaScript encoding (probably not correct name for
encoding)

Output: ||

Resut: everything works like expected.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/OWASP/CheatSheetSeries/issues/438#issuecomment-656229051, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEBYCKXXQG7K6COG4TSXRDR2XWNDANCNFSM4OPYRXPA.

-- Jim Manico Manicode Security https://www.manicode.com

jmanico commented 4 years ago

My suggestions on the CS:

This article provides a simple positive model for preventingXSS https://owasp.org/www-community/attacks/xss/using output escaping/encoding properly.

Drop "escaping/" from this sentence.

Bothreflected and stored XSS https://owasp.org/www-community/attacks/xss/#stored-and-reflected-xss-attackscan be addressed by performing the appropriate validation and escaping on the server-side.DOM Based XSS https://owasp.org/www-community/attacks/DOM_Based_XSScan be addressed with a special subset of rules described in theDOM based XSS Prevention Cheat Sheet https://cheatsheetseries.owasp.org/cheatsheets/DOM_based_XSS_Prevention_Cheat_Sheet.html.

Replace with encoding

When you put untrusted data into these slots, you need to take certain steps to make sure that the data does not break out of that slot into a context that allows code execution. In a way, this approach treats an HTML document like a parameterized database query - the data is kept in specific places and is isolated from code contexts with escaping.

Replace with encoding

For example, you might be tempted to use some of the escaping shortcuts like|\"|in JavaScript.

Good use of it, since it's a negative example.

Except for alphanumeric characters, escape all characters less than 256 with the|\xHH|format to prevent switching out of the data value into the script context or into another attribute.DO NOTuse any escaping shortcuts like|\"|because the quote character may be matched by the HTML attribute parser which runs first. These escaping shortcuts are also susceptible toescape-the-escape attackswhere the attacker sends|\"|and the vulnerable code turns that into|\"|which enables the quote.

Replace first use of escape with encode

If this works for you I can keep going....

On 7/9/20 6:47 AM, Chamila Wijayarathna wrote:

I started reviewing the cheatsheet. I need some feedback on the following phrase under Rule 1.

"Escape the following characters with HTML entity encoding"

I feel that the use of word escape here is inconsistent with the definitions presented above. At the same time, I don't feel very confident about it. What do you think?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/OWASP/CheatSheetSeries/issues/438#issuecomment-656235899, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEBYCMVKD5UFXATKPRY4T3R2XYD3ANCNFSM4OPYRXPA.

kwwall commented 4 years ago

Elar, Reading this again, I think you are right when it comes to web contexts, the point of this cheatsheet. Here is a great read on the topic. https://www.onwebsecurity.com/security/properly-encoding-and-escaping-for-the-web.html So I think an easy fix for the cheatsheet is to just replace all uses of escaping with encoding for the XSS cheatsheets. - Jim ... -- Jim Manico Manicode Security https://www.manicode.com

@jmanico - It was actually that link you referenced where I found the first definitions of Escape vs Encode. I started this whole conversation with @cdwijayarathna as I was going through the Javadoc of the ESAPI Encoder interface after reading their research paper and discussing it with them on Google Drive. Also, the two of us, plus Chamila's advisor, had a long extended conversation specifically about this confusing "escape" vs. "encode" terminology in a long private email thread before it ended up here.

After reading their paper and this pnwebsecurity.com's definition of Escapings vs Encoding in the link you referenced, I could see why inexperienced people were confused because we use encodForXyz() for naming all the methods regardless as to whether they are using escaping, encoding or both. I've tried to clean some of that up and not just that but other things based on their very insightful feedback.

I'm getting ready for a new ESAPI release shortly (this weekend if I can get to it), but you can see the new Javadoc for the 2.2.1.0-RC1 release for the ESAPI Encoder here if you are interested: https://www.javadoc.io/static/org.owasp.esapi/esapi/2.2.1.0-RC1/org/owasp/esapi/Encoder.html

Feedback is welcome!

I think there is still a lot of room for improvement and to some degree, it really needs to be supplemented with a Encoder User Guide or at least a wiki page on the GitHub wiki, but I think it addresses a lot of the issues brought up in the paper that @cdwijayarathna and his colleague authored.

cdwijayarathna commented 4 years ago

I also feel this works. So, if I understood this correctly, all rules described in the XSS cheatsheet are based on encoding rather than escaping.

My suggestions on the CS: This article provides a simple positive model for preventingXSS https://owasp.org/www-community/attacks/xss/using output escaping/encoding properly. Drop "escaping/" from this sentence. Bothreflected and stored XSS https://owasp.org/www-community/attacks/xss/#stored-and-reflected-xss-attackscan be addressed by performing the appropriate validation and escaping on the server-side.DOM Based XSS https://owasp.org/www-community/attacks/DOM_Based_XSScan be addressed with a special subset of rules described in theDOM based XSS Prevention Cheat Sheet https://cheatsheetseries.owasp.org/cheatsheets/DOM_based_XSS_Prevention_Cheat_Sheet.html. Replace with encoding When you put untrusted data into these slots, you need to take certain steps to make sure that the data does not break out of that slot into a context that allows code execution. In a way, this approach treats an HTML document like a parameterized database query - the data is kept in specific places and is isolated from code contexts with escaping. Replace with encoding For example, you might be tempted to use some of the escaping shortcuts like|\"|in JavaScript. Good use of it, since it's a negative example. Except for alphanumeric characters, escape all characters less than 256 with the|\xHH|format to prevent switching out of the data value into the script context or into another attribute.DO NOTuse any escaping shortcuts like|\"|because the quote character may be matched by the HTML attribute parser which runs first. These escaping shortcuts are also susceptible toescape-the-escape attackswhere the attacker sends|\"|and the vulnerable code turns that into|\"|which enables the quote. Replace first use of escape with encode If this works for you I can keep going.... - Jim On 7/9/20 6:47 AM, Chamila Wijayarathna wrote: I started reviewing the cheatsheet. I need some feedback on the following phrase under Rule 1. "Escape the following characters with HTML entity encoding" I feel that the use of word escape here is inconsistent with the definitions presented above. At the same time, I don't feel very confident about it. What do you think? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#438 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEBYCMVKD5UFXATKPRY4T3R2XYD3ANCNFSM4OPYRXPA.

kwwall commented 4 years ago

@cdwijayarathna - I think that's probably true of the XSS Prevention Cheat Sheet, but it is not true of (at least) the ESAPI Encoder. It has methods like encodeForLDAP() and encodeForOS() that only rely on escaping, not encoding (in the sense of how it is described inhttps://www.onwebsecurity.com/security/properly-encoding-and-escaping-for-the-web.html). Those are not to provide defenses for XSS though so are probably irrelevant in the discussion of this cheat sheet, but still could have caused confusion for your the subject of your ESAPI Encoder usability study because those methods are part of the Encoder interface.

cdwijayarathna commented 4 years ago

Hello all, I created a PR #449 where I did some improvements based on this discussion.

Mostly, I changed the word 'escape'/'escaping' to 'encode'/'encoding' according to the definition at https://owasp.org/www-project-proactive-controls/v3/en/c4-encode-escape-data . I left some 'escape/escaping' words where I felt they followed the definition.

I was not sure about wording at line number 226, 245 and 383, so I'd like to hear your feedback on those lines.

Let me know what you think.

jmanico commented 4 years ago

This looks good to me. Can we push this live? Well done!

Aloha, Jim

On 7/21/20 6:54 AM, Chamila Wijayarathna wrote:

Hello all, I created a PR #449 https://github.com/OWASP/CheatSheetSeries/pull/449 where I did some improvements based on this discussion.

Mostly, I changed the word 'escape'/'escaping' to 'encode'/'encoding' according to the definition at https://owasp.org/www-project-proactive-controls/v3/en/c4-encode-escape-data . I left some 'escape/escaping' words where I felt they followed the definition.

I was not sure about wording at line number 226, 245 and 383, so I'd like to hear your feedback on those lines.

Let me know what you think.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/OWASP/CheatSheetSeries/issues/438#issuecomment-661979751, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEBYCJ6DTEYRCMGRC7SHS3R4XB3ZANCNFSM4OPYRXPA.

-- Jim Manico Manicode Security https://www.manicode.com

mackowski commented 4 years ago

Looks good! I merged it. Thank you @cdwijayarathna