github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License
7.32k stars 1.47k forks source link

CodeQL XSS False Positive when using ESAPI.encoder().encodeForHTML() to defend against XSS #16531

Open davewichers opened 1 month ago

davewichers commented 1 month ago

I originally reported this here: "CodeQL XSS False Positives and XSS AutoFix incorrect location for defensive encoding" (https://github.com/orgs/community/discussions/122802), but am reporting it here because I was told this is a better place for it.

CodeQL Issue:

I'd like to report an XSS false positive for CodeQL AND the AutoFix for it.

If you look at the CodeQL XSS finding in my test project here: https://github.com/aspectsecurity/TestGitHubAdvSec/security/code-scanning/8

It is saying there is an XSS when writing the filename parameter out to the response, and YET that parameter value is wrapped in a call to: org.owasp.esapi.ESAPI.encoder().encodeForHTML(), which makes is safe.

AutoFix issue related to this CodeQL Issue:

I also have the same code in a private repo, where CodeQL is reporting this same XSS, and the AutoFix suggestion is to change this:

+ org.owasp.esapi.ESAPI.encoder().encodeForHTML(fileName));

To this:

+ org.owasp.esapi.ESAPI.encoder().encodeForHTML(org.owasp.esapi.ESAPI.encoder().encodeForHTML(fileName)));

Which is clearly redundant/duplicative.

A similar XSS false positive is also reported here: https://github.com/aspectsecurity/TestGitHubAdvSec/security/code-scanning/9, where it is encoding the value of 'param' on line 72.

But the autofix for this same file in my private repo, suggests this change:

60: param = java.net.URLDecoder.decode(theCookie.getValue(), "UTF-8"); (Original code)

Suggested fix:

60: param = org.owasp.benchmark.helpers.Utils.encodeForHTML(java.net.URLDecoder.decode(theCookie.getValue(), "UTF-8"));

The problem with FIXING XSS like this way earlier than the point where the parameter value is echoed in the web response is that encoding it like this can BREAK the use of that parameter between this line and where it echoed in the web response.

In this test case 4, the value is put into session and could be used anywhere, so encoding it BEFORE it is put into session can have serious adverse effects.

In another similar test case, the same autofix is suggested to be applied to a filename parameter before the filename is ever used. HTML Encoding a filename parameter can definitely screw up the use of that filename.

The RIGHT way to do XSS defenses is ONLY encode the value just BEFORE it is included in the web response. Encoding it too early can easily break stuff.

Here's a Stack Exchange discussion on why you should encode as late as possible: https://security.stackexchange.com/questions/261081/is-there-a-consensus-on-whether-html-encoding-should-happen-upon-upload-or-retri

"Encoding/escaping should happen as late as possible." The reason for this is you have to know the HTML/JavaScript context to know what the right type of encoding to use is, and encoding too early corrupts the data for other uses that have nothing to do with using those variable values outside of a browser context.

jketema commented 1 month ago

Hi @davewichers,

Thank you for the false positive report. Resolving false positives is not a current product priority, so from that perspective we acknowledge the report and will track it internally for future consideration, or if we observe repeated instances of the same problem. However, I've looped in the autofix team, as the auto fixes are clearly problematic.