github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License
7.67k stars 1.54k forks source link

False negative: java string concatenation using StringBuilder causing injection #12939

Open ebickle opened 1 year ago

ebickle commented 1 year ago

Description of the issue

CodeQL correctly detected a Java code issue that involved Java string concatenation being passed to Runtime.getRuntime().exec(""). In our case it wasn't exploitable, but it was still dangerous code that needed to be remediated.

Code samples The code (not mine!) looked a bit like like this: Process p = Runtime.getRuntime().exec("python " + SCRIPT_PATH + " --file " + inputFile.getAbsolutePath());

The developer attempted to correct the issue on their own and changed it to this:

StringBuilder pythonPath = new StringBuilder();
pythonPath.append("python ").append(SCRIPT_PATH).append(" --file ").append(inputFile.getAbsolutePath());
Process p = Runtime.getRuntime().exec(pythonPath.toString());

CodeQL closed the alert as fixed, even though the new code is still vulnerable and logically the same.

It appears as though CodeQL is not properly detecting Java string concatenation vulnerabilities whenever the StringBuilder class is used. StringBuilder is fairly common, so this results in some risky omissions.

CodeQL Rule Rule ID: java-concatenated-command-line

Build-Time Environment Language: java Java Distribution: temurin Java Version: 8 (via setup-java@v3) Java Build Tool: Maven 3.8.2

hmakholm commented 1 year ago

The java/concatenated-command-line query looks for string concatenation in particular. To find user-controlled flows that reach command lines through more complex paths, java/command-line-injection or java/command-line-injection-local, ought to be able to alert here this, assuming that inputFile can be identified as being user-controlled value.

ebickle commented 1 year ago

I've created a public repo with a reproduction of the issue here: https://github.com/ebickle/codeql-process-concatenation-test

Cloning it should run CodeQL and only show one error, missing the StringBuilder. The rule triggers without any user-controlled input.

hmakholm commented 1 year ago

Thanks for the example. It has spawned a bit of internal discussion of what CodeQL's reaction to such code should be. We'll let you know when a conclusion emerges from that.

ebickle commented 1 year ago

Thanks!

If it helps, the original code looks a bit like this (modified to protect the innocent 😁):

// Elsewhere
private static File TMP_DIR = new File("/tmp");

File inputFile = File.createTempFile("some_path", "tmp", TMP_DIR);
// Write file contents here
Process p = Runtime.getRuntime().exec("python " + SCRIPT_PATH + " --image_file " + inputFile.getAbsolutePath());

There are better code patterns that would avoid the temporary file, but that's what CodeQL detected. In this case there's no user-controlled input (createTempFile only uses constants), but the overall coding pattern is less than stellar and I'm not a fan of it.

The person that wrote the fix (using StringBuilder) didn't read the CodeQL fix documentation and incorrectly assumed it was literally the string concatenation using + that was the problem. Concatenation using other means doesn't escape parameters either, but then you quickly go down a rabbit hole of just about all process execution using parameters being security-sensitive.

Not sure what the best fix is, but one option would be to have concatenation via StringBuilder be detected and then having a rule for user-supplied input going to process execution, then another for concatenation-flowing-to-exec in general - I think the SQL injection rules work that way, at least for JavaScript?

I'm no expert though, just tossing out random ideas 😃

atorralba commented 1 year ago

Hey, member of the CodeQL Java team here.

In general, we try to keep queries as precise as possible: that means, we try to not have false negatives (i.e. we want to alert on as many real issues as possible) nor false positives (i.e. we don't want to alert on false/non-exploitable issues).

That's why our high-precision, robust queries java/command-injection and java/command-injection-local, which can be used depending on your threat model —that is, depending on whether you consider local inputs (like the content of a local file) as dangerous or not—, only alert when user-controllable data reaches a sink.

Now, java/concatenated-command-line is a different kind of query. It isn't making sure that the issue is exploitable — just that a string is being built dynamically by concatenation, and that it should be reviewed to make sure that all parts of the string are trustworthy. That's why the query doesn't alert on constants being used in the concatenation, but only the dynamically-looking parts.

Of course, we could consider the use of StringBuilder to be a different kind of concatenation, and add it to the query. But it goes beyond its current intended use, which is highlighting these simple "this looks wrong" patterns that developers should review in detail. From my perspective, in this kind of queries it's acceptable that developers mark alerts as false positives after review and move on. When more detailed, dangerous flow is detected in the dataflow-based queries, then exploitability is way more probable, and a correction action is almost always needed.

Which brings me to your sentence:

It appears as though CodeQL is not properly detecting Java string concatenation vulnerabilities whenever the StringBuilder class is used. StringBuilder is fairly common, so this results in some risky omissions.

Concatenation by itself is not a vulnerability, just a pattern that historically we decided to alert about if it was used in something as sensitive as command execution in a very obvious way (which doesn't imply exploitability). For more reliable, precise analysis, the dataflow queries should be used, and they definitely take flow through StringBuilder into account.

A different issue would be that java/concatenated-command-line is currently tagged as precision high, which shouldn't be the case given its simplistic analysis.

ebickle commented 1 year ago

That makes a lot of sense @atorralba and I appreciate the highly detailed reply! Very insightful into the inner workings and design of CodeQL and the built-in queries.

I suspected that there was a more appropriate query for dataflow-based analysis (e.g. java/command-injection). Figured I'd send a heads up over to the team just in case the same underlying "string concatenation" constructs were used there as well - e.g. if StringBuilder somehow is causing false negatives in multiple queries due to it being missing at lower level somewhere. I can review the queries, but my knowledge of the deeper plumbings of CodeQL isn't as good 🙂

Feel free to close out this issue if you'd like; I'll leave it open for now just in case the team would prefer to use it to track a potential change to the ruleset tagging of java/concatenated-command-line. Happy either way!