github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License
7.67k stars 1.54k forks source link

Java: Taint flows backwards for array element #15321

Open Marcono1234 opened 10 months ago

Marcono1234 commented 10 months ago

Version

CodeQL extension version: 1.12.0 
CodeQL CLI version: 2.15.5 
Platform: win32 x64

Dependencies:

Description

It looks like taint tracking erroneously reports that taint flows "backwards" from an array element assignment. For example:

void arrayAssign() {
    String[] s = {"a"};
    s[0] = source(sink(s[0]));
}

Here it erroneously reports that there is taint flow from source to sink, even though sink is actually executed before source, so there should not be any flow between them.

Here is also real world code where this occurs: apache/logging-log4j2 (my query considered the output of String.format as source, and an argument to String.format as sink)

Steps to reproduce

Java code:

class FlowTest {
    <T> T source(T o) {
        return o;
    }

    <T> T sink(T o) {
        return o;
    }

    void varAssign() {
        String s = "a";
        s = source(sink(s));
    }

    void arrayAssign() {
        String[] s = {"a"};
        // Erroneously reports flow from `source` to `sink`
        s[0] = source(sink(s[0]));
    }
}

CodeQL query:

/**
 * @kind problem
 */

import java
import semmle.code.java.dataflow.DataFlow
import semmle.code.java.dataflow.TaintTracking

class Source extends DataFlow::Node {
  Source() {
    exists(Method m |
      m = this.asExpr().(MethodCall).getMethod().getSourceDeclaration() and
      m.hasName("source") and
      m.fromSource()
    )
  }
}

class Sink extends DataFlow::Node {
  Sink() {
    exists(MethodCall call, Method m |
      call.getAnArgument() = this.asExpr() and
      m = call.getMethod().getSourceDeclaration() and
      m.hasName("sink") and
      m.fromSource()
    )
  }
}

from Source source, Sink sink, string type
where
  type = "dataflow" and DataFlow::localFlow(source, sink)
  or
  type = "taint" and TaintTracking::localTaint(source, sink)
select sink, type + " from $@", source, "source"
smowton commented 10 months ago

Investigated this a little: the cause is our control-flow graph reflecting the order of events according to the JLS:

If any of the expressions in an array assignment have side-effects, the ordering is:

first()[second()] = third();

However of course once all the necessary arguments are evaluated, the last step is to assign the RHS value to the array cell, so our reading that data will flow from the RHS (source) to the array assignment (store step) to the use of the array on the RHS (apparent subsequent use step) is not a valid grounds for propagating taint in that order.

Note that source consuming data from sink is not important: this example works too:

class FlowTest {
    String source() {
        return "tainted";
    }

    String sink(String arg) {
        return arg;
    }

    void arrayAssign() {
        String[] s = {"a"};
        // Erroneously reports flow from `source` to `sink`
        s[0] = source() + sink(s[0]);
    }
}

Here the flow is source() -> op+ -> s[0] -> s (lhs use) -> s (rhs use, correctly reflecting the order of the JVM's references to s), -> s[0] -> sink

rvermeulen commented 3 weeks ago

Hi @Marcono1234,

Thanks again for your report. I believe @smowton answered your question so I'm proceeding by closing this issue.

If you have any follow-up questions, feel free to re-open this issue.

Thanks!

Marcono1234 commented 3 weeks ago

@rvermeulen, if I understood @smowton correctly they mainly acknowledged that this is a bug and explained the reason (implementation using JLS evaluation order instead of taint flow order):

so our reading that data will flow from the RHS (source) to the array assignment (store step) to the use of the array on the RHS (apparent subsequent use step) is not a valid grounds for propagating taint in that order

This is labelled as https://github.com/github/codeql/labels/question because there is no issue template for https://github.com/github/codeql/labels/bug; one of the maintainers has to set it manually it seems.


feel free to re-open this issue

I can't: If someone other than the author closes an issue, then the author cannot reopen it again (unless they are a member of the repository).

Could you please reopen this? This issue still occurs for CodeQL 2.19.1 (codeql/java-all 4.1.0) with the original reproduction steps provided above.

aschackmull commented 3 weeks ago

Reopened. You're correct that this is still an issue - it's just not a very high priority, as it's probably not a large source of FP flow, and It's a bit finicky to fix. But it's noted and appreciated, so if I find some spare time for something like this, then I might have a go at fixing it.

rvermeulen commented 3 weeks ago

@Marcono1234 excuses, I wasn't aware it couldn't be reopened.

Thanks @aschackmull for the follow-up. I will assign the bug label and keep it open even though it might not end up being addressed.