github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License
7.53k stars 1.5k forks source link

Preserving taint through arithmetic operations in Java #14233

Open ebickle opened 1 year ago

ebickle commented 1 year ago

Description of the issue

As described in issue #4845, by default CodeQL does not propagate taint across arithmetic operations (e.g. addition) for Java.

I'm working on a query where taint tracking across arithmetic operations is required and I plan on submitting a pull request for a query bug fix. I ran into an issue or two and could use some advice on the specifics.

Background information

I added an isAdditionalFlowStep predicate to the module implementing DataFlow::ConfigSig, then extended AdditionalValueStep with a subclass that uses the ArithExpr class from semmle.code.java.arithmetic.Overflow, like this:

import semmle.code.java.arithmetic.Overflow

class ArithmeticExpressionStep extends AdditionalValueStep {
  override predicate step(DataFlow::Node node1, DataFlow::Node node2) {
    node2.asExpr().(ArithExpr).getAnOperand() = node1.asExpr()
  }
}

module MyConfig implements DataFlow::ConfigSig {
  // other predicates here

  predicate isAdditionalFlowStep(DataFlow::Node node1, DataFlow::Node node2) {
    any(AdditionalValueStep r).step(node1, node2)
  }
}

Questions

  1. Although this fixes the issue for binary arithmetic operands, unary operands still don't propagate taint. ArithExpr already checks for UnaryAssignExpr. I also tried adding the check node2.asExpr().(UnaryAssignExpr).getExpr() = node1.asExpr() but it still doesn't work - I can't quite figure out the syntax for passing taint through a unary operator using the two node1 and node2 parameters. Any ideas?

  2. Is there a better class for doing this than ArithExpr from semmle.code.java.arithmetic.Overflow? It seems like a reasonable choice, just a but strange such a low level class would be specific to overflow operations rather than generic. I wanted to verify the use was acceptable before doing a PR.

Thanks!

smowton commented 1 year ago

How about simply


class ArithmeticExpressionStep extends AdditionalValueStep {
  override predicate step(DataFlow::Node node1, DataFlow::Node node2) {
    node2.asExpr().(ArithExpr).getAnOperand() = node1.asExpr()
    or
    node2.asExpr().(UnaryExpr).getExpr() = node1.asExpr()
  }
}

Note that UnaryAssignExpr means x++, --x etc and not unary - or + (UnaryExpr includes these).

ArithExpr is fine to use, and might be a useful utility to move to Expr.qll instead of arithmetic.Overflow -- I'll note we should do this.

A few notes:

  1. Consider whether you want the bitwise binary operators &, |, ^ or the bitwise unary operator ~. Consider also unary ! and the binary logical && and ||, which don't count as ArithExpr.
  2. You're using two redundant ways to add a flow step: ConfigSig::isAdditionalFlowStep adds a step to just this configuration, while extending AdditionalValueStep adds a step to every configuration and is intended for applying site-wide customisations that apply across a query suite.
  3. Usually data-flow and taint-tracking steps are distinguished in that data-flow steps are supposed to preserve the exact value of the data being tracked (e.g., passing a value through parameters, fields, array cells, local variables), while taint-tracking steps are supposed to track related values such as concatenated strings that aren't the same but inherit taint for most purposes. Arithmetic results are more taint-like than data-flow-like, so I recommend using a taint-tracking configuration.
ebickle commented 1 year ago

Thanks!

Looks like the reason it wasn't working in all scenarios was operations like x += 1000. I had incorrectly assumed these would fall under UnaryAssignExpr. I'm starting to think there might be a bug in ArithExpr as well - it doesn't include the compound assignment operators (e.g. AssignAddExpr, AssignSubExpr, AssignMulExpr...). These are also arithmetic expressions, just ones with integrated assignment.

Looking at the usage of ArithExpr, most of it relates to overflows - all of which can occur during compound assignment as well. Am I looking at this wrong, or did I stumble across a bug?

Regarding question #2, my custom module implementing DataFlow::ConfigSig is used via (sample code) module MyFlow = TaintTracking::Global<MyConfig>. The documentation at https://github.blog/changelog/2023-08-14-new-dataflow-api-for-writing-custom-codeql-queries/ indicates that taint tracking should go inside of isAdditionalTaintStep. So I should be already good to go, right? The TaintTracking::Configuration class was deprecated last month.

smowton commented 1 year ago

Q2: Yes, that sounds good. In that case simply don't extend AdditionalValueStep -- either inline the definition of the class inside isAdditionalTaintStep, or define a predicate rather than a class.

I'll report the ArithExpr issue to the Java security team; I think your analysis is probably right.

atorralba commented 1 year ago

I agree that AFAICT there was no reason for ArithExpr to not include compound operators, so I fixed it in https://github.com/github/codeql/pull/14254. Note that it still only considers the operators +, -, *, and /, leaving bitwise and logical operators out.