How can I "sanitize" paths where a variable is passed to a sanitization function, but the path doesn't contain the result of the sanitization function?

github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security

MIT License

Update:

I'd like SsaDefinition and the dominates predicate to be better documented with some actual examples. See https://github.com/github/codeql/issues/8568#issuecomment-1084951233 for some more context.

Original Post:

I would like to remove/sanitize a path if the variable used at the sink was passed to the sanitization function at some point before reaching the sink. Here's an example of some C code I want to analyze:

void sink(int);
void sanitize(int);

void foo(int x) {
  sink(x);
}

void test(int x) {
  // Alert
  sink(x);

  // Alert
  foo(x);

  sanitize(x);

  // No alert
  sink(x);

  // No alert
  foo(x);
}

If I use a simple TaintTracking::Configuration which tracks flows from the parameter x to the argument to sink(x), I get four paths even if I use some type of sanitizer like this:

  override predicate isSanitizer(DataFlow::Node sanitizer) {
    exists(FunctionCall c | c.getTarget().hasName("sanitize") |
      c.getAnArgument() = sanitizer.asExpr()
    )
  }

It makes sense why this doesn't work: each path goes directly from the parameter x to the argument of sink or foo. There are no paths that go from the parameter x, to sanitize, and then to the sink (that would be the case if the example had sink(sanitize(x)), but in this case sanitize does not return a value).
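To make the difference concrete, here is a minimal C sketch of the two shapes described above. The names sanitize_value, void_style, value_style and the two bookkeeping variables are hypothetical and only there for illustration; they are not part of the original example. The comments restate the flow as I understand it from the explanation above, not as a statement about CodeQL internals.

```c
/* Hypothetical bookkeeping so the effect of each shape can be observed. */
static int sink_calls = 0;      /* how many times sink() was reached */
static int last_sink_value = 0; /* last value handed to sink() */

void sink(int v) { sink_calls++; last_sink_value = v; }

/* Shape 1: the sanitizer returns nothing, as in the original example. */
void sanitize(int v) { (void)v; }

/* Shape 2 (hypothetical): the sanitizer returns the cleaned value. */
int sanitize_value(int v) { return v < 0 ? 0 : v; }

void void_style(int x) {
  sanitize(x); /* a separate use of x; nothing flows out of this call */
  sink(x);     /* the path runs from the parameter x straight to this use */
}

void value_style(int x) {
  /* the path runs from x into sanitize_value and out through its result,
     so a node inside the sanitizer sits on the path to the sink */
  sink(sanitize_value(x));
}
```

In the second shape, isSanitizer has a node on the path to block. In the first shape it does not, which matches the four paths I am seeing.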

I can partly work around this issue for cases where sanitize and sink are called in the same function, like this:

  override predicate isSink(DataFlow::Node sink) {
    exists(FunctionCall c | c.getTarget().hasName("sink") | c.getAnArgument() = sink.asExpr()) and
    not exists(FunctionCall c |
      c.getTarget().hasName("sanitize") and
      sink.asExpr().getAPredecessor+() = c and
      c.getAnArgument().(VariableAccess).getTarget() = sink.asExpr().(VariableAccess).getTarget()
    )
  }

With this I get only three paths instead of four, but I still don't get the desired two paths, because this doesn't handle the general case where the sink and the sanitizer are called in two different functions.

Is there any way to solve this? Does CodeQL store any paths between uses of a variable, in addition to the path from the definition of the variable to each use of it?

/**
 * @kind path-problem
 */

import cpp
import semmle.code.cpp.dataflow.TaintTracking
import semmle.code.cpp.controlflow.Dominance
import DataFlow::PathGraph

class Config extends TaintTracking::Configuration {
  Config() { this = "Config" }

  override predicate isSource(DataFlow::Node source) {
    exists(Function f | f.getName() = "test" and source.asParameter() = f.getAParameter())
  }

  override predicate isSink(DataFlow::Node sink) {
    exists(FunctionCall call | call.getTarget().getName() = "sink" |
      call.getAnArgument() = sink.asExpr()
    )
  }

  override predicate isSanitizer(DataFlow::Node sanitizer) {
    exists(FunctionCall call | call.getTarget().hasName("sanitize") |
      // Sanitizer variable accesses that are dominated by a call to the sanitizer
      // function with the same variable as an argument to the sanitizer function
      exists(SsaDefinition def, StackVariable v, VariableAccess dominatedUse |
        call.getAnArgument() = def.getAUse(v) and
        dominatedUse = def.getAUse(v) and
        dominates(call, dominatedUse) and
        dominatedUse = sanitizer.asExpr()
      )
      or
      // Sanitize arguments to the sanitizer function
      call.getAnArgument() = sanitizer.asExpr()
    )
  }
}

from Config c, DataFlow::PathNode source, DataFlow::PathNode sink
where c.hasFlowPath(source, sink)
select sink, source, sink, "Flow"
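The query requires the sanitizer argument and the later use to be uses of the same SsaDefinition. A small C sketch of why that matters, assuming the usual SSA rule that every assignment to a variable starts a new definition. read_input, same_definition, redefined and last_sink_value are hypothetical names added only for this illustration.

```c
/* Hypothetical bookkeeping: remembers the last value that reached sink(). */
static int last_sink_value = 0;

void sink(int v) { last_sink_value = v; }
void sanitize(int v) { (void)v; }

/* Hypothetical stand-in for a fresh, unsanitized value. */
int read_input(void) { return 42; }

void same_definition(int x) {
  sanitize(x); /* use of the parameter's definition of x */
  sink(x);     /* use of the same definition of x */
}

void redefined(int x) {
  sanitize(x);      /* use of the parameter's definition of x */
  x = read_input(); /* assignment: starts a new definition of x */
  sink(x);          /* use of the new definition, never passed to sanitize */
}
```

If I read the query correctly, only the sink(x) in same_definition should match the first disjunct of isSanitizer, because in redefined the two accesses to x belong to different definitions.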

void sink(int);

void sanitize(int x) {
  sink(x);
}

int transform1(int x) {
  return x + 1;
}

int transform2(int x) {
  return x + 2;
}

void foo(int x) {
  sink(x);
}

void test(int x, int y) {
  // Alert
  sink(x);

  // Alert
  foo(x);

  // Alert
  sink(y);

  // Alert
  foo(y);

  sanitize(x);

  // No alert
  sink(x);

  // No alert
  foo(x);

  // Alert
  sink(y);

  // Alert
  foo(y);
}
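The test above only has straight-line code, so the dominates condition in the query is always satisfied once sanitize(x) has been called. A minimal sketch of a case where it would not be, using the standard meaning of dominance (A dominates B if every path from function entry to B passes through A). The bookkeeping variables, reset_trace, dominated and not_dominated are hypothetical names for illustration only.

```c
/* Hypothetical runtime trace of the property the query checks statically. */
static int sanitized = 0;         /* set once sanitize() has run */
static int unsanitized_sinks = 0; /* sink() calls reached without sanitize() */

void reset_trace(void) { sanitized = 0; unsanitized_sinks = 0; }

void sanitize(int v) { (void)v; sanitized = 1; }
void sink(int v) { (void)v; if (!sanitized) unsanitized_sinks++; }

/* sanitize(x) lies on every path to sink(x): the call dominates the use. */
void dominated(int x) {
  sanitize(x);
  sink(x);
}

/* sanitize(x) runs on one branch only: it does not dominate sink(x),
   because the path with flag == 0 reaches the sink without it. */
void not_dominated(int x, int flag) {
  if (flag) {
    sanitize(x);
  }
  sink(x);
}
```

Under the dominance condition in the query, I would expect the sink(x) in not_dominated to still be reported, since one path skips the sanitizer.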
