github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com

How can I "sanitize" paths where a variable is passed to a sanitization function, but the path doesn't contain the result of the sanitization function? #8568

Open gsingh93 opened 2 years ago

gsingh93 commented 2 years ago

Update:

I'd like SsaDefinition and the dominates predicate to be better documented with some actual examples. See https://github.com/github/codeql/issues/8568#issuecomment-1084951233 for some more context.


Original Post:

I would like to remove (sanitize) a path if the variable reaching the sink was passed to the sanitization function at some point before it reaches the sink. Here's an example of some C code I want to analyze:

void sink(int);
void sanitize(int);

void foo(int x) {
  sink(x);
}

void test(int x) {
  // Alert
  sink(x);

  // Alert
  foo(x);

  sanitize(x);

  // No alert
  sink(x);

  // No alert
  foo(x);
}

If I use a simple TaintTracking::Configuration that tracks flow from the parameter x to the argument of sink(x), I get all four paths, even with a sanitizer like this:

  override predicate isSanitizer(DataFlow::Node sanitizer) {
    exists(FunctionCall c | c.getTarget().hasName("sanitize") |
      c.getAnArgument() = sanitizer.asExpr()
    )
  }

It makes sense why this doesn't work: each path goes directly from the parameter x to the argument of sink or foo. There are no paths that go from the parameter x, to sanitize, and then to the sink (that would be the case if the example had sink(sanitize(x)), but in this case sanitize does not return a value).
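
For contrast, here is a minimal sketch of the value-returning case mentioned above, where the isSanitizer predicate as written would be expected to cut the path. The function sanitize_value and its clamping behavior are hypothetical, introduced only for illustration; they are not part of the code under analysis.

```c
/* Records the last value passed to sink so the behavior can be observed. */
static int last_sunk;

void sink(int v) { last_sunk = v; }

/* Hypothetical sanitizer that RETURNS the cleaned value.
   Assumed behavior for illustration: clamp negative input to zero. */
int sanitize_value(int v) { return v < 0 ? 0 : v; }

void test_returning(int x) {
  /* The argument of sink is the result of sanitize_value(x), so any
     data-flow path from x to sink has to pass through the argument of
     the sanitizer call, which is the node isSanitizer marks. */
  sink(sanitize_value(x));
}
```

With a void sanitize(int), no such node lies on the path, which is why the four paths above survive.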

I can partly work around this for cases where sanitize and sink are called in the same function, like this:

  override predicate isSink(DataFlow::Node sink) {
    exists(FunctionCall c | c.getTarget().hasName("sink") | c.getAnArgument() = sink.asExpr()) and
    not exists(FunctionCall c |
      c.getTarget().hasName("sanitize") and
      sink.asExpr().getAPredecessor+() = c and
      c.getAnArgument().(VariableAccess).getTarget() = sink.asExpr().(VariableAccess).getTarget()
    )
  }

With this I only get three paths instead of four, but I still don't get the desired two paths because it doesn't handle the general case where the sink and sanitizer are in two different functions.

Is there any way to solve this? Does CodeQL store any paths between uses of a variable, in addition to the path from the definition of the variable to each use of it?

gsingh93 commented 2 years ago

Thanks to @rvermeulen, I found that this solution works well:

/**
 * @kind path-problem
 */

import cpp
import semmle.code.cpp.dataflow.TaintTracking
import semmle.code.cpp.controlflow.Dominance
import DataFlow::PathGraph

class Config extends TaintTracking::Configuration {
  Config() { this = "Config" }

  override predicate isSource(DataFlow::Node source) {
    exists(Function f | f.getName() = "test" and source.asParameter() = f.getAParameter())
  }

  override predicate isSink(DataFlow::Node sink) {
    exists(FunctionCall call | call.getTarget().getName() = "sink" |
      call.getAnArgument() = sink.asExpr()
    )
  }

  override predicate isSanitizer(DataFlow::Node sanitizer) {
    exists(FunctionCall call | call.getTarget().hasName("sanitize") |
      // Sanitize variable accesses that are dominated by a call to the sanitizer
      // function that takes the same variable (same SSA definition) as an argument
      exists(SsaDefinition def, StackVariable v, VariableAccess dominatedUse |
        call.getAnArgument() = def.getAUse(v) and
        dominatedUse = def.getAUse(v) and
        dominates(call, dominatedUse) and
        dominatedUse = sanitizer.asExpr()
      )
      or
      // Sanitize arguments to the sanitizer function
      call.getAnArgument() = sanitizer.asExpr()
    )
  }
}

from Config c, DataFlow::PathNode source, DataFlow::PathNode sink
where c.hasFlowPath(source, sink)
select sink, source, sink, "Flow"

It works on this test case:

void sink(int);
void sanitize(int x) { sink(x); }

int transform1(int x) { return x + 1; }
int transform2(int x) { return x + 2; }

void foo(int x) { sink(x); }

void test(int x, int y) {
  // Alert
  sink(x);

  // Alert
  foo(x);

  // Alert
  sink(y);

  // Alert
  foo(y);

  sanitize(x);

  // No alert
  sink(x);

  // No alert
  foo(x);

  // Alert
  sink(y);

  // Alert
  foo(y);
}

I'd like to change the outcome of this issue from "how to do this" to better documentation and examples for how to use SsaDefinition and the dominates predicate.
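
As a starting point for such examples, two further test cases might be worth adding, since they probe the two conditions in isSanitizer separately: the dominates check and the shared SsaDefinition. The functions test_branch and test_reassign are hypothetical, and isSource would have to be widened to cover their parameters. The alert comments state what one would expect from reading the query above; they have not been verified by running it.

```c
/* Stub definitions so the file compiles on its own; the counter only
   makes the calls observable and plays no role in the analysis. */
static int sink_calls;

void sink(int v) { (void)v; sink_calls++; }
void sanitize(int v) { (void)v; }

void test_branch(int x, int cond) {
  if (cond) {
    sanitize(x);
    /* Expected: no alert. The call to sanitize dominates this use. */
    sink(x);
  }
  /* Expected: alert. The call sits on only one branch, so it does not
     dominate this use: the path with cond == 0 skips it. */
  sink(x);
}

void test_reassign(int x, int y) {
  sanitize(x);
  x = y;
  /* Expected: alert (from y). The assignment starts a new SSA definition
     of x, so this use is not a use of the definition that was passed to
     sanitize, even though the call dominates it. */
  sink(x);
}
```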