github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License
7.61k stars 1.52k forks source link

Using CodeQL for Backward Slicing #17264

Closed KylerKatz closed 2 months ago

KylerKatz commented 2 months ago

Hello, I am trying to see if using CodeQL for backward slicing is feasible.

Currently, I am just trying a simple test like so

import java.util.logging.Logger;

public class test {
    private static final Logger logger = Logger.getLogger(test.class.getName());

    public static void main(String[] args) {
        String a = "password";
        String b = a;
        String c = b;
        String d = c;

        logger.info(d);
    }
}

With this query

/**
 * @name Backward slicing
 * @description identifies the backward slice of a sink node
 * @kind path-problem
 * @problem.severity warning
 * @id java/backward-slice
 */

import java
private import semmle.code.java.dataflow.ExternalFlow
import semmle.code.java.dataflow.TaintTracking
import semmle.code.java.dataflow.FlowSources
import CommonSinks.CommonSinks
import SensitiveInfo.SensitiveInfo

module Flow = TaintTracking::Global<SensitiveInfoInErrorMsgConfig>;

import Flow::PathGraph

/** A configuration for tracking sensitive information flow into error messages. */
module SensitiveInfoInErrorMsgConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    exists(Variable v |
      (
        v.getName() = "a" or
        v.getName() = "b" or
        v.getName() = "c" or
        v.getName() = "d"
      ) and
      source.asExpr() = v.getAnAccess()
    )
  }

  predicate isSink(DataFlow::Node sink) {
    exists(Variable v |
      (
        v.getName() = "a" or
        v.getName() = "b" or
        v.getName() = "c" or
        v.getName() = "d"
      ) and
      sink.asExpr() = v.getAnAccess()
    )
  }

}

from Flow::PathNode source, Flow::PathNode sink
where
  Flow::flowPath(source, sink) and
  // Exclude cases where the source and sink are the same node
  source.getNode().toString() != sink.getNode().toString()
select sink.getNode(), source, sink,
  "Dataflow from `" + source.getNode().toString() + "` to `" + sink.getNode().toString()

I would expect to end up with results like

a -> b -> c -> d a -> b -> c a -> b

b -> c - > d b -> c

c -> d

However, I am getting some compressed results

image

For example, I want, a ->b -> c -> d for d's full backward slice. However, I get these two instead a -> b -> d (Skipping c) a -> c -> d (Skipping b)

Is there a way to force it to get the full path? I believe it is part of an optimization step, however, I would like to always force it to get the full paths, especially in real scenarios where it can be many nodes deep.

Thank you for any help.

aschackmull commented 2 months ago

Is there a way to force it to get the full path? I believe it is part of an optimization step, however, I would like to always force it to get the full paths, especially in real scenarios where it can be many nodes deep.

Yes. If you look into the full configuration options (e.g. by jump-to-def on the ConfigSig in SensitiveInfoInErrorMsgConfig implements DataFlow::ConfigSig) then you'll see the following predicate:

    /**
     * Holds if `node` should never be skipped over in the `PathGraph` and in path
     * explanations.
     */
    default predicate neverSkip(Node node) {
      isAdditionalFlowStep(node, _) or isAdditionalFlowStep(_, node)
    }

So if you add

predicate neverSkip(Node node) { any() }

to your configuration then you'll force it to completely skip path compression.

KylerKatz commented 2 months ago

Thank you very much.