github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License
7.67k stars 1.54k forks source link

Taint Tracking to a LocalVariable #16438

Open mies47 opened 6 months ago

mies47 commented 6 months ago

Hello,

I'm trying my query on a simple code before moving to my main codebase. Basically, I would like to track all the local taints from all the function arguments to the LocalVariables in that function. This is the simplest version but eventually I want to find all such taints that are coming from an assignment in a loop.

Here's my simple C code:

#include <stdlib.h>
#include <stdio.h>

void call1(char* in) {
        char buff[10];
        char a;
        a = in[3];
        for (int i = 0; i < 10; i++) {
                buff[i] = in[i];
        }
        printf("%s\n", buff);
}

int main() {
        char *input = "Hello!!!!";
        call1(input);
        return 0;
}

Here's my simple CodeQL query:

from DataFlow::Node source, DataFlow::Node sink, LocalVariable lv, Function f
where
        f.getAParameter() = source.asParameter() and
        lv.getAnAccess() = sink.asExpr() and
        lv.getFunction() = f and
        TaintTracking::localTaint(source, sink)
select source, sink

I want to find the following taints: in -> buff, in -> a

For now the query returns nothing. But if I comment the TaintTracking::localTaint(source, sink) line it would return the following:

Result set: edges
| a | b |
+---+---+

Result set: nodes
| n | key | val |
+---+-----+-----+

Result set: subpaths
| arg | par | ret | out |
+-----+-----+-----+-----+

Result set: #select
| source | sink |
+--------+------+
| in     | a    |
| in     | i    |
| in     | i    |
| in     | buff |
| in     | i    |
| in     | i    |
| in     | buff |

I'm not really sure why this happens and if I should probably define an additional taint step. I'd appreciate any help.

MathiasVP commented 6 months ago

Hi @mies47

Thanks for your question!

There are a couple of problems with your query. I'll first list the issues and then suggest how you may rewrite your query to fix these.

  1. Your specify that source is the value of in. However, remember that in[3] really means *(in + 3). The fact that there is a dereference there means you need to specify that your source is whatever the parameter points to so that there's flow through the dereference.

  2. There's not really flow from the right-hand side of an assignment to the left-hand side since the left-hand side is an address (i.e., the address of a that's being written to), and the right-hand side is the value that ends up being stored in a. So sink.asExpr is probably the wrong predicate to use in this case.

Those are the two issues in your query. Now, let me propose how you might fix these issues:

  1. In order to specify that your source is what the parameter points to you can use source.asParameter(1) to specify that it's not in that's your source, but rather whatever in points to (i.e., *in, and *(in + 1), etc).

  2. In order to select the assignment to a as your sink you can use a somewhat esotetic predicate sink.asDefinition(). That predicate gives you the node that represents an assignment operation (or initializer of a declaration in case your code instead was char a = in[3]). In this case, this will give you back a AssignExpr, and you can ask for the left-hand side of this assignment to get a.

Finally, since you need to modify your sink to use sink.asDefinition you still need to account for the fact that you also want to reach buff in printf("%s\n", buff); as that's not a definition. So that needs to use sink.asIndirectExpr() to match the fact that you're selecting what the source points to.

Combining all of these points, your query becomes:

from DataFlow::Node source, DataFlow::Node sink, LocalVariable lv, Function f
where
  f.getAParameter() = source.asParameter(1) and
  lv.getAnAccess() = [sink.asDefinition().(AssignExpr).getLValue(), sink.asIndirectExpr()] and
  lv.getFunction() = f and
  TaintTracking::localTaint(source, sink)
select source, sink

I hope that helps! Let me know if you have any more questions :)

mies47 commented 6 months ago

@MathiasVP Thank you so much for your detailed explanation. This definitely works.

I was just wondering if there are any resources that explain these concepts in detail like you did :) I explored the guides and I mainly use the API reference that doesn't go into much detail.

I also used the following yesterday and added isAdditionalFlowStep which gave me buff:

module FuncParamToBuffTaint implements DataFlow::ConfigSig {
        predicate isSource(DataFlow::Node source) {
                exists(Function f, Type t |
                        f.getAParameter() = source.asParameter() and
                        source.getType() = t and
                        t instanceof PointerType
                )
        }

        predicate isSink(DataFlow::Node sink) {
                exists(ArrayExpr ae |
                        ae.getArrayBase() = sink.asExpr()
                ) or
                exists(PointerArithmeticOperation pao |
                        pao.getLeftOperand() = sink.asExpr()
                )
        }

        predicate isAdditionalFlowStep(DataFlow::Node pred, DataFlow::Node succ) {
                exists(LoadInstruction l |
                        l.getSourceAddressOperand() = pred.asOperand()
                )
        }
}

module FuncParamToBuffTaintFlow = TaintTracking::Global<FuncParamToBuffTaint>;

from DataFlow::Node source, DataFlow::Node sink
where
        source.getFunction() = sink.getFunction() and
        FuncParamToBuffTaintFlow::flow(source, sink)

select source, sink, source.getFunction()

I am trying to take source and sink as pointers. Of course this doesn't count a as sink which is fine since I'm more interested in buff. Would this work the same way as your version?