github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License
7.69k stars 1.54k forks source link

UseAfterFree.ql miss case 00 #13896

Closed 18Fl closed 1 year ago

18Fl commented 1 year ago

Hey, When I try to learn codeql dataflow analysis from UseAfterFree.ql, I found it miss handle some case like I mentioned in github slack. I just paste it at here:

Hey, anyone.

I have some question in FlowAfterFree.ql(The file used in UseAfterFree.ql) has code snippet like this one:

        strictlyDominates(b1, i1, b2, i2)
        or
        strictlyPostDominates(b2, i2, b1, i1)

Assume we have this code:

    free(o);    //  [+] b1
    if(a == 11){
        o->x;   //  [+] b2
    }

We know If we want to hit b2, we must hit b1 before, we said b1 dominate b2.

Again. If we have this code:

    if(a == 11){
        free(o);    //  [+] b1
    }
    o->x;   //  [+] b2

There are 2 expression at here:

  1. If our data flow goes from the end of the function to the beginning. We must hit b2 before hit b1.
  2. If we hit b1, we must will hit b2 later.

We said b1 post-dominate b2.

The code snippet which I previous mentioned before use these two function to ensure we found A result must be a |Use After Free | bug. note there is |must|, not |may|.

    if(a == 11){
        free(o);
    }

    if(a == condition){    //  [+] @a
        o->x;
    }

at |@a|, we can't ensure we will definitely hit o->x, assume condition is |a == 12|, we could trigger the |UseAfterFree bug|.

The code previous work well. If I have the demo code:

//  [+] Just use this to ensure some uneasy understand question
#include <iostream>
#include <cstdlib>

class BadBox{
public:
    BadBox(int x, int y){
        x_ = x;
        y_ = y;
    }

    int get_x(){
        return x_;
    }
private:
    int x_;
    int y_;
};

int trigger_use_after_free(){
    size_t size = sizeof(BadBox);
    BadBox * fake_box = (BadBox *)malloc(size);
    free(fake_box);
    fake_box->get_x();  //  [+] obviously , It won't break the |dataflow analysis|, But how codeql to think the |side-effect|
    return 12;
}

int main(){
    int x = trigger_use_after_free();   //  [+] I could ensure this will trigger UAF BUG.
    return 0;
}

Unfortunately, It can't handle this code:

//  [+] Just use this to ensure some uneasy understand question
#include <iostream>
#include <cstdlib>

class BadBox{
public:
    BadBox(int x, int y){
        x_ = x;
        y_ = y;
    }

    int get_x(){
        return x_;
    }
private:
    int x_;
    int y_;
};

int wrapper_free(BadBox * freed_box){
    free(freed_box);
    return 0x41;    //  [+] make sure the flow been update
}

int trigger_use_after_free(){
    size_t size = sizeof(BadBox);
    BadBox * fake_box = (BadBox *)malloc(size);
    wrapper_free(fake_box); //  [+] sad dominate...
    fake_box->get_x();  //  [+] obviously , It won't break the |dataflow analysis|, But how codeql to think the |side-effect|
    return 12;
}

int main(){
    int x = trigger_use_after_free();   //  [+] I could ensure this will trigger UAF BUG.
    return 0;
}

Here is the problem:

int wrapper_free(BadBox * freed_box){
    free(freed_box);    //  [+] b1, i1
    return 0x41;    //  [+] make sure the flow been update
}

int trigger_use_after_free(){
    size_t size = sizeof(BadBox);   //  [+] b2, i2
    BadBox * fake_box = (BadBox *)malloc(size);
    wrapper_free(fake_box); //  [+] could u ensure it been dominated?
    fake_box->get_x();  //  [+] obviously , It won't break the |dataflow analysis|, But how codeql to think the |side-effect|
    return 12;
}

Obviously, |trigger_use_after_free| only has one block |b2|, and |wrapper_free| has only one block |b1|.

The relationship is |b1 of b2| or |b1 in the middle of b2|. Obviously, we can't say b1 dominate b2 or b2 post-dominate b1.

So this means we can't handle this case. But I believe this code is so classical so we will handle this.

Another way, if I recall correctly, we could say one instruction dominate another instruction?

That't means , If we have a API could help us identify one instruction dominate another instruction, we could solve this problem.

Does codeql offer a API to help us identify one instruction dominate another instruction?, THX.

18Fl commented 1 year ago

This answer is by @MathiasVP , he give me a full answer at github slack, So I just paste it at here, and I will close the issue.

@MathiasVP says:

Your analysis is completely correct. In order to keep the false-positive rate on the query low we’ve added the post-domination/domination condition. However, since post-domination and domination is based on intra-procedural control-flow analysis it’s tricky to make this condition work interprocedurally. One thing we did is to identify functions that “always dereferences a specific parameter when called”. This means that we identify things like: void f(char p) { char c = p; } void test(char p) { free(p); f(p); } even though, in the language of intraprocedural control-flow the call to free doesn’t dominate the dereference. However, the call that always dereferences p is dominates by the call to free. What we don’t do, however (and what I want to extend the query to handle) is to determine functions that behave as free and include those as sources in the query so that the domiation/post-domination works interprocedurally when the call to free is deep within a call tree. This is exactly the case you have in your final example: Since we follow flow from the call to free (and not from wrapper_free) there’s no domination/post-domination condition satisfied. However, if we identified that wrapper_free always calls free, we could use that to establish a domination/post-domination condition. Another way, if I recall correctly, we could say one instruction dominate another instruction? That’t means , If we have a API could help us identify one instruction dominate another instruction, we could solve this problem. Does codeql offer a API to help us identify one instruction dominate another instruction?, THX. Indeed, we have that. And that’s exactly what we use in the query. Given an instruction instr you can ask instr.getBlock().dominates(block) or init.getBlock().postDominates(block) to check if instr’s basic block (post)dominates another block block. However, as I said above, this is an intraprocedural analysis that needs to be made interprocedural for your examples to work.

18Fl commented 1 year ago

And Thanks for @MathiasVP awesome work!