github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License

strange behavior when I use asParameter. #13921

Closed 18Fl closed 1 year ago

18Fl commented 1 year ago

Hey, again...

While trying to improve UseAfterFree.ql, I ran into some strange behavior that I can't explain with my current knowledge.

    void function(const BadBox & badbox_ref, BadBox * badbox_ptr){
        badbox_ptr->x;  //  [+] @a : isUse0 will find this
        badbox_ref.x;  //  [+] @b : isUse0 will find this
    }

The original |UseAfterFree.ql| looks for flow from a parameter, but it only deals with pointers, not references.

The code for pointer is simple:

    private predicate flowsFromParam(DataFlow::Node n) {
        flowsToUse(n) and //  [+] @a
        (
            n.asParameter().getUnspecifiedType() instanceof PointerType //  [+] @b
            [...]
        )
    }

To explain it simply:

  1. @a ensures the node eventually flows to |isUse0(badbox_ptr->x)|
  2. @b ensures the node is a parameter of pointer type.

So this finds |const BadBox * badbox_ptr|. I have confirmed that it works.

I also want UseAfterFree.ql to handle the reference case, but when I try, it no longer works:

    flowsToUse(n) and 
    n.asParameter().getUnspecifiedType() instanceof ReferenceType //  [+] @b

This returns no results.

But if I just use this code:

flowsToUse(n)
select n, "balabala"

then the |select| results do include the |const BadBox & badbox_ref| case.

Again, if I use this query:

//  [+] @a
flowsToUse(n)
select n.asExpr(), "balabala"

//  [+] @b
flowsToUse(n)
select n.asParameter(), "balabala"

Neither @a nor @b works: they don't catch the |const BadBox & badbox_ref| case. So maybe I should use another function, like n.asSomeFunction().get....?

However, if I use this code:

n.asParameter().getUnspecifiedType() instanceof ReferenceType  //  [+] Just use this, don't use `|flowsToUse|`
select n, n.asParameter().getUnspecifiedType().toString()

then it does catch the case.

So what is happening here, and how can I solve it? Thanks!

MathiasVP commented 1 year ago

Hi @18Fl,

I think what you're saying is that the cpp/use-after-free query isn't properly handling interprocedural uses that go into a function which takes an argument by reference. Is that correct?

If so, that is a correct observation. As I've mentioned in other issues you've created, the query is currently tuned mostly to avoid flagging too many results. So the query doesn't handle interprocedural uses that go through a reference (or any other level of indirection, for that matter).

It would help if you could provide a complete C/C++ program that demonstrates the alert you're looking to find, but let's just craft one to explain the current query logic. Consider this example:

void free(void*);

void use(int*& p) {
  int x = *p;
}

void test(int* p) {
  free(p);
  use(p);
}

(notice the reference parameter on use.) In order to find this use-after-free, we need to deduce that use always dereferences p twice. That is, once to go from int*& to int*, and one more time to go from int* to int (which is the dangerous dereference).
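
To make the two dereferences concrete, here is a minimal C++ sketch. It only illustrates the language semantics, not the query's logic, and the function bodies and the `aliasesCallerVariable` helper are hypothetical additions for illustration. It assumes `p` points at a live object when `use` is called.

```cpp
#include <cassert>

// Mirrors `use` from the example above: the parameter is a reference to a
// pointer (int*&). The return value is an assumption added so the read is
// observable.
int use(int*& p) {
  assert(p != nullptr);  // precondition for this sketch only
  // Step 1 (implicit): going from int*& to int* reads the caller's pointer
  // variable through the reference.
  int* loaded = p;
  // Step 2 (explicit): going from int* to int reads the pointed-to object.
  // This is the read that would be dangerous after free(p).
  return *loaded;
}

// Hypothetical helper: shows that the reference parameter is an alias for
// the caller's pointer variable rather than a copy of it.
bool aliasesCallerVariable(int*& p, int** callerVariable) {
  return &p == callerVariable;
}
```

Calling `use(ptr)` on a pointer to a live `int` returns that `int`, and `aliasesCallerVariable(ptr, &ptr)` holds, which is why the reference adds one level of indirection on top of the pointer.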

The current query logic doesn't deal with this. You've found one place where this restriction is present (i.e., the flowsFromParam predicate declares parameters of pointer types to be the only interesting ones). Another place is in the IsUse::isUse predicate where we explicitly connect an argument with the corresponding parameter (without dealing with indirections).

Does that help answer your questions?

18Fl commented 1 year ago

Hey, sorry that I didn't explain this clearly.

Here is my demo:

void UseCalculateBoxXWrapper00(const BadBox & old_box){
    UseCalculateBoxXWrapper01(old_box);
}

void UseCalculateBoxXWrapper01(const BadBox & old_box){
    UseCalculateBoxX(old_box.x_ == 0x41 ? "x_ is 0x41\n" : "x_ is not 0x41" ); // [+] @a
}

My demo uses a reference, not a pointer. I would like to make the query work for references too, not just pointers. This is my own requirement.

So if I have a function like this:

void TriggerUAF(const BadBox & old_box){
    FunctionWillFreeOldBox(old_box);   //  [+] @a: assume it will eventually free old_box
    UseCalculateBoxXWrapper00(old_box);  // [+] @b: UAF happened
}

Now, if I make CodeQL aware that @b is a sink and that it post-dominates |@a|, I can define this as a UAF bug. This is inspired by what you mentioned on GitHub and Slack:

However, if we identified that wrapper_free always calls free, we could use that to establish a domination/post-domination condition.

This code is a classic UAF bug, and as C++ programmers we use references more often than pointers.
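
For reference, here is a self-contained sketch of the pattern as one translation unit. The function names come from the snippets in this thread, but every body is an assumption: `BadBox` is reduced to one field, `UseCalculateBoxX` just records its argument in a hypothetical `lastMessage` global, and `FunctionWillFreeOldBox` assumes the box was allocated with `new`. `TriggerUAF` is intentionally buggy and is only there for a query to analyze; it should not be called.

```cpp
#include <cassert>
#include <string>

struct BadBox {
  int x_;
};

// Hypothetical stand-in: remembers the last message so the call is observable.
std::string lastMessage;

void UseCalculateBoxX(const char* message) {
  assert(message != nullptr);
  lastMessage = message;
}

void UseCalculateBoxXWrapper01(const BadBox& old_box) {
  // Reads old_box.x_ through the reference: the use that should become a sink.
  UseCalculateBoxX(old_box.x_ == 0x41 ? "x_ is 0x41\n" : "x_ is not 0x41");
}

void UseCalculateBoxXWrapper00(const BadBox& old_box) {
  UseCalculateBoxXWrapper01(old_box);
}

// Hypothetical body: assumes the box was allocated with `new`.
void FunctionWillFreeOldBox(const BadBox& old_box) { delete &old_box; }

// Intentionally buggy; do not call. Shown only as an analysis target.
void TriggerUAF(const BadBox& old_box) {
  FunctionWillFreeOldBox(old_box);     // @a: frees the object behind the reference
  UseCalculateBoxXWrapper00(old_box);  // @b: use after free, always reached after @a
}
```

The wrappers are safe to call on a live object; only the ordering inside `TriggerUAF` is the bug, since the call at @b is reached on every path after @a.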

So the most important problem is how to identify the |sink|. Reusing the UseAfterFree.ql code seems like a good starting point:

    //  [+] understand this is cool
    private predicate flowsFromParam(DataFlow::Node n) {
        flowsToUse(n) and //  [+] n will flow to isUse0
        (
            n.asParameter().getUnspecifiedType() instanceof PointerType //  [+] n is a parameter, and it is a pointer
            or n.asParameter().getUnspecifiedType() instanceof ReferenceType //  [+] I should have a type named reference
            or exists(DataFlow::Node prev |   
                flowsFromParam(prev)
                and DataFlow::localFlowStep(prev, n)
            )
        )
    }

This code is inspired by your commit. I expected it to make |asParameter| work for both cases: pointer parameters and reference parameters.

So I just added this line, hoping it would work:

 or n.asParameter().getUnspecifiedType() instanceof ReferenceType 

The rest of flowsFromParam is the same as in UseAfterFree.ql.

I hoped it would work, but it didn't: the predicate flowsFromParam doesn't match (const BadBox & old_box) as a parameter, so I can't identify old_box in UseCalculateBoxXWrapper00(old_box) as a sink node.

I have at least two questions:

  1. Why doesn't it work? A pointer parameter and a reference parameter look the same. Why does |n.asParameter().getUnspecifiedType() instanceof PointerType| work, but |or n.asParameter().getUnspecifiedType() instanceof ReferenceType| does not?
  2. How can I solve this and make it work? Which API should I use?

Thanks!

18Fl commented 1 year ago

I know we don't need to handle every case, but passing an object by reference instead of by pointer is so common that I don't want to miss it. It may generate some false positives, but I think I can filter those out by reviewing the code.

18Fl commented 1 year ago

This makes it work, even though I still don't know why:

    predicate flowsFromParam(DataFlow::Node n) {
        flowsToUse(n) and //  [+] n will flow to isUse0
        (
            (
                n.asParameter().getUnspecifiedType() instanceof PointerType //  [+] n is a parameter, and it is a pointer
                or n.asInstruction().(InitializeParameterInstruction).getParameter().getUnspecifiedType() instanceof ReferenceType
            )
            or exists(DataFlow::Node prev |   
                flowsFromParam(prev)
                and DataFlow::localFlowStep(prev, n)
            )
        )
    }

18Fl commented 1 year ago

And here is my full code, which supports references, not just pointers:

//  [+] we use this to ensure parameter case
module ParameterSinks{
    import semmle.code.cpp.ir.ValueNumbering

    //  [+] we have dataflow from n2->n1->succ, and succ isUse0 case
    private predicate flowsToUse(DataFlow::Node n) {
        isUse0(n, _)
        or
        exists(DataFlow::Node succ |
            flowsToUse(succ) and
            DataFlow::localFlowStep(n, succ)
        )
    }

    //  [+] understand this is cool
    predicate flowsFromParam(DataFlow::Node n) {
        flowsToUse(n) and //  [+] n will flow to isUse0
        (
            (
                n.asParameter().getUnspecifiedType() instanceof PointerType //  [+] n is a parameter, and it is a pointer
                or n.asInstruction().(InitializeParameterInstruction).getParameter().getUnspecifiedType() instanceof ReferenceType
            )
            or exists(DataFlow::Node prev |   
                flowsFromParam(prev)
                and DataFlow::localFlowStep(prev, n)
            )
        )
    }

    //  [+] asParam->... isUse0 : we could get anyone
    private predicate step(DataFlow::Node n1, DataFlow::Node n2) {
        flowsFromParam(n1) and
        flowsFromParam(n2) and
        DataFlow::localFlowStep(n1, n2)
    }

    //  [+] give it an alias; fastTC(step) means step+
    private predicate paramToUse(DataFlow::Node n1, DataFlow::Node n2) = fastTC(step/2)(n1, n2)

    //  [+] ensure source is a parameter, and sink is isUse0
    private predicate hasFlow(
        DataFlow::Node source,
        DataFlow::Node sink,
        InitializeParameterInstruction init
    ) {
        paramToUse(source, sink)    //  [+] source could flow to sink
        and isUse0(sink, _)         //  [+] sink as a target
        and 
        (
            pragma[only_bind_out](source.asParameter()) = pragma[only_bind_out](init.getParameter())    //  [+] Uh... maybe this make things be unhappy
            or pragma[only_bind_out](source.asInstruction()).(InitializeParameterInstruction).getParameter() = pragma[only_bind_out](init.getParameter())    //  [+] Uh... maybe this make things be unhappy
        )
    }

    //  [+] at here, it still use one function
    //  [+] If we could hit the sink, we must hit source before.
    //  [+] Does void func will break the flow path or not?
    private InitializeParameterInstruction getAnAlwaysDereferencedParameter0(){
        exists( DataFlow::Node source, DataFlow::Node sink, IRBlock b1, IRBlock b2, int i1, int i2|   
            hasFlow(pragma[only_bind_into](source), pragma[only_bind_into](sink), result) and
            source.hasIndexInBlock(b1, pragma[only_bind_into](i1)) and
            sink.hasIndexInBlock(b2, pragma[only_bind_into](i2)) and
            strictlyPostDominates(b2, i2, b1, i1)
        )
    }

    private CallInstruction getAnAlwaysReachedCallInstruction(IRFunction f){
        result.getBlock().postDominates(f.getEntryBlock())
    }

    //  [+] relate a call instruction to its static target f and its i-th argument
    pragma[nomagic]
    predicate callHasTargetAndArgument(Function f, int i , CallInstruction call, Instruction argument){
        call.getStaticCallTarget() = f
        and call.getArgument(i) = argument
    }

    pragma[nomagic]
    predicate initializeParameterInFunction(Function f, int i, InitializeParameterInstruction init) {
        pragma[only_bind_out](init.getEnclosingFunction()) = f and
        init.hasIndex(i)
    }

    InitializeParameterInstruction getAnAlwaysDereferencedParameter(){
        result = getAnAlwaysDereferencedParameter0()
        //  [+] obviously, it should be recursive
        or exists(CallInstruction call, InitializeParameterInstruction p, Function f, Instruction argument, int i |   
            callHasTargetAndArgument(f, i, call, argument)
            and initializeParameterInFunction(f, i, p)
            //  [+] get the result
            and result = pragma[only_bind_out](pragma[only_bind_into](valueNumber(argument)).getAnInstruction())    
            and p = getAnAlwaysDereferencedParameter()
            //  [+] the call must always be reached! This will eliminate too many interesting cases...
            //  [+] I don't agree with this, but it's OK, I can easily change it.
            and call = getAnAlwaysReachedCallInstruction(_) 

        )
    }

}

There are some silly comments left in the code that I didn't remove.