SVF-tools / SVF

Static Value-Flow Analysis Framework for Source Code
http://svf-tools.github.io/SVF/
Other
1.35k stars 432 forks source link

Taint analysis and aliases #752

Open tregua87 opened 2 years ago

tregua87 commented 2 years ago

I am trying to use SVF to build analysis, but I am not sure what SVF's component better fits my needs.

First, I describe my problem along with a couple of examples. Then, I discuss the approaches I considered.

I would understand if my inutition is correct. Specifically, what SVF components I should use (and how).

TLDR; I don't know if I need a taint analysis, a source-sink, or just an SVF graph.

Objective: I am trying to come up with an analysis that tells me what fields of a structure are accessed and how (i.e., read or written).

Taking Example 1 (below), I expect an output like this:

I believed I can achieve this by traversing the SVFG graph. However, I am afraid this is not enough.

See Example 2, the pointer s->ss is an alias of as. Following the SVGF graph, I understand that:

However, I would understand that as->field_a and as->field_b are actually accessed, and not only the s ones.

Following the SVFG, I can realize s->ss is an alias of as, but I am afraid this is not the correct way.

My plan: I guess I should come up with either a taint analysis or a source-sink analysis.
Ideally, I would traverse the execution and infer:

  1. s is used in a GEP instruction and infer the usage of a field X.
  2. X ends up in a STORE as dst (for inferring write access) / X ends up in a LOAD (for inferring read access).

Question1: thus my plan make sense?

Question2: how do implement this in SVF? In my mind, I have three approaches, but I am not sure what to follow.

1) Implementation with taint analysis: How do I use SVF for this thing? I did not find any example in the repo (or if I miss it, I kindly ask for clarification).

2) Implementation with source-sink analysis: I found the LeakaCheker, but this only checks if some variable is passing to a function argument, while I need something more "fine-grained", e.g if a pointer is used as src in a STORE instruction. Is source-sink what I am looking for?

3) Only SVFG: I might use the graph to infer which fields are read/written, and then find the aliases. Finally, propagate the read/written to all the aliases. I understood how to make this approach, but I am afraid it will lead to too many false positives.


Example 1:

typedef struct my_struct {
    int field_a;
    int field_b;
    int field_c;
} my_struct;

void fun(my_struct* s) {
    int t;
    t = s->field_a;
    s->field_a = s->field_b;
    s->field_b = t;
}

Example 2:

typedef struct my_substruct {
    int field_a;
    int field_b;
} my_substruct;

typedef struct my_struct {
    my_substruct* ss;
} my_struct;

void fun(my_struct* s, my_substruct* as) {
    s->ss = as;
    s->ss->field_a = s->ss->field_b + 10;
}
yuleisui commented 2 years ago

Hello, please see my inline replies and how they are helpful

I am trying to use SVF to build analysis, but I am not sure what SVF's component better fits my needs.

First, I describe my problem along with a couple of examples. Then, I discuss the approaches I considered.

I would understand if my inutition is correct. Specifically, what SVF components I should use (and how).

TLDR; I don't know if I need a taint analysis, a source-sink, or just an SVF graph.

Objective: I am trying to come up with an analysis that tells me what fields of a structure are accessed and how (i.e., read or written).

Taking Example 1 (below), I expect an output like this:

  • s->field_a: is read and written
  • s->field_b: is read and written
  • s->field_c: is untouched

I believed I can achieve this by traversing the SVFG graph. However, I am afraid this is not enough.

See Example 2, the pointer s->ss is an alias of as. Following the SVGF graph, I understand that:

  • s->ss->field_a: is written
  • s->ss->field_b: is read

I think you only need the alias results from points-to analysis for your purpose. The points-to targets are either base objects or fields. You can retrieve their values to understand where they are allocated.

The next step is to iterate the ICFG or SVFG whichever you prefer to see whether a pointer operand at StoreStmt or LoadStmt points to the object of your interest.

However, I would understand that as->field_a and as->field_b are actually accessed, and not only the s ones.

Following the SVFG, I can realize s->ss is an alias of as, but I am afraid this is not the correct way.

My plan: I guess I should come up with either a taint analysis or a source-sink analysis. Ideally, I would traverse the execution and infer:

  1. s is used in a GEP instruction and infer the usage of a field X.
  2. X ends up in a STORE as dst (for inferring write access) / X ends up in a LOAD (for inferring read access).

Question1: thus my plan make sense?

Question2: how do implement this in SVF? In my mind, I have three approaches, but I am not sure what to follow.

  1. Implementation with taint analysis: How do I use SVF for this thing? I did not find any example in the repo (or if I miss it, I kindly ask for clarification).
  2. Implementation with source-sink analysis: I found the LeakaCheker, but this only checks if some variable is passing to a function argument, while I need something more "fine-grained", e.g if a pointer is used as src in a STORE instruction. Is source-sink what I am looking for?
  3. Only SVFG: I might use the graph to infer which fields are read/written, and then find the aliases. Finally, propagate the read/written to all the aliases. I understood how to make this approach, but I am afraid it will lead to too many false positives.

I don't think taint analysis is the one you needed based on what you described since no source or sink is required.

Example 1:

typedef struct my_struct {
    int field_a;
    int field_b;
    int field_c;
} my_struct;

void fun(my_struct* s) {
    int t;
    t = s->field_a;
    s->field_a = s->field_b;
    s->field_b = t;
}

Example 2:

typedef struct my_substruct {
    int field_a;
    int field_b;
} my_substruct;

typedef struct my_struct {
    my_substruct* ss;
} my_struct;

void fun(my_struct* s, my_substruct* as) {
    s->ss = as;
    s->ss->field_a = s->ss->field_b + 10;
}
tregua87 commented 2 years ago

Thanks for your reply!

I believed SVFG already resolved the alias of s->ss, this seems not the case. From Example 2, I run the following command: wpa -ander -svfg --dump-vfg library.o.bc and I get this SVFG:

outfile

I don't see an alias for s->ss, while in other cases (with function pointers), the SVFG emits aliases. I am a bit confused.

For performing an alias analysis, I think I should run this here:

std::string printPts(PointerAnalysis* pta, Value* val)
{

    std::string str;
    raw_string_ostream rawstr(str);

    NodeID pNodeId = pta->getPAG()->getValueNode(val);
    const PointsTo& pts = pta->getPts(pNodeId);
    for (PointsTo::iterator ii = pts.begin(), ie = pts.end();
            ii != ie; ii++)
    {
        rawstr << " " << *ii << " ";
        PAGNode* targetObj = pta->getPAG()->getGNode(*ii);
        if(targetObj->hasValue())
        {
            rawstr << "(" <<*targetObj->getValue() << ")\t ";
        }
    }

    return rawstr.str();

}

with pta = AndersenWaveDiff::createAndersenWaveDiff(pag).

If yes, I expected SVFG already contains this info. From you message, I guess I have to run it manually.


Example 2:

typedef struct my_substruct {
    int field_a;
    int field_b;
} my_substruct;

typedef struct my_struct {
    my_substruct* ss;
} my_struct;

void fun(my_struct* s, my_substruct* as) {
    s->ss = as;
    s->ss->field_a = s->ss->field_b + 10;
}
yuleisui commented 2 years ago

Thanks for your reply!

I believe SVFG already resolved the alias of s->ss. From Example 2, I run the following command: wpa -ander -svfg --dump-vfg library.o.bc and I get this SVFG:

outfile

I don't see an alias for s->ss, while in other cases (with function pointers), the SVFG emits aliases. I am a bit confused.

For performing an alias analysis, I think I should run this here:

std::string printPts(PointerAnalysis* pta, Value* val)
{

    std::string str;
    raw_string_ostream rawstr(str);

    NodeID pNodeId = pta->getPAG()->getValueNode(val);
    const PointsTo& pts = pta->getPts(pNodeId);
    for (PointsTo::iterator ii = pts.begin(), ie = pts.end();
            ii != ie; ii++)
    {
        rawstr << " " << *ii << " ";
        PAGNode* targetObj = pta->getPAG()->getGNode(*ii);
        if(targetObj->hasValue())
        {
            rawstr << "(" <<*targetObj->getValue() << ")\t ";
        }
    }

    return rawstr.str();

}

with pta = AndersenWaveDiff::createAndersenWaveDiff(pag).

If yes, I expected SVFG already contains this info. From you message, I guess I have to run it manually.

Yes, you are on the right track!

Example 2:

typedef struct my_substruct {
    int field_a;
    int field_b;
} my_substruct;

typedef struct my_struct {
    my_substruct* ss;
} my_struct;

void fun(my_struct* s, my_substruct* as) {
    s->ss = as;
    s->ss->field_a = s->ss->field_b + 10;
}
tregua87 commented 2 years ago

Thanks again, I am getting toward my goal.

I have a slightly off-topic question. I partially resolved it with aliasing. However, in my scenario the order of the operations matters. For instance, Examples A and B (below), would have a similar outcome if just looking at alias analysis and SVFG:

However, I also need to understand which assignment happens before (with If-conditions the things get complicated). To understand the order, I was thinking to implement a post-dominator tree. Is this somehow implemented in SVF?


Example A

void fun(my_struct* s, my_substruct* as) {
    s->ss = as;
    s->ss->field_a = 10;
}

Example B

void fun(my_struct* s, my_substruct* as) {
    s->ss->field_a = 10;
    s->ss = as;
}
yuleisui commented 2 years ago

Thanks again, I am getting toward my goal.

I have a slightly off-topic question. I partially resolved it with aliasing. However, in my scenario the order of the operations matters. For instance, Examples A and B (below), would have a similar outcome if just looking at alias analysis and SVFG:

  • s->ss->field_a written
  • s->ss alias of as

However, I also need to understand which assignment happens before (with If-conditions the things get complicated). To understand the order, I was thinking to implement a post-dominator tree. Is this somehow implemented in SVF?

Not yet. It would be great if you can contribute one for dominator and post-dominance on top of SVF's ICFG.

Example A

void fun(my_struct* s, my_substruct* as) {
    s->ss = as;
    s->ss->field_a = 10;
}

Example B

void fun(my_struct* s, my_substruct* as) {
    s->ss->field_a = 10;
    s->ss = as;
}
tregua87 commented 2 years ago

Thanks for your reply. I guess I will develop some analysis. At this point, we could close this issue and eventually start another one for discussing dominators/post-dominators?

yuleisui commented 2 years ago

That’s fine. Go ahead.