SVFG: Calling a function twice with global pointers as a parameter

Symen commented 7 years ago

Hi, I'm currently working with your awesome SVF analysis. During some tests with the SVFG, I found a simple test-case that doesn't make sense to me.

Why does the SVFG in Test1 has a path from "local_x" into the second "init" call?

int *global_p1; // NodeID= 1
int *global_p2; // NodeID= 2

void init(int **pp, int *x) {
    *pp = x;
}

void delete(int **pp) {
    *pp = NULL;
}

void test1() {
    int local_x = 1; // NodeID= 10 
    int local_y = 2; // NodeID= 11

    init(&global_p1, &local_x);
    init(&global_p2, &local_y);
    delete(&global_p2);
}

void test2() {
    int local_x = 1;// NodeID= 10
    int local_y = 2;// NodeID= 11
    int *local_p1;  // NodeID= 12
    int *local_p2;  // NodeID= 13

    init(&local_p1, &local_x);
    init(&local_p2, &local_y);
    delete(&local_p2);
}

int main() {

    /*** Test 1:
     * There will be a path from the "local_x" into the second call of "init",
        * (10 -> 60 -> 20 -> 16 -> 21 -> 53 -> 21 -> ...)
     * but "local_x" was only written into "global_p1" and not into "global_p2".
     */ 
    //test1();

    /*** Test 2:
     * Writing "local_x" into "local_p1" creates the SVFG, which I would
     * have expected also in Test1.
     */ 
    //test2();
    return 0;
}

These are the result I get with: saber -leak -dump-svfg main.bc SVFG of Test1: saber_svfg_test1

SVFG of Test2: saber_svfg_test2

yuleisui commented 7 years ago

Hi Symen,

Good question!

The differences between your two tests are due to our inter-procedural Mod-Ref analysis. Please refer to line 346-369 in MemRegion.cpp (https://github.com/unsw-corg/SVF/blob/master/lib/MSSA/MemRegion.cpp).

Any global objects and the objects they point to inside a callee function, say F, will be promoted as a side-effect to the callsites that call F (transitively along the call chain).

We treat global objects (e.g., "global_p1" ) and objects (e.g., "local_x") on its Andersen's points-to graph conservatively in a context-insensitive manner, so in your case, "local_x" will be put as a side-effect before the two "init" callsites.

You may wish to take a look at the "SVFG_before_opt" file to see those side-effect objects inside the yellow rectangle boxes.

Thanks

Symen commented 7 years ago

Thanks for the explanation!

Would a context-sensitive SVFG be possible or is there another data-structure in order to check, if such a path (like in the example above) is possible?

yuleisui commented 7 years ago

The current Mod-Ref analysis supports partial context-sensitivity. Full context-sensitive Mod-Ref analysis, including handling globals, requires a context-sensitive pointer analysis. SVF does not support it yet. You may wish to write one by leveraging the existing code in MemRegion.cpp.

Another way is to write a pass to refine the SVFG e.g., implementing an optimization pass like SVFGOPT.

We have a recent paper describing how to refine the spurious value-flows. Please refer to http://www.cse.unsw.edu.au/~corg/supa/.

Symen commented 7 years ago

I am not very experienced with the implementation of static analyses, but I will give it a try. Which of the two approaches would you prefer? My guess would be that the leverage of the MemRegion.cpp might be a bit easier. Or am I wrong?

Great paper by the way! This should be helpful.

yuleisui commented 7 years ago

Either writing a pass before (MemRegion.cpp) or after (SVFGOPT.cpp) creating SVFG is doable.

If you have already taken a look at the Mod-Ref analysis, MemRegion.cpp is a good starting point.

If you still want to optimize the graph once it is built, such as removing edges and nodes, then SVFGOPT.cpp is also a good example to follow.

Good luck!

Symen commented 7 years ago

Unfortunately, I had no time for it the last weeks, but I tried to understand the code in detail. As you mentioned, the framework requires a context-sensitive pointer analysis. I found the CondPTAImpl class (PointerAnalysis.h) which seems to be an approach of a context-sensitive analysis. Wouldn't it be necessary to leverage the Andersen's pointer analysis as you described in this paper Making context-sensitive inclusion-based pointer analysis practical for compilers using parameterised summarisation?

I also was wondering whether SVF allows the implementation of sound analyses, or if there is some implementation detail, that causes analyses to be unsound in principle?

Thanks again

yuleisui commented 7 years ago

On context-sensitivity

Yes, CondPTAImpl is designed for context- or path-sensitive analysis. Every pointer or abstract object is qualified by a condition e.g., (call stack or bdd for path-sensivitiy). To understand more about how to use the data structures in CondPTAImpl, you may wish to refer to “DDA/ContextDDA.h/cpp” in SUPA’s implementation.

Our whole-program context-sensitive inclusion-based pointer analysis was originally implemented in Open64 compiler. We don’t have time to migrate it to SVF:) But the SPE paper gives all the algorithms and evaluation details of the analysis. Yes, it is definitely possible to implement one in SVF.

On soundness

Short answer: Yes, SVF allows sound analysis if every LLVM instruction (relevant to pointer analysis) is modeled in a conservative way.

Long story: soundness in static analysis generally means over-approximations of a program’s behavior. Many research papers claim their approaches are sound by modeling and analyzing the programs (e.g., in the form of inference rules) written in their self-defined tiny languages. However, under real-world setting, a tiny language may be only a subset of a modern intermediate representation, like LLVM IR.

To soundly model the program behavior, SVF need to handle as many instructions as possible (refer to https://github.com/unsw-corg/SVF/wiki/Technical-documentation#123-pagbuilder) to generate high-level representations (e.g., graphs). Another challenge for sound analysis is to model the side-effects of library calls (refer to https://github.com/unsw-corg/SVF/blob/master/lib/Util/ExtAPI.cpp). Of course, C programs which are not ANSI-compliant also pose challenge to soundness (e.g., pointer arithmetic underflow/overflow).

SVF-tools / SVF

SVFG: Calling a function twice with global pointers as a parameter #10