SVF-tools / SVF

Static Value-Flow Analysis Framework for Source Code
http://svf-tools.github.io/SVF/
Other
1.41k stars 437 forks source link

Problems of Handling Global SVFG Nodes #991

Open seviezhou opened 1 year ago

seviezhou commented 1 year ago

I think there are some imperfections in terms of how SVF handles global SVFG Nodes and I want to seek for some suggestions.

The usage of global nodes starts from the SVFG building process. During the SaberSVFGBuilder, two kinds of SVFG nodes are considered to be global nodes:

  1. The node is a StoreSVFGNode and its destination's points-to set contains a global objects.
  2. The node is a LoadSVFGNode and its source's points-to set contains a global objects.

During the source-sink analysis, a forward analysis is first performed for each slice. During the traversal, if the destination node of an edge is a global node then the slice is not analyzed anymore since SVF assumes that the client uses the memory until the program exits.

I observe that this filtering is important because it eliminates the analysis for a large number of slicing. For example, in OpenSSL, I found that over half of all the slicing is abandoned because of this global node checking. Disable this will significantly increase the analysis time.

However, I found the global node identification is too coarse-grained because of the imprecise pointer analysis, which marks too many incorrect global nodes and potentially cause missing bugs. I made a simple example below:

#include <stdlib.h>

struct data_st* global_st;

struct data_st 
{
    char *tmp_store;
};

void* malloc_wrapper(size_t len)
{
    return malloc(len);
}

void foo(int c)
{
    struct data_st *st = (struct data_st*)malloc_wrapper(sizeof(struct data_st));
    void* ptr = malloc_wrapper(16);

    st->tmp_store = ptr;    // delete this leak will be detected

    free(st);               // no leak
    if (c > 10) free(ptr);  // partial leak
}

int main(int argc,char** argv) 
{
    struct data_st *sptr = (struct data_st*)malloc_wrapper(sizeof(struct data_st));
    global_st = sptr;       // no leak, poison malloc_wrapper to be global
    foo(argc);
    return 0;
}

Compile it and check with saber shows not bugs:

clang -S -emit-llvm -g leak.c
saber -leak leak.ll
# no bugs

But the partial leak at line 23 is supposed to be reported. The reason for this result is:

  1. In main function, the returned pointer of malloc_wrapper (more specifically, the memory object of malloc) is stored to a global pointer global_st, which makes the points-to set of global_st contain malloc. This results in all stores or loads of returned pointer from malloc_wrapper to be marked as global nodes. Therefore, st->tmp_store = ptr; is marked as a store to global variabes.
  2. In foo function, the slicing regards to ptr contains a global StoreSVFGNode, and will be discarded from source-sink analysis.
  3. sptr does not leak since it is stored to a global variable.
  4. st deos not leak since all its paths reach a free function.

There are other things I want to mentioned. First, if I deleted st->tmp_store = ptr;, the partial leak will be detected, this is easy to understand. Second, if I use malloc instead of malloc_wrapper the result is also correct, this is because the return value of malloc is not marked as alias for all call sites. Are there any ways to refine the alias results to make detected wrapper to be non-alias. Third, the root cause of this problem comes from the imprecise pointer analysis. Its effect can be amplified in larger programs and results in hundreds of missing bugs.

I attached the source code and compile IR:

global.tar.gz

yuleisui commented 1 year ago

This is a good example. Any thoughts or solutions to improve the handling of globals?

seviezhou commented 1 year ago

I think there are several directions to be considered:

  1. Obviously, a more precise pointer analysis helps by using certain level of field-sensitivity, context-sensitivity, and flow-sensitivity.
  2. We need to better incorporate the pre-analysis steps and bug detection steps. For example, for the set of functions that are detected as wrappers for memory malloc, we should make them to be no-alias at each call site. But such incorporation is not straight-forward since the built SVFG needs to be refined later, which might invalid some results.
  3. In source-sink analysis, we need a more balanced solution other than completely discarding slicing with global node or simply analyzing all slicing. One thought is to look for possible sink point during traversal, if we found one, we can ignore the global node checking.
yuleisui commented 1 year ago

Any quick solution that could fix your context-sensitive case?

seviezhou commented 1 year ago

Let me have a try.

seviezhou commented 1 year ago

Can we add a heuristic to collectGlobals such that it does not process malloc site when collecting globals? Although it will lose some soundness, it can prevent nodes that point to malloc to be added to global set. In the above example, since global_st points to malloc, so all nodes (e.g., st->tmp_store) that point to malloc are added to global set.

I found it may not be soundly solved since the pointer analysis result is imprecise so we cannot easily discriminate different nodes that point to the same malloc node.

yuleisui commented 1 year ago

Why not modify accessGlobal? Does your unsoundness mean globs may miss some globals?

seviezhou commented 1 year ago

Why not modify accessGlobal?

In accessGlobal, many nodes related to malloc have already been added to globs, so it is hard to know whether a node comes from malloc or not I think.

Does your unsoundness mean globs may miss some globals?

Based on my understanding, nodes related to the call to malloc inside the main function should actually be globals?

I want to debug it a little bit more.

seviezhou commented 1 year ago

The collectGlobals only collects three kinds of nodes: global variable node, base object inside global variables' points to set, and all fields of the base object. If we want to modify accessGlobal, we might need to find all base object and its fields inside globs.

I made am example to show the unsoundness. I add a few more lines to he previous example, there is another pointer aptr that is alias to sptr, which should be a global node. The pointer analysis result says that aptr points to malloc. If we ignore malloc together with all its fields object nodes during CollectPtsChain, the following code will report one partial leak (line 26) and one never free (line 36). But the original code reports no leaks.

#include <stdlib.h>

struct data_st *global_st;

struct data_st 
{
    void *tmp_store;
};

void* malloc_wrapper(size_t len)
{
    return malloc(len);
}

void foo(int c)
{
    struct data_st *st = (struct data_st*)malloc_wrapper(sizeof(struct data_st));
    void* ptr = malloc_wrapper(16);

    st->tmp_store = ptr;    // delete this leak will be detected

    free(st);               // no leak
    if (c > 10) free(ptr);  // partial leak
}

int main(int argc,char** argv) 
{
    struct data_st *sptr = (struct data_st*)malloc_wrapper(sizeof(struct data_st));
    global_st = sptr;       // no leak, poison malloc_wrapper to be global
    foo(argc);

    struct data_st *aptr = sptr;  // alias to global
    void* p = malloc(16);
    aptr->tmp_store = p;     // no leak

    return 0;
}
seviezhou commented 1 year ago

Here is another example. In bar the local copy of st (an alloca instruction) parameter points to both loca_st and malloc. Since we have poisoned malloc to be pointed by a global variable. So the st->tmp_store = ptr; inside bar will be added as a global store node. So the partial leak will not be reported. Any calls to bar will cause SVF losing track of the pointer passed as the second parameter.

#include <stdlib.h>

struct data_st* global_st;

struct data_st 
{
    void* tmp_store;
};

void* malloc_wrapper(size_t len)
{
    return malloc(len);
}

void bar(struct data_st* st, void* ptr)
{
    st->tmp_store = ptr;
}

void foo(int c)
{
    struct data_st loca_st;
    void* ptr = malloc_wrapper(16);
    bar(&loca_st, ptr);             // local struct data_st

    if (c > 10) free(ptr);          // partial leak
}

int main(int argc,char** argv) 
{
    struct data_st *sptr = (struct data_st*)malloc_wrapper(sizeof(struct data_st));
    global_st = sptr;               // no leak, poison malloc_wrapper to be global
    bar(sptr, NULL);

    foo(argc);

    return 0;
}
seviezhou commented 1 year ago

I make a possible patch as follows, the patch examines whether the base object is a field-insensitive object (based on my understanding, malloc node should be this type, correct me if I am wrong), then it gets the SVFValue from PAGNode (base object cannot be mapped to StmtSVFGNode). Finally, if SVFValue is a call to an external function, the empty pts is returned instead of adding all its fields:

@@ -110,6 +110,18 @@ PointsTo& SaberSVFGBuilder::CollectPtsChain(BVDataPTAImpl* pta,NodeID id, NodeTo
    else
    {
        PointsTo& pts = cachedPtsMap[baseId];
+       // base object
+       if (pta->isFIObjNode(baseId))
+       {
+           PAGNode* pn = pag->getGNode(baseId);
+           const SVFValue* value = pn->getValue();
+           const SVFCallInst* inst = SVFUtil::dyn_cast<SVFCallInst>(value);
+           if(inst && SVFUtil::isExtCall(inst))
+           {
+               return pts;
+           }
+       }

        pts |= pag->getFieldsAfterCollapse(baseId);

        WorkList worklist;

Before the patch, for the above three C code, the leak check does not report any bugs (3 false negatives). After the patch, the results are the following, there are 3 true positives (partial leaks) and 1 false positive (never free in the second program). If we want more bugs, we can use this patch, if we want to be more conservative, we should use the original code.

     PartialLeak : memory allocation at : ({ ln: 18  cl: 17  fl: global.c })
         conditional free path:
          --> ({ ln: 23  cl: 9  fl: global.c }|True)
     PartialLeak : memory allocation at : ({ ln: 18  cl: 17  fl: global.c })
         conditional free path:
          --> ({ ln: 23  cl: 9  fl: global.c }|True)

     NeverFree : memory allocation at : ({ ln: 33  cl: 15  fl: global.c })
     PartialLeak : memory allocation at : ({ ln: 23  cl: 17  fl: global.c })
         conditional free path:
          --> ({ ln: 26  cl: 9  fl: global.c }|True)
yuleisui commented 1 year ago

Is ‘ CollectPtsChain’ only used in Saber or some other places in memssa? Need to make sure this change will not affect other analyses. Maybe a new method collectNonGlobalPtsChain?

yuleisui commented 1 year ago

It would be good to consider adding an additional option user can choose to execute the saber for your change.

seviezhou commented 1 year ago

Is ‘ CollectPtsChain’ only used in Saber or some other places in memssa? Need to make sure this change will not affect other analyses. Maybe a new method collectNonGlobalPtsChain?

Memory Region Partitioning defines its own CollectPtsChain function with a same function body, so SaberSVFGBuilder::CollectPtsChain is only used in SaberSVFGBuilder.

It would be good to consider adding an additional option user can choose to execute the saber for your change.

Yes, this change definitely will make saber run slower because it needs to check more paths.