Open seviezhou opened 1 year ago
This is a good example. Any thoughts or solutions to improve the handling of globals?
I think there are several directions to be considered:
Any quick solution that could fix your context-sensitive case?
Let me have a try.
Can we add a heuristic to collectGlobals
such that it does not process malloc
site when collecting globals? Although it will lose some soundness, it can prevent nodes that point to malloc
to be added to global set. In the above example, since global_st
points to malloc
, so all nodes (e.g., st->tmp_store
) that point to malloc
are added to global set.
I found it may not be soundly solved since the pointer analysis result is imprecise so we cannot easily discriminate different nodes that point to the same malloc
node.
Why not modify accessGlobal
? Does your unsoundness mean globs
may miss some globals?
Why not modify
accessGlobal
?
In accessGlobal
, many nodes related to malloc
have already been added to globs
, so it is hard to know whether a node comes from malloc
or not I think.
Does your unsoundness mean
globs
may miss some globals?
Based on my understanding, nodes related to the call to malloc
inside the main function should actually be globals?
I want to debug it a little bit more.
The collectGlobals
only collects three kinds of nodes: global variable node, base object inside global variables' points to set, and all fields of the base object. If we want to modify accessGlobal
, we might need to find all base object and its fields inside globs
.
I made am example to show the unsoundness. I add a few more lines to he previous example, there is another pointer aptr
that is alias to sptr
, which should be a global node. The pointer analysis result says that aptr
points to malloc
. If we ignore malloc
together with all its fields object nodes during CollectPtsChain
, the following code will report one partial leak (line 26) and one never free (line 36). But the original code reports no leaks.
#include <stdlib.h>
struct data_st *global_st;
struct data_st
{
void *tmp_store;
};
void* malloc_wrapper(size_t len)
{
return malloc(len);
}
void foo(int c)
{
struct data_st *st = (struct data_st*)malloc_wrapper(sizeof(struct data_st));
void* ptr = malloc_wrapper(16);
st->tmp_store = ptr; // delete this leak will be detected
free(st); // no leak
if (c > 10) free(ptr); // partial leak
}
int main(int argc,char** argv)
{
struct data_st *sptr = (struct data_st*)malloc_wrapper(sizeof(struct data_st));
global_st = sptr; // no leak, poison malloc_wrapper to be global
foo(argc);
struct data_st *aptr = sptr; // alias to global
void* p = malloc(16);
aptr->tmp_store = p; // no leak
return 0;
}
Here is another example. In bar
the local copy of st
(an alloca
instruction) parameter points to both loca_st
and malloc
. Since we have poisoned malloc
to be pointed by a global variable. So the st->tmp_store = ptr;
inside bar
will be added as a global store node. So the partial leak will not be reported. Any calls to bar
will cause SVF losing track of the pointer passed as the second parameter.
#include <stdlib.h>
struct data_st* global_st;
struct data_st
{
void* tmp_store;
};
void* malloc_wrapper(size_t len)
{
return malloc(len);
}
void bar(struct data_st* st, void* ptr)
{
st->tmp_store = ptr;
}
void foo(int c)
{
struct data_st loca_st;
void* ptr = malloc_wrapper(16);
bar(&loca_st, ptr); // local struct data_st
if (c > 10) free(ptr); // partial leak
}
int main(int argc,char** argv)
{
struct data_st *sptr = (struct data_st*)malloc_wrapper(sizeof(struct data_st));
global_st = sptr; // no leak, poison malloc_wrapper to be global
bar(sptr, NULL);
foo(argc);
return 0;
}
I make a possible patch as follows, the patch examines whether the base object is a field-insensitive object (based on my understanding, malloc
node should be this type, correct me if I am wrong), then it gets the SVFValue
from PAGNode
(base object cannot be mapped to StmtSVFGNode
). Finally, if SVFValue
is a call to an external function, the empty pts
is returned instead of adding all its fields:
@@ -110,6 +110,18 @@ PointsTo& SaberSVFGBuilder::CollectPtsChain(BVDataPTAImpl* pta,NodeID id, NodeTo
else
{
PointsTo& pts = cachedPtsMap[baseId];
+ // base object
+ if (pta->isFIObjNode(baseId))
+ {
+ PAGNode* pn = pag->getGNode(baseId);
+ const SVFValue* value = pn->getValue();
+ const SVFCallInst* inst = SVFUtil::dyn_cast<SVFCallInst>(value);
+ if(inst && SVFUtil::isExtCall(inst))
+ {
+ return pts;
+ }
+ }
pts |= pag->getFieldsAfterCollapse(baseId);
WorkList worklist;
Before the patch, for the above three C code, the leak check does not report any bugs (3 false negatives). After the patch, the results are the following, there are 3 true positives (partial leaks) and 1 false positive (never free in the second program). If we want more bugs, we can use this patch, if we want to be more conservative, we should use the original code.
PartialLeak : memory allocation at : ({ ln: 18 cl: 17 fl: global.c })
conditional free path:
--> ({ ln: 23 cl: 9 fl: global.c }|True)
PartialLeak : memory allocation at : ({ ln: 18 cl: 17 fl: global.c })
conditional free path:
--> ({ ln: 23 cl: 9 fl: global.c }|True)
NeverFree : memory allocation at : ({ ln: 33 cl: 15 fl: global.c })
PartialLeak : memory allocation at : ({ ln: 23 cl: 17 fl: global.c })
conditional free path:
--> ({ ln: 26 cl: 9 fl: global.c }|True)
Is ‘ CollectPtsChain’ only used in Saber or some other places in memssa? Need to make sure this change will not affect other analyses. Maybe a new method collectNonGlobalPtsChain?
It would be good to consider adding an additional option user can choose to execute the saber for your change.
Is ‘ CollectPtsChain’ only used in Saber or some other places in memssa? Need to make sure this change will not affect other analyses. Maybe a new method collectNonGlobalPtsChain?
Memory Region Partitioning defines its own CollectPtsChain
function with a same function body, so SaberSVFGBuilder::CollectPtsChain
is only used in SaberSVFGBuilder
.
It would be good to consider adding an additional option user can choose to execute the saber for your change.
Yes, this change definitely will make saber run slower because it needs to check more paths.
I think there are some imperfections in terms of how SVF handles global SVFG Nodes and I want to seek for some suggestions.
The usage of global nodes starts from the SVFG building process. During the
SaberSVFGBuilder
, two kinds of SVFG nodes are considered to be global nodes:StoreSVFGNode
and its destination's points-to set contains a global objects.LoadSVFGNode
and its source's points-to set contains a global objects.During the source-sink analysis, a forward analysis is first performed for each slice. During the traversal, if the destination node of an edge is a global node then the slice is not analyzed anymore since SVF assumes that the client uses the memory until the program exits.
I observe that this filtering is important because it eliminates the analysis for a large number of slicing. For example, in OpenSSL, I found that over half of all the slicing is abandoned because of this global node checking. Disable this will significantly increase the analysis time.
However, I found the global node identification is too coarse-grained because of the imprecise pointer analysis, which marks too many incorrect global nodes and potentially cause missing bugs. I made a simple example below:
Compile it and check with saber shows not bugs:
But the partial leak at line 23 is supposed to be reported. The reason for this result is:
main
function, the returned pointer ofmalloc_wrapper
(more specifically, the memory object ofmalloc
) is stored to a global pointerglobal_st
, which makes the points-to set ofglobal_st
containmalloc
. This results in all stores or loads of returned pointer frommalloc_wrapper
to be marked as global nodes. Therefore,st->tmp_store = ptr;
is marked as a store to global variabes.foo
function, the slicing regards toptr
contains a globalStoreSVFGNode
, and will be discarded from source-sink analysis.sptr
does not leak since it is stored to a global variable.st
deos not leak since all its paths reach afree
function.There are other things I want to mentioned. First, if I deleted
st->tmp_store = ptr;
, the partial leak will be detected, this is easy to understand. Second, if I usemalloc
instead ofmalloc_wrapper
the result is also correct, this is because the return value ofmalloc
is not marked as alias for all call sites. Are there any ways to refine the alias results to make detected wrapper to be non-alias. Third, the root cause of this problem comes from the imprecise pointer analysis. Its effect can be amplified in larger programs and results in hundreds of missing bugs.I attached the source code and compile IR:
global.tar.gz