SVF-tools / SVF

Static Value-Flow Analysis Framework for Source Code
http://svf-tools.github.io/SVF/
Other
1.4k stars 438 forks source link

Problem on callgraph generation accelaration #1349

Open HiragiChi opened 8 months ago

HiragiChi commented 8 months ago

Hi, Thanks for providing such a useful tool. I am interested in getting a call graph (indirect call included) for large-scale programs (like Linux kernel) using Anderson analysis. However, using the raw command will exhaust the memory, resulting in a timeout.

As I am only interested in point-to results for function pointers to get the call graph, I am thinking of ways to make the icall points-to result-solving procedure more effective. believe there are many corner cases I need to handle, but the rough idea will be like this

  1. Mark the non-function objects
  2. remove these objects and the first PAG edge related to it
  3. do 2) in a recursive way. and such pruning is done after the LLVM IR is translated into SVFIR.

Do you think this idea sounds valid, or is there a better way to do this?

yuleisui commented 8 months ago

non-function objects can be holding the value-flows of pointers which point to a function object. It is not easy to separately handle them.

HiragiChi commented 8 months ago

Hi, Thanks for the reply. I think there are some non-function objects (like void pointers) that can actually point to functions or function pointers, but there are some objects and pointers (like int or char*) that cannot point to functions ( we can use bitcast analysis to confirm that). We can conservatively drop objects with these types in Anderson analysis.

 With these restrictions, do you think that sounds a valid idea?
yuleisui commented 8 months ago

How about a pointer points to a struct containing a function pointer. Similar for pointers indirectly affected

HiragiChi commented 8 months ago

Yeah, and a union type is also something that might be related to function pointers. I plan to mark these four types: void*, function pointer types, structs that contain function pointers, and unions that contain function pointers, as types that may be related to call graph construction. By pruning the other types of nodes I think we will reach higher efficiency.

Do you think this sounds a little bit more reasonable now? Any suggestions are appreciated!

yuleisui commented 8 months ago

In theory, any pointer types matter, as they could indirectly point to an object (not only struct/array/union) which may store pointers that indirectly point to a function object.

HiragiChi commented 8 months ago

Thanks for the timely reply! That is true. What I am thinking is can this process be tracked by bitcast analysis? (say fptr1 is assigned to int, so there must be a bitcast in the program that casts int to fptr, or there exists a casting chain that casts int to fptr.) In this case, we can incorporate all the int objects into Anderson analysis. I think this claim sounds solid currently. Do you think that would still miss something? Thanks!

yuleisui commented 8 months ago

You can have a try. It is hard to answer this question as this is a challenge research. Note that llvm-16 onwards does not have typed pointers which means castings will be gone for many IR instructions.

HiragiChi commented 8 months ago

Thanks for the guidance! I will try my idea with LLVM-14 and I will let you know if I have any progress :)