SVF-tools / SVF

Static Value-Flow Analysis Framework for Source Code
http://svf-tools.github.io/SVF/
Other
1.39k stars 434 forks source link

Question about VersionedFlowSensitive run time and memory usage #851

Open logo749 opened 2 years ago

logo749 commented 2 years ago

Some of our bc files run out of memory or timeout for hours during pointer analysis with VersionedFlowSensitive. So my question is: Can I guessing the space complexity and time complexity by the PAG or ICFG or other information before VersionedFlowSensitive, so I can allocate more resource for analysis or just stop the analysis?

yuleisui commented 2 years ago

I have cc'ed @mbarbar who is the author of VFS. He might have some suggestions on this.

mbarbar commented 2 years ago

A few questions,

  1. Is it possible to share some of these bc files so we can take a look at what may be happening?
  2. Can these bc files be analysed by FlowSensitive within the time/memory limit? (If it can, that means there is likely a bug somewhere in VFS.)
  3. Do you know in what stage the time out happens: during versioning or during actual points-to resolution?

Estimating time/memory usage, as far as I know, would be too conservative and probably not very useful. Maybe memory usage can be estimated for versioning phase in advance but that is not where the memory usage comes from.

Also, for time, you can increase -versioning-threads to do versioning multithreaded (-versioning-threads=4 for example). By default, it is single threaded. This won't help reduce CPU time obviously, but can help reduce wall time.

For both time and memory, you can also try the -pt-type=cbv option: it will use a more efficient data structure to back points-to sets than default.

logo749 commented 2 years ago

A few questions,

  1. Is it possible to share some of these bc files so we can take a look at what may be happening?
  2. Can these bc files be analysed by FlowSensitive within the time/memory limit? (If it can, that means there is likely a bug somewhere in VFS.)
  3. Do you know in what stage the time out happens: during versioning or during actual points-to resolution?

Estimating time/memory usage, as far as I know, would be too conservative and probably not very useful. Maybe memory usage can be estimated for versioning phase in advance but that is not where the memory usage comes from.

Also, for time, you can increase -versioning-threads to do versioning multithreaded (-versioning-threads=4 for example). By default, it is single threaded. This won't help reduce CPU time obviously, but can help reduce wall time.

For both time and memory, you can also try the -pt-type=cbv option: it will use a more efficient data structure to back points-to sets than default.

  1. Sadly I am not able to share the bc file.
  2. I replace VersionedFlowSensitive with FlowSensitive, the result is the same. Allocating more the 20G of memory kill the process. (Is it normal case that it need 20G+ memory?)
  3. By adding log, I found that the code is running in the WPASolver::solveWorklist loop.
mbarbar commented 2 years ago

Sadly I am not able to share the bc file.

Ah no problem. Even bc as small as 15 MB can use a lot of memory/time. It depends on how the program is written. And like I said, it is difficult (to my knowledge) to estimate beforehand. Some large bc are fast, some small bc are slow.

(Is it normal case that it need 20G+ memory?)

It can be normal, unfortunately. For example, in the benchmarking I performed in my dissertation which I'm cleaning up, git took 30 GB to analyse and emacs took 50 GB. More research needs to be done to reduce this. FlowSensitive would be much higher.

WPASolver::solveWorklist loop.

@yuleisui does this mean we are stuck in Andersen's?

yuleisui commented 2 years ago

I don't think it is stuck in Andersen. @logo749 did you see any stats for Andersen printed out?

logo749 commented 2 years ago

I don't think it is stuck in Andersen. @logo749 did you see any stats for Andersen printed out?

The andersen analysis is finished and the stats is printed out. The code is running in the WPASolver::solveWorklist loop of the FlowSensitive::analyze method, calling the VersionedFlowSensitive::processNode repeatedly.

mbarbar commented 2 years ago

Ahh, thank you. I suspect there isn't much we can do here without some advancement in technique, since it means the versioning is done (so no bug there), just it might be too difficult a program to currently analyse...