SVF-tools / SVF

Static Value-Flow Analysis Framework for Source Code
http://svf-tools.github.io/SVF/
Other
1.41k stars 436 forks source link

MTA Does Not Terminate on Large Project #1554

Open dylanjwolff opened 1 month ago

dylanjwolff commented 1 month ago

Hi @yuleisui,

Thanks for your help on #1548!

I'm now trying to run MTA on that same bitcode file (a SQLite stress test), but it is taking more than 24 hours (!). It is a relatively large program, so I'm not sure if this is expected behavior or maybe a performance bug. I have tried on some other larger programs though (>10k LOC), and many of them seem to take more than 24 hours to terminate.

Here is the command I am using:

mta -dump-icfg -race -stat=false labeled.bc

If it is just because of fundamental scalability issues, is there any way to easily relax the analysis to trade off precision on these larger projects such that the analysis can terminate in a reasonable amount of time? For my use case, it is OK if there are a larger number of false positives, but ideally the analysis would finish within a few minutes.

I've tried commenting out most of the functionality in LockAnalysis.cpp, but even that alone doesn't seem to be enough to allow it to terminate quickly.

I'd appreciate any guidance you might have on this. Also happy to help debug if this looks like a bug to you.

Thanks!

labeled.bc.tar.gz

OS: Ubuntu 22.04.5 LTS x86_64 Kernel: 6.8.0-45-generic CPU: Intel i7-10700 (16) @ 4.800GHz Memory: 15700MiB

LimbicSys commented 4 days ago

I have tried MTA on some real world projects. It is slow on large project. It may take 2~3 days (or more) on a 10k LOC project.

LimbicSys commented 4 days ago

I have tried MTA on some real world projects. It is slow on large project. It may take 2~3 days (or more) on a 10k LOC project.

I guess that the pointer analysis take too much time. Getting an only ICFG of a large project is quite fast.

dylanjwolff commented 2 days ago

I did some light profiling (please take with a grain of salt), and there appear to be multiple bottlenecks.

The first is that just constructing the TCT takes a very long time for large programs.

image

This seems to mostly be due to the context sensitivity.

If we remove context-sensitivity, this phase of the analysis wil terminate within a reasonable amount of time (a couple of hours). However, for computing race pairs, again there is a bottleneck. Since the race pair computation is pairwise, there are M*N checks for may-happen-in-parallel etc., which is too much for programs where M and N are on the order of 600k loads+stores.

As a note, I am on the latest release version SVF-3.0 -- I realized that I missed adding that to the original issue