SunLab-GMU / GraphSPD

The official repository of "GraphSPD: Graph-Based Security Patch Detection with Enriched Code Semantics". The paper will appear in the IEEE Symposium on Security and Privacy (S&P), San Francisco, CA, May 22-26, 2023.
https://github.com/SunLab-GMU/GraphSPD
Apache License 2.0

gen_cpg is slow #5

Closed (Qiuzg closed this issue 1 year ago)

Qiuzg commented 1 year ago

I want to evaluate this model on my own dataset, but running gen_cpg.py is very slow: I only get 6-10 CPG results per minute. Is there any way to accelerate this process?

I tried multi-threading in gen_cpg.py, but it produced errors in ./joern/workspace.

shuwang127 commented 1 year ago

gen_cpg.py uses the third-party tool Joern to generate CPGs for both versions of the source code. This step is currently the speed bottleneck of GraphSPD. The relevant calls are:

os.system('cd ./joern; ./joern --script ../locateFunc.sc --params inputFile=.'+path+d+'/a/,outFile=.'+path+d+'/cpg_a.txt')
os.system('cd ./joern; ./joern --script ../locateFunc.sc --params inputFile=.'+path+d+'/b/,outFile=.'+path+d+'/cpg_b.txt')

Please refer to https://docs.joern.io for more information.

Currently, gen_cpg.py does not support multi-threading. However, because gen_cpg.py simply invokes shell commands, you can explore parallel execution by writing your own pre-processing code.
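One possible shape for such pre-processing code is sketched below. It is not part of GraphSPD and has not been tested against Joern. It rests on two assumptions: that Joern creates its workspace directory relative to the current working directory (which would explain the errors in ./joern/workspace when several runs share it), and that the joern launcher and locateFunc.sc still work when called by absolute path from another directory. The helper names build_joern_cmd, run_one and run_all are hypothetical.

```python
import os
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor


def build_joern_cmd(joern_bin, script, input_dir, out_file):
    """Build the same Joern call that gen_cpg.py issues, using absolute paths
    so the command no longer depends on the current working directory."""
    params = "inputFile={}/,outFile={}".format(
        os.path.abspath(input_dir), os.path.abspath(out_file))
    return [os.path.abspath(joern_bin), "--script", os.path.abspath(script),
            "--params", params]


def run_one(cmd, run=subprocess.run):
    """Run one command inside its own scratch directory.

    Assumption: Joern writes workspace/ under the current working directory,
    so a private cwd per job keeps concurrent runs from colliding."""
    with tempfile.TemporaryDirectory() as scratch:
        return run(cmd, cwd=scratch).returncode


def run_all(cmds, workers=4, run=subprocess.run):
    """Run the commands concurrently and return their exit codes in order.

    Threads are enough here because the heavy work happens in child processes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: run_one(c, run), cmds))
```

For each sample directory you would build two commands, one for a/ writing cpg_a.txt and one for b/ writing cpg_b.txt, and pass the whole list to run_all. Each Joern process starts its own JVM, so keep workers small enough to fit in memory, and check the returned exit codes before using the output files.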