Use LLVM SanitizerCoverage?

eurecom-s3 / symcc

SymCC: efficient compiler-based symbolic execution

http://www.s3.eurecom.fr/tools/symbolic_execution/symcc.html

GNU General Public License v3.0

773 stars 135 forks source link

Use LLVM SanitizerCoverage? #37

Open sebastianpoeplau opened 3 years ago

sebastianpoeplau commented 3 years ago

SymCC inserts calls at the beginning of each basic block, as well as for calls and returns, to support the QSYM backend's context-aware pruning. Recent versions of LLVM have support for coverage tracing (see https://clang.llvm.org/docs/SanitizerCoverage.html), so we could consider using their mechanism instead of the custom callbacks. However, I'm not totally certain that we would get an overall benefit, and we need to support relatively old versions of LLVM.

I'll leave this issue around as a reminder, to be re-evaluated later.

vanhauser-thc commented 3 years ago

"recent" is since 4.0.1 which is totally outdated :) - or do you mean a specific feature? note that llvm 7.0 onwards does not instrument every block by default but only the necessary edges on a unique path, however this can be changed by options.

sebastianpoeplau commented 3 years ago

Maybe basic tracing is enough, but we need a way to identify call and return sites as well. Not sure if this requires PC tables, which are still marked experimental as of LLVM 11.

In the end, we need to add our own instrumentation to every call anyway (in order to communicate symbolic expressions for arguments and to store the expression of the result), so I'm mainly wondering whether we would gain anything by using the LLVM mechanism. What do you think?

vanhauser-thc commented 3 years ago

if you want to identify and instrument calls then you need to write an LLVM plugin.

See e.g. https://github.com/AFLplusplus/AFLplusplus/blob/dev/instrumentation/cmplog-routines-pass.cc.

here we parse all calls in a target, extract the parameters and if they are strings then we use these parameters to a call to our own function we insert.