eurecom-s3 / symcc

SymCC: efficient compiler-based symbolic execution
http://www.s3.eurecom.fr/tools/symbolic_execution/symcc.html
GNU General Public License v3.0
773 stars 135 forks source link

symcc_fuzzing_helper indicates process either timing out or out of memory #26

Closed straightblast closed 3 years ago

straightblast commented 3 years ago

Hi,

So i used the new Dockerfile and can now get the sample.cc to work. I am trying to use symcc in combination with AFL. However, i am getting the following which does not appear to be 'good':

2020-10-02T03:01:49Z INFO symcc_fuzzing_helper] Running on input /opt/outputs/fuzzer02/queue/id:000384,sync:fuzzer01,src:000081,+cov [2020-10-02T03:03:20Z INFO symcc_fuzzing_helper] Generated 95 test cases (0 new) [2020-10-02T03:03:20Z INFO symcc_fuzzing_helper] The target process was killed (probably timeout or out of memory); archiving to /opt/outputs/symcc/hangs

I am getting repeated 'target process was killed ..' message. Is this something with symcc? the target + harness appears to be in good shape, so I am not sure what could be the cause of the issue?

sebastianpoeplau commented 3 years ago

In principle, timeouts are expected and nothing to worry about, especially if you run a program that involves computations which are hard for the solver to reason about (e.g., involving crypto or hashes). We currently have a fixed timeout of 90 seconds per SymCC execution in the fuzzing helper; ideas for more sophisticated strategies are very welcome. (The timestamps in the log snippet you shared suggest that this particular execution did indeed hit the 90-second timeout.)

An easy way to check that SymCC is working well, i.e., producing test cases and sharing them with the fuzzer, is to check AFL's queue (e.g., /opt/outputs/fuzzer02/queue): there should be files whose names contain sync:symcc. Moreover, if you'd like to investigate the problematic inputs, they should all be in /opt/outputs/symcc/hangs. You can manually run the SymCC-compiled target program on them and check execution time and memory consumption.

straightblast commented 3 years ago

Thanks for the info. Is there any way to determine where the bottleneck is at to trouble shoot what the solver is trying to solve that leads to the timeout? Also, maybe as a potential enhancement, would it be possible to supply a whitelist / denylist on which .c/.cpp file to insert the symbolic execution into, to reduce the scope on what we want the solver to solve? I am just imagining the solver is trying to solve all symbolic expression, which may not be a particular code path I am interested to explore.

sebastianpoeplau commented 3 years ago

One way of debugging where the solver gets stuck would be to look at the stack in a debugger while the solver is busy - the last few frames will be the solver and SymCC, but before those you should see the path in the program that we're currently exploring. Also, you could print the result of _sym_expr_to_string(constraint) in _sym_push_path_constraint to get an idea of the expressions that we push to the solver. If any expression looks too complicated, the stack trace at that moment can tell you where it came from.

As for enabling SymCC only on parts of the program, one thing you can do is to compile the portions of the target that you don't want to analyze with regular Clang while using SymCC for the interesting parts. (If the code is written in C++, make sure to have Clang use SymCC's instrumented libc++.) This only works at file granularity, but it may be a start. However, SymCC won't be able to trace anything that happens in Clang-compiled code, so if that code modifies symbolic data then symbolic expressions are likely to get out of sync with the data. Another approach would be to give SymCC an AFL coverage map that marks those transitions as covered that you're not interested in (see SYMCC_AFL_COVERAGE_MAP and its use in the fuzzing helper); however, there is currently tooling to create such a map.

straightblast commented 3 years ago

I am trying to determine what is causing the hang issue. I reviewed the 'hang' folder samples, and I did something along the line of: export SYMCC_INPUT_FILE="./hang_file01" gdb ./symcc-instrumented-target b _sym_expr_to_string b _sym_push_path_constraint r ./hang_file01

the breakpoint to the functions were never hit. Exampe output for GDB:

Starting program: /opt/instrumented-symcc-fuzzer.symcc abc [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". This is SymCC running with the QSYM backend Making data read from /opt/outputs/symcc/hangs/id:000024,src:000174 as symbolic [Inferior 1 (process 37587) exited with code 01]

Any suggestion on why I am not seeing those breakpoint hits on those 'sym' functions?

sebastianpoeplau commented 3 years ago

The SymCC output suggests that it didn't pick up your SYMCC_INPUT_FILE: it seems to expect symbolic data in /opt/outputs/symcc/hangs/id:000024,src:000174. Is hang_file01 a symlink? You could also try to set SYMCC_INPUT_FILE explicitly for GDB, e.g., env SYMCC_INPUT_FILE=hang_file01 gdb ./symcc-instrumented-target. (SymCC just checks whether the contents of the environment variable are a substring of whatever is passed to open, so you can drop the ./ prefix.)

sebastianpoeplau commented 3 years ago

I suppose that this is solved; feel free to reopen otherwise.