eurecom-s3 / symcc

SymCC: efficient compiler-based symbolic execution
http://www.s3.eurecom.fr/tools/symbolic_execution/symcc.html
GNU General Public License v3.0
776 stars 135 forks source link

Warning: unhandled LLVM intrinsic #5

Closed vanhauser-thc closed 4 years ago

vanhauser-thc commented 4 years ago

Just started playing with symcc and while compiling a target I get many warnings like this:

Warning: unhandled LLVM intrinsic llvm.floor.f64
Warning: unhandled LLVM intrinsic llvm.va_start
Warning: unhandled LLVM intrinsic llvm.va_end

is this a problem?

sebastianpoeplau commented 4 years ago

@vanhauser-thc the TL;DR is no, usually there's no need to worry about those warnings.

The slightly longer version: Whenever the compiler pass encounters a call to an intrinsic function in LLVM that it doesn't know, it emits a warning and inserts code to concretize the result. The program will still work as expected, but the function call will always return a concrete result, even if the input was symbolic. In other words, we lose symbolic information. In your case, llvm.floor.f64 rounds a floating-point number, which we could model with the simple backend but not with the QSYM backend because it doesn't support floats. llvm.va_start and llvm.va_end are helpers for functions with a variable number of arguments; there is no fundamental problem in supporting them, but so far it hasn't been necessary. If you wanted to add support for a given intrinsic function, you could add the appropriate symbolic handling to Symbolizer::handleIntrinsicCall in Symbolizer.cpp.

vanhauser-thc commented 4 years ago

@sebastianpoeplau thank you! I tried it against libpng and then run it to see how many png files it would be created and expected several 100.000 files being created over the next hours ... but it only created 356. With all the options and features in libpng this is unlikely. What could I do to get symcc to create more and go deeper?

sebastianpoeplau commented 4 years ago

How did you run it on libpng? I assume you have some test program that uses the library, and you ran a SymCC-instrumented version of the program with a test input. Then there are two limiting factors:

  1. SymCC can only exercise parts of the library that your test program can reach. For example, in our OpenJPEG experiments we ran opj_decompress, so decompression functionality was tested while compression was not.
  2. Assuming that your test driver can reach enough of the library, keep in mind that running SymCC once will only generate one new input per branch point. In other words, SymCC executes the program on your test input, and at each branch point it stops and generates a new input that would take the alternative branch. For thorough exploration, you'll want to run SymCC repeatedly. The easiest approach is to just feed the newly generated test cases back into the program in some sort of Bash loop. A more sophisticated way is to run with a fuzzer; the fuzzer takes care of selecting the most promising test cases and checking for vulnerabilities while SymCC performs the more expensive task of generating test cases that reach certain parts of the code. We have an example of this fuzzing setup in the docs and instructions for replicating our experiments following that approach on the website.
vanhauser-thc commented 4 years ago

I was using the libfuzzer harness -> https://github.com/google/fuzzbench/blob/master/benchmarks/libpng-1.2.56/target.cc and added a main that reads the data from stdin and sets the size parameter accordingly. compiled the library + the harness with symcc, and let it run with an input file.

as you can see in the hardness there are lots of functionalities triggered, resizing, changing the color, etc.

          png_set_gray_to_rgb(png_handler.png_ptr);
          png_set_expand(png_handler.png_ptr);
          png_set_packing(png_handler.png_ptr);
          png_set_scale_16(png_handler.png_ptr);
          png_set_tRNS_to_alpha(png_handler.png_ptr);
          passes = png_set_interlace_handling(png_handler.png_ptr);
          png_read_update_info(png_handler.png_ptr, png_handler.info_ptr);
sebastianpoeplau commented 4 years ago

In that case you probably just need to run it repeatedly. The docs that I linked above show how to use a little helper program included with SymCC to run with AFL - that's the method we found the most successful.