eurecom-s3 / symcc

SymCC: efficient compiler-based symbolic execution
http://www.s3.eurecom.fr/tools/symbolic_execution/symcc.html
GNU General Public License v3.0
773 stars 135 forks source link

Potential bug in the Symbolise pass -- "Calling a function with a bad signature'' #34

Open kg8280 opened 3 years ago

kg8280 commented 3 years ago

Hi,

I am trying to support SymCC for Ada and I hit a potential bug while building our runtime library:

opt: /home/georgiou/work/projects/Symbolic_execution/llvm_source_10.0.0/llvm/lib/IR/Instructions.cpp:398: void llvm::CallInst::init(llvm::FunctionType*, llvm::Value*, llvm::ArrayRef<llvm::Value*>, llvm::ArrayRef<llvm::OperandBundleDefT<llvm::Value*> >, const llvm::Twine&): Assertion `(i >= FTy->getNumParams() || FTy->getParamType(i) == Args[i]->getType()) && "Calling a function with a bad signature!"' failed.
Stack dump:
0.  Program arguments: /home/georgiou/work/projects/Symbolic_execution/build/bin/opt -load /home/georgiou/work/projects/Symbolic_execution/SymCC/symcc_build_simple/libSymbolize.so -symbolize g-alleve.bc -o symbolize_bc/g-alleve.bc 
1.  Running pass 'Function Pass Manager' on module './g-alleve.bc'.
2.  Running pass 'Symbolization Pass' on function '@gnat__altivec__low_level_vectors__ll_vsc_operations__abs_vxiXnn'
 #0 0x000055c8e040df3e llvm::sys::PrintStackTrace(llvm::raw_ostream&) /home/georgiou/work/projects/Symbolic_execution/llvm_source_10.0.0/llvm/lib/Support/Unix/Signals.inc:568:3
....

I am attaching a zip with the LLVM bitcode file causing the issue.

Thank you, Kyriakos

g-alleve.zip

sebastianpoeplau commented 3 years ago

I'll have a look, thanks.

sebastianpoeplau commented 3 years ago

I think it's a problem with vectors:

$ opt -load ./libSymbolize.so -symbolize ~/Downloads/g-alleve.bc -o /dev/null
[...]
Call parameter type does not match function signature!
  %21 = icmp slt <16 x i8> %wide.load, zeroinitializer, !dbg !1369
 i1  call void @_sym_push_path_constraint(i8* %20, <16 x i1> %21, i64 94669052009640), !dbg !1369
Call parameter type does not match function signature!
  %14 = icmp eq <16 x i8> %wide.load, <i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128>, !dbg !1369
 i1  call void @_sym_push_path_constraint(i8* %13, <16 x i1> %14, i64 94669052009784), !dbg !1369
in function gnat__altivec__low_level_vectors__ll_vsc_operations__abs_vxiXnn
LLVM ERROR: Broken function found, compilation aborted!

LLVM uses vectors to represent SIMD operations, and it defines most of the bitcode instructions on vectors as well. Unfortunately, we don't currently support them in SymCC; the pass is deliberately located in the middle of the optimization pipeline so that we run before the auto-vectorizer (which converts certain loops to vector operations). I suspect that in your case the frontend already emits vectors, and our inability to handle them properly leads to the problem you're seeing when we try to pass one of the literals to the symbolic backend.

Do you have a way to make your code use a less optimized version of the same function for the time being? I've occasionally seen libraries that use SIMD operations but provide a fallback implementation for processors that lack the required instruction support. Alternatively, one could look into handling vector instructions in SymCC (see also #10). It's probably rather straightforward, but it requires revisiting all the code in Symbolizer.cpp to handle the case where the input or output of any instruction is a vector.

kg8280 commented 3 years ago

Hi Sebastian and thank you for your time. I compiled with -O0 and indeed the bug is gone for now. Although, I run it to some more unknown and unhandled instructions with the most common being the:

unhandled LLVM intrinsic llvm.eh.typeid.for
Warning: unknown instruction   %50 = fneg float 

and less common:

Warning: unhandled LLVM intrinsic llvm.umul.with.overflow
Warning: unhandled LLVM intrinsic llvm.ssub.with.overflow
Warning: unhandled LLVM intrinsic llvm.sadd.with.overflow
Warning: unhandled LLVM intrinsic llvm.smul.with.overflow
Warning: unknown instruction   %7 = cmpxchg volatile 
Warning: unknown instruction   %2 = atomicrmw volatile sub
Warning: unknown instruction   %1 = atomicrmw volatile add

are any of these handle concretely or it means that the SE will completely stop when hitting them?

I am completely new to SE but I would like to increase my understanding and start contributing to the enchantment of SymCC if it is possible.

Thank you, Kyriakos

sebastianpoeplau commented 3 years ago

In general, the results of unhandled instructions and LLVM intrinsics are concretized, the same way that we would handle the results of calls to uninstrumented functions; I've pushed a small change to make this clear in the log messages. There is one caveat though: in all of those cases (i.e., uninstrumented functions, unhandled instructions, and unknown intrinsics), if the code changes memory then SymCC has no way of knowing about it, so the expressions that it stores for the affected memory region may no longer represent the data correctly. As a consequence, the quality of our symbolic reasoning may degrade, but we will always keep executing.

Your contributions would be highly welcome! I'll be happy to give you an overview of the internals, either here or via email.