eurecom-s3 / symcc

SymCC: efficient compiler-based symbolic execution
http://www.s3.eurecom.fr/tools/symbolic_execution/symcc.html
GNU General Public License v3.0
781 stars 137 forks source link

Can't create expressions for concrete non-undef structs #134

Closed sebastianpoeplau closed 1 year ago

sebastianpoeplau commented 1 year ago

This is a bug related to SymCC's handling of struct values in bitcode. The code that triggers it is the following:

%foo = type { i8, i32 }
...
%6 = insertvalue %foo { i8 1, i32 undef }, i32 %bar, 1

%bar is a symbolic value, so we need to create an expression for the constant {i8 1, i32 undef}. The compiler pass generates the following code:

%7 = alloca %foo, align 8
store %foo { i8 1, i32 undef }, ptr %7, align 4
%8 = ptrtoint ptr %7 to i64
%9 = call ptr @_sym_read_memory(i64 %8, i64 8, i1 false)
%10 = call ptr @_sym_build_insert(ptr %9, ptr %3, i64 4, i1 true)

That is, we write the struct to newly allocated memory and read it symbolically to get the expression. Unfortunately, the struct constant is fully concrete, so _sym_read_memory returns nullptr, which we promptly feed into _sym_build_insert, causing a segfault.

sebastianpoeplau commented 1 year ago

The part of the compiler pass that handles expressions for structs is quite involved, as evidenced by the long comment in the function linked in the original issue description. I think it makes sense to try and improve it in general, rather than just adding one more special case. Let's collect the requirements...

  1. Structs in SSA registers are shadowed with expressions describing their in-memory representation. We could alternatively represent a packed or tree view of the structs, but I think doing so would mainly shift the burden of expression conversion from insertvalue/extractvalue to load/store. Moreover, we don't have a way to describe expression trees with a single pointer at the moment.
  2. We need to be able to create expressions for (partially) concrete struct values:
    1. undef
    2. Structs with at least some values in them, plus any number of undef members. Here we need to handle the case where all values are concrete, as well as the case where at least one has a symbolic expression attached to it.

The current implementation handles undef (by creating an all-zeros expression); it also tries to handle structs with values in them but fails, which is the reason for the present bug. This brings us to the last requirement: there should be tests that cover the functionality :upside_down_face:

sebastianpoeplau commented 1 year ago

It is tempting to describe structs with trees of expressions... For example, the struct type { i8, i32 } could have a shadow type { ptr, ptr } containing expression pointers for individual elements.

The advantage would be that insertvalue/extractvalue wouldn't have to convert expression kinds anymore (e.g., to and from floating-point kind); this would be left to load/store, which already do it anyway. We'd need rather complex logic to handle struct values in load/store which couldn't be in the runtime: it depends on the bitcode type of the value being loaded/stored. However, the same is true for initial expression creation if we express in-memory representations. We could emit conversion functions to/from tree representation when we first encounter the struct type, but I wonder how this would work with literal struct types (i.e., struct types that are defined inline rather than named at top level) :thinking: