Closed ercoppa closed 1 year ago
I agree that (1) looks like a cleaner solution. I checked the LLVM Language Reference, and it explicitly mentions the widening as part of the instruction's semantics; I think that's another point in favor of (1). Strictly speaking, we'd have to be more general and handle any type whose bit size isn't an integral multiple of 8, but i1
should be the only one that the relevant frontends emit :thinking:
Please check carefully PR #116 since it introduces several critical changes. Let me know what you think.
Consider this function (inspired by code from
objdump
):Clang with
-O0
may generate:Notice the store of
i1
. SymCC generates this code:LLVM IR
``` define dso_local zeroext i1 @foo(i32 %0) #0 { call void @_sym_notify_basic_block(i64 5403328) %2 = call i8* @_sym_get_parameter_expression(i8 0) %3 = alloca i1, align 1 %4 = alloca i32, align 4 %5 = alloca i32, align 4 %6 = ptrtoint i32* %4 to i64 call void @_sym_write_memory(i64 %6, i64 4, i8* %2, i8 1) store i32 %0, i32* %4, align 4 %7 = ptrtoint i32* %4 to i64 %8 = call i8* @_sym_read_memory(i64 %7, i64 4, i8 1) %9 = load i32, i32* %4, align 4 %10 = icmp eq i8* null, %8 br i1 %10, label %14, label %11 11: ; preds = %1 %12 = call i8* @_sym_build_integer(i64 0, i8 32) %13 = call i8* @_sym_build_not_equal(i8* %8, i8* %12) br label %14 14: ; preds = %1, %11 %15 = phi i8* [ null, %1 ], [ %13, %11 ] %16 = icmp ne i32 %9, 0 %17 = icmp eq i8* null, %15 br i1 %17, label %19, label %18 18: ; preds = %14 call void @_sym_push_path_constraint(i8* %15, i1 %16, i64 4548776) br label %19 19: ; preds = %14, %18 br i1 %16, label %20, label %22 20: ; preds = %19 call void @_sym_notify_basic_block(i64 6020064) %21 = ptrtoint i1* %3 to i64 call void @_sym_write_memory(i64 %21, i64 1, i8* null, i8 1) store i1 false, i1* %3, align 1 br label %52 22: ; preds = %19 call void @_sym_notify_basic_block(i64 5402272) %23 = ptrtoint i32* %4 to i64 %24 = call i8* @_sym_read_memory(i64 %23, i64 4, i8 1) %25 = load i32, i32* %4, align 4 %26 = icmp eq i8* null, %24 br i1 %26, label %30, label %27 27: ; preds = %22 %28 = call i8* @_sym_build_integer(i64 0, i8 32) %29 = call i8* @_sym_build_equal(i8* %24, i8* %28) br label %30 30: ; preds = %22, %27 %31 = phi i8* [ null, %22 ], [ %29, %27 ] %32 = icmp eq i32 %25, 0 %33 = icmp eq i8* null, %31 br i1 %33, label %37, label %34 34: ; preds = %30 %35 = call i8* @_sym_build_bool_to_bit(i8* %31) %36 = call i8* @_sym_build_zext(i8* %35, i8 31) br label %37 37: ; preds = %30, %34 %38 = phi i8* [ null, %30 ], [ %36, %34 ] %39 = zext i1 %32 to i32 %40 = ptrtoint i32* %5 to i64 call void @_sym_write_memory(i64 %40, i64 4, i8* %38, i8 1) store i32 %39, i32* %5, align 4 %41 = ptrtoint i32* %5 to i64 %42 = call i8* @_sym_read_memory(i64 %41, i64 4, i8 1) %43 = load i32, i32* %5, align 4 %44 = icmp eq i8* null, %42 br i1 %44, label %48, label %45 45: ; preds = %37 %46 = call i8* @_sym_build_integer(i64 0, i8 32) %47 = call i8* @_sym_build_not_equal(i8* %42, i8* %46) br label %48 48: ; preds = %37, %45 %49 = phi i8* [ null, %37 ], [ %47, %45 ] %50 = icmp ne i32 %43, 0 %51 = ptrtoint i1* %3 to i64 call void @_sym_write_memory(i64 %51, i64 1, i8* %49, i8 1) store i1 %50, i1* %3, align 1 br label %52 52: ; preds = %48, %20 call void @_sym_notify_basic_block(i64 5837824) %53 = ptrtoint i1* %3 to i64 %54 = call i8* @_sym_read_memory(i64 %53, i64 1, i8 1) %55 = load i1, i1* %3, align 1 call void @_sym_set_return_expression(i8* %54) ret i1 %55 } ```where the interesting part is:
In other words, it is writing a bool expression (from
_sym_build_not_equal
) into memory without using_sym_bool_to_bit
: indeed, the original LLVM IR is not doing a cast (because it is writing ani1
and not ani8
!). When running this code, we get the following assertion failure:since we are doing an extract over a bool expression (which is illegal) in
_sym_write_memory
. Notice also that the original program is returning i1 (bool) while the instrumented program is returning an integer expression of 8 bits.Now, there are a few ways of fixing this:
i1
, we have to perform conversions, e.g., adding a call to_sym_bool_to_bit
.createExtract
,negatePath
, etc.) in the backend. This solution is worse but I do not remember if I implemented it due to similar issues with SymQEMU coming from thesetcc
instruction(*).KLEE is using approach (1): they are doing the conversion in the memory model.
Let me know what do you suggest. I can make a PR for approach (1) by modifying the pass.
(*) I am unable to build right now a minimal test case for reproducing the issue in SymQEMU... I do not remember the exact reason (it should be a tricky case with
setcc
).