eurecom-s3 / symcc

SymCC: efficient compiler-based symbolic execution
http://www.s3.eurecom.fr/tools/symbolic_execution/symcc.html
GNU General Public License v3.0
776 stars 135 forks source link

Incorrect constraints in tcpdump #15

Closed borzacchiello closed 4 years ago

borzacchiello commented 4 years ago

Hi, I'm using symcc on tcpdump (version 4.9.3, with libpcap 1.9.1). I compiled it using the instructions in the docs.

When executing the binary, after some time, I'm getting the following warning (many times): [INFO] syncConstraints: Incorrect constraints are inserted

I'm running the binary in this way:

SYMCC_INPUT_FILE=./seed/small_capture.pcap ./tcpdump_symcc -e -vv -nr ./seed/small_capture.pcap

The seed is taken from afl.

I debugged it a little bit, and it seems that just before the first warning pops out, symcc adds the following conflicting constraints to the path constraint:

Equal(0x1A567B2, 0x1A5679E + (ZExt(bits=64, And(0x3C, Shl(Read(index=36), 0x2)))))
Equal(0x1A567B4, 0x1A567A2 + (ZExt(bits=64, And(0x3C, Shl(Read(index=36), 0x2)))))

Using gdb, it seems that the constraints were added in tcp_print function. Probably they refer to the branches in the ND_TTEST_2 macro.

Any idea of what is going on? Am I doing something wrong?

Thanks, Luca

sebastianpoeplau commented 4 years ago

Hi Luca,

most likely there's a problem in SymCC's handling of some instruction, resulting in incorrect symbolic expressions (and hence conflicting constraints). I've had similar issues in the past when bits were accidentally truncated from integers. I'll look into it.

Thanks for reporting the issue! Seb

sebastianpoeplau commented 4 years ago

For the record, some initial results of my attempts to debug the issue:

We fail while adding a new constraint in tcp_print, as you suggested.

(gdb) bt
[...]
#2  0x00007ffff7b01614 in qsym::Solver::syncConstraints (this=0x5555560b2e90, e=std::shared_ptr<class qsym::Expr> (use count 4, weak count 2) = {...})
    at /home/seba/work/compiler/pass/runtime/qsym_backend/qsym/qsym/pintool/solver.cpp:418
#3  0x00007ffff7b02955 in qsym::Solver::negatePath (this=0x5555560b2e90, e=std::shared_ptr<class qsym::Expr> (use count 4, weak count 2) = {...}, taken=false)
    at /home/seba/work/compiler/pass/runtime/qsym_backend/qsym/qsym/pintool/solver.cpp:522
#4  0x00007ffff7aff5ed in qsym::Solver::addJcc (this=0x5555560b2e90, e=std::shared_ptr<class qsym::Expr> (use count 4, weak count 2) = {...}, taken=false, 
    pc=94533503337048) at /home/seba/work/compiler/pass/runtime/qsym_backend/qsym/qsym/pintool/solver.cpp:186
#5  0x00007ffff7b1fbb6 in _sym_push_path_constraint (constraint=0x555556406fd0, taken=0, site_id=94533503337048)
    at /home/seba/work/compiler/pass/runtime/qsym_backend/Runtime.cpp:281
#6  0x0000555555aff5fb in tcp_print (ndo=0x7fffffffccf0, bp=<optimized out>, length=40, bp2=0x5555560a9cae "E\020", fragmented=0) at ./print-tcp.c:193
[...]

The constraint looks reasonable at first glance:

(gdb) p _sym_expr_to_string(constraint)
$1 = 0x7ffff7c29aa0 <_sym_expr_to_string::buffer> "Equal(0x60, And(0xF0, Read(index=36)))"

However, while adding it, the QSYM backend adds constraints for the possible ranges of dependent expressions - that's where we fail:

(gdb) p _sym_expr_to_string(node.get())
$4 = 0x7ffff7c29aa0 <_sym_expr_to_string::buffer> "ZExt(bits=64, And(0x3C, Shl(Read(index=36), 0x2)))"
[...]
(gdb) p *node->range_sets[0]
$8 = {BV_ = {is_unsigned_ = false, bit_width_ = 64}, ranges_ = std::set with 0 elements}

I guess there are two questions:

  1. Is the ZExt expression a valid dependency? I think so, because it's just a restatement of the And expression from the original constraint (0x3C << 2 == 0xF0).
  2. Why is the range empty? If I understand correctly, it's either because Read(index=36) is overly constrained, or because QSYM has a bug in translating constraints to ranges.

Let's see if I can find out...

borzacchiello commented 4 years ago

AFAIK, a range in QSYM is null if the range analysis fails, i.e., if two conflicting constraints reach addRangeConstraint.

While debugging, I noticed that the two constraints that I mentioned reach the range analysis, causing the expression ZExt(bits=64, And(0x3C, Shl(Read(index=36), 0x2))))) to have an empty range.

sebastianpoeplau commented 4 years ago

Those two seem to come from the function EXTRACT_16BITS, which is called in tcp_print. The function uses the library function ntohs - I suspect that our problem is there. We don't have symbolic handling for it, and probably something goes wrong in the implicit concretization.

sebastianpoeplau commented 4 years ago

I think I found it.

When tcp_print prints out the source and destination ports, the bitcode represents the TCP header as a byte array and computes the address of element 2:

; %1 is the TCP header
%91 = getelementptr inbounds i8, i8* %1, i64 2, !dbg !517

Our symbolic equivalent of the same computation, however, adds 2 * 2 to the base address, using a wrong type size of 2 instead of 1:

; %7 is the expression for %1
  %86 = call i8* @_sym_build_integer(i64 2, i8 64) #7, !dbg !517
  %87 = call i8* @_sym_build_mul(i8* %86, i8* %86) #7, !dbg !517
  %88 = call i8* @_sym_build_add(i8* %87, i8* nonnull %7) #7, !dbg !517

In the compiler pass, we obtain the size of the type as follows:

unsigned elementSize =
  dataLayout.getTypeAllocSize(type_it.getIndexedType());

I believe the problem is that this function returns 2 when called on type i8, probably for alignment reasons. Let's see if there's another API function that returns the correct size...

sebastianpoeplau commented 4 years ago

It wasn't the LLVM API in the end but an error in how we used its return value. I wrote a test to reproduce the problem and fixed it by handling the return value properly (see commit message).