eurecom-s3 / symcc

SymCC: efficient compiler-based symbolic execution
http://www.s3.eurecom.fr/tools/symbolic_execution/symcc.html
GNU General Public License v3.0
773 stars 135 forks source link

Running SymCC & Interpreting Results, Prerequisite on main Function #42

Closed rindPHI closed 3 years ago

rindPHI commented 3 years ago

Dear SymCC devs,

now that I managed to compile SymCC inside my Ubuntu VM, I am trying to run it on the initial sample you provide in your readme:

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>

int foo(int a, int b) {
    if (2 * a < b)
        return a;
    else if (a % b)
        return b;
    else
        return a + b;
}

int main(int argc, char* argv[]) {
    int x;
    if (read(STDIN_FILENO, &x, sizeof(x)) != sizeof(x)) {
        printf("Failed to read x\n");
        return -1;
    }
    printf("%d\n", foo(x, 7));
    return 0;
}

The output I get is:

echo 'aaaa' | ./test
This is SymCC running with the QSYM backend
Reading program input until EOF (use Ctrl+D in a terminal)...
a: 1633771873
[STAT] SMT: { "solving_time": 0, "total_time": 20337 }
[STAT] SMT: { "solving_time": 19354 }
[INFO] New testcase: /tmp/output/000000
1633771873

The file /tmp/output/000000, however, is empty. Does this mean that "000000" is the suggested test case? Unfortunately, the README does not mention the expected output and how to interpret it, so I'm a little lost here.

Also, I don't understand the code you're using for reading from STDIN (sorry, no C native), so I tried to replace it with scanf("%d\n", x) which gives me a nice integer I know how to process. But then, SymCC does not perform any instrumentation, or it does not generate any test cases. What are the prerequisites for the main function to use SymCC for own code?

It would be great if you could help me out!

I'm planning to play around with SymCC for own research, and also I'm going to present the paper in an upcoming reading group next Monday. It would be great if I knew until then how to use it!

Thanks & Best Regards, Dominic

rindPHI commented 3 years ago

Remark: The 16 tests in ninja check pass. So I guess there's nothing wrong with my setup...

rindPHI commented 3 years ago

Oooookay, I managed to work my way around. I manage to find the bug in the following program in 3 rounds, and in each I get two new testcases:

#include <stdio.h>
#include <unistd.h>

int diff(int a, int b) {
  if (a <= b + 1) {
    printf("  ==> Branch 1\n");
    return b - a;
  }
  else {
    printf("  ==> Branch 2\n");
    return a - b;
  }
}

int main() {
  char inp[2];

  if (read(STDIN_FILENO, &inp, sizeof(inp)) != sizeof(inp)) {
    printf("Failed to read input\n");
    return -1;
  }

  int iInp = (inp[0] - 48) * 10 + (inp[1] - 48);
  int result = diff(iInp, 17);
  printf("  ==> diff result: %d\n", result);
  if (result < 0)
    printf("\n  ==> BUG!\n");
  return 0;
}

I guess the scanf call is cannot be symbolically instrumented, right? Probably you'd need to write a symbolic summary for it, or to instrument the whole standard library ;)

One more thing, just to be sure I'm understanding it correctly: Currently, SymCC can generate inputs only for programs reading one string from standard input, right?

Thanks again & Best Regards, Dominic

julihoh commented 3 years ago

You can also set SYMCC_INPUT_FILE to the path of your input file and it will try to instrument that. (This is also documented in the README btw ;) ) Note, however, that this depends on how exactly the target interacts with the c std lib (ie. not all functions may be supported, leading to SymCC not symbolising where it should). You can see the supported functions here.

sebastianpoeplau commented 3 years ago

In addition to the resources that @julihoh referenced, there's a bit of background on the topic of libc instrumentation here. In summary, full instrumentation is possible but would make the tool significantly harder to set up. Thankfully, several people have been contributing symbolic summaries of important libc functions.

Regarding input sources, we basically treat one file descriptor as your source of symbolic data, no matter how much is read from it - a single string, a JPEG or whatever else your program wants. (Your concrete input just has to be long enough because we never change its length.) By default, we use standard input, but you can use SYMCC_INPUT_FILE to change that, like @julihoh said. The configuration options are documented in more detail here. For more complex use cases (and automatic configuration of a lot of those settings), you may want to look into hybrid fuzzing with SymCC.

rindPHI commented 3 years ago

Thanks for these very helpful hints!!!