eurecom-s3 / symcc

SymCC: efficient compiler-based symbolic execution
http://www.s3.eurecom.fr/tools/symbolic_execution/symcc.html
GNU General Public License v3.0
773 stars 135 forks source link

SYMCC_INPUT_FILE comparison uses strstr #53

Open grant-h opened 3 years ago

grant-h commented 3 years ago

When running a command like SYMCC_INPUT_FILE=inp x86_64-linux-user/symqemu-x86_64 /bin/cat -t inp2

I wouldn't expect the inp2 file to be marked as symbolic, but the environment variable comparison is using strstr instead of my expectation of strcmp.

https://github.com/eurecom-s3/symcc/blob/4fddd3cfcdbd6813ee23da51af58a31d73794ea6/runtime/LibcWrappers.cpp#L114-L122

sebastianpoeplau commented 3 years ago

Yes, the idea was that this allows you to specify just SYMCC_INPUT_FILE=foo and have it match /some/long/path/to/foo as well. However, I agree that it's not ideal. Feel free to suggest a better implementation! I was thinking something that allows you to specify either the full (absolute or relative?) path or the base name, but not any arbitrary substring, but I never got around to building it.

grant-h commented 3 years ago

Was this implementation mostly geared towards development, allowing for quick hacking be easier, or to allow multiple similarly named files to be marked as symbolic (or both)? I do see that checking the path would be difficult given that the target could change directories or even have strange paths like ".////path////to//file/" or "../path/../path/to/file".

Here's are my ideas regarding marking files as symbolic:

sebastianpoeplau commented 3 years ago

Was this implementation mostly geared towards development, allowing for quick hacking be easier, or to allow multiple similarly named files to be marked as symbolic (or both)? I do see that checking the path would be difficult given that the target could change directories or even have strange paths like ".////path////to//file/" or "../path/../path/to/file".

Yes, the current solution was mostly an easy-to-implement approach that worked for the early use cases. The wildcard scenario is not one that I thought of, but I see how it could be useful. But, like you described in the initial error report, the way things work now is prone to unexpected behavior (try SYMCC_INPUT_FILE=/!) and could certainly be improved.

* Idea 1: The filename passed into `SYMCC_INPUT_FILE` could be resolved to an absolute path at init time. Conversely, a stat call could be added to the fopen/open paths that also gets the absolute path for file names passed in. Effectively, we want to canonicalize both paths at compare time to allow for stricter `strcmp` while catching some common mismatches (`./file` vs `file`). The issue with this is that stat can fail and the file referenced by `SYMCC_INPUT_FILE` may not be available at init time (such as the program under test creating it, itself).

* Idea 2: Add a runtime supported function `extern "C" symcc_make_symbolic(FILE * fp)` and `symcc_make_symbolic(istream & stream)` and allow the programmer to add their own logic to mark files as symbolic. This is how Klee does it. This would hurt the SymQEMU case where the target is running as a binary, but then we could fallback to using the environment variable or an additional one such as `SYMCC_INPUT_FILELIST`.

I think both could be useful, and they could coexist with the current approach. We'll be happy to accept pull requests...