eurecom-s3 / symcc

SymCC: efficient compiler-based symbolic execution
http://www.s3.eurecom.fr/tools/symbolic_execution/symcc.html
GNU General Public License v3.0
773 stars 135 forks source link

Backend Rust/Foreign Language Support #69

Open julihoh opened 3 years ago

julihoh commented 3 years ago

Hey,

I came up with the code in this PR in order to be able to implement new SymCC/SymQEMU backends in Rust for my GSoC project (which is about integrating LibAFL with SymCC as a concolic engine for fuzzing).

In a nutshell, it implements a new backend in SymCC which is supposed to wrap a backend implemented in another language. This wrapper implements some utility functionality and garbage collection. How this is achieved is described in the README in runtime/rust_backend/README. The wrapped runtime in turn can implement a reduced interface of the 'original' runtime and can also keep the libc wrappers.

I have put quite a bit of effort into making this code strictly additive to the current codebase of SymCC, which should make it easy to maintain. Also note, that while it may look like a lot of code, a lot of it is quite boilerplate-y and more or less a copy of the SimpleRuntime. I propose to merge this into SymCC purely for organisational reasons (ie. not having to sync the fork with SymCC upstream).

I think the code could benefit SymCC/SymQEMU and related research by making it much easier to build a Runtime that is not C++, but still keeps the libc wrappers and garbage collection feature from the common SymCC runtime code. Note, that interfacing with the GC feature of SymCC more or less requires interoperability with C++, making an implementation in foreign languages quite cumbersome. The common backend code also makes the assumption that the backend will always keep track of the bit-width of any expression, which proposed code also implements transparently for the wrapped runtime.

Cheers

aurelf commented 3 years ago

Hi, thanks for the PR. I'm going to look at this. Looks interesting. To start, 2 comments:

Thanks.

julihoh commented 3 years ago

Thanks for taking a look :)

  • Can you add a target to the Dockerfile to compile and run the tests on this backend? You can take inspiration from the simple backend again here.

This will be difficult, as the 'rust backend' code added in this repository delegates the solving, etc. to the backend that is implemented in a foreign language (and for which no 'canonical' version exists at this point). I presume most tests are irrelevant in this case. Note, that the 'rust backend' in itself is also not usable on it own, because it compiles to a static archive instead of shared library, so it is not directly compatible with the interface that SymCC (and SymQEMU) expect. The symcc_runtime crate in the LibAFL repo takes care of giving the user an actual interface for implementing the runtime and enables the production of a shared library that can be used as a SymCC backend. (This is achieved by linking against the static archive that is generated by building the rust backend in this repo, re-exporting the standard the SymCC interface from it and implementing the foreign symbols). Finally, there is a smoke test for this whole process in the LibAFL repo.

  • Can you mention in the readme how this is intended to be used with LibAFL ?

I'm writing up some more extensive documentation for LibAFL users in the coming days (this is the basically the last missing piece for this whole project :) )

  • SymCC would be an executor ?

Exactly. SymQEMU is supported as well, because they use the same interface. Garbage Collection, which, AFAIU is required for the massive amounts of expressions that the SymQEMU fronted produces, is implemented as part of this PR.

  • Is there a minimal example that actually uses this Rust backend ?

Yes. Here is a the runtime used in the aforementioned smoke test, which traces the calls made to the backend for processing outside of the traced target, and here is a backend that is used as part of an example hybrid fuzzer based on LibAFL.

Note that, the runtimes that I linked to make use of the pre-built components that come with the symcc_runtime crate. This really only makes sense when reading the documentation for that crate. Unfortunately, that package is not published to crates.io yet, and therefore no publicly accessible HTML version of the documentation for this crate exists yet. You can of course read the documentation straight out of the source, but this is of course not ideal. Alternatively, you could clone LibAFL and build the documentation using cargo doc -p symcc_runtime --openor just wait until symcc_runtime is published and read the docs at docs.rs.

Cheers

julihoh commented 2 years ago

Just a quick note: I haven't forgotten about this, just very busy at this time. I'll come back to this as soon as possible!

julihoh commented 2 years ago

Just out of curiosity: Why did you decide to do target -> SymCC runtime -> Rust code rather than target -> Rust code -> parts of SymCC runtime used as helpers? I'm not suggesting that one is better than the other, just curious :)

That's a quite curious design decision indeed. It follows from the following requirements:

  1. I (or we, as in "in collaboration with @domenuk and @andreafioraldi") wanted a solution that could be built entirely from cargo. To be specific, I didn't want to have to do a CMake build to build my otherwise rust-based backend.
  2. Rust/cargo can't re-export symbols from external libraries.

It is possible to link the rust parts of the backend into the final shared object using (Corrosion) [https://github.com/AndrewGaspar/corrosion]. This would allow the scenario as you describe. Note that, if we want to interface with SymCC's GC, we need some C++ (because it deals in std types). However, this violates requirement 1. (I know that this all works, because this is how I had implemented it initially).

If we now want to invert the dependency between CMake and Cargo (ie. call cmake from cargo), then we can't have cargo call into cmake to do the entire build. Instead, we need to build a static library out of the code that we want from the SymCC runtime and link that into the final shared object, which is now built and linked by cargo. This is where requirement 2. comes in. Since we can't simply re-export the symbols from the SymCC/C++ part of the backend, we need to define the required symbols in our rust crate. Therefore, in the symcc_runtime crate, we 1. gather exported symbols from the C++ runtime using bindgen, 2. generate a header that renames those symbols according to a known pattern and finally compiles the C++ runtime with this renaming header included, therefore producing a static library with the required code, but renamed symbols. Then, in rust, we generate the necessary symbol definitions and forward calls to the C++ runtime with renamed symbols. The renaming is necessary, because, again, we can't re-export external symbols. This technique basically works around requirement 2. Then, to be able to interface with SymCC's GC (and to implement _sym_bits_helper for convenience), we have the C++ part of this PR, which does exactly those two things: interfaces with the GC and implements _sym_bits_helper. The GC interface to the rust runtime is then FFI compatible (ie. does not use C++'s std::set).

I'm kind of annoyed that I can't seem to find a more concise explanation, but the issues are kind of intertwined in a weird way. Hope it is understandable. Also, another goal was to modify as little existing code as possible from SymCC as to be able to easily maintain a fork in case these changes will not be merged.

In principle, I have no objections against a backend that delegates to foreign-language implementations. But I agree with @aurelf that it would be nice to have a way to make sure we don't break implementations, other than receiving bug reports from after the fact :P Do you think it's feasible to express the simple backend in terms of your generic backend as I suggested in the comments? That would give us a nice way to make sure we don't break anything...

The way that the C++ part of the backend (specifically the implementation of _sym_bits_helper) is implemented at the moment means that 32-bit targets can't be supported. (In short: the bit-width of all expressions is stored in the least significant byte of the "SymExpr" (ie. pointer-sized) type. On 32-bit, I believe this leaves too few bits for the actual address.). Other than that, I see no technical reason why the simple backend couldn't be implemented this way. However, I personally think that is would make the simple backend kind of convoluted. A solution could be potentially to leave the simple backend as it is and re-implement the simple backend (ie. create a second simple backend) in terms of the rust interface.

To come back to the issue of testability: In general, the code in this PR doesn't implement a backend, as discussed before. What it does implement, however, is this whole forwarding business. Therefore my suggestion would be to test the forwarding only decoupled from the actual backend logic. For example, a 'tracing' backend could be implemented that simply outputs the calls that were made to it in text format to stdout. A test script could then ensure the correct sequence of calls were made to the backend. In fact, this is basically what the tests inside the LibAFL repo do at this point: https://github.com/AFLplusplus/LibAFL/tree/main/libafl_concolic/test .

Of course, in my humble opinion, the current C++ parts of the backend (not the LLVM pass of course) should simply be implemented in Rust, providing a FFI compatible interface like the one in the symcc_runtime crate. But that's for another time ;)