Initial Fuzzing Infrastructure

fitzgen commented 4 years ago

I plan on laying out some foundational fuzzing infrastructure for Wasmtime in the next few weeks. I'd like to use this issue as a kind of meta issue to keep track of this work. I'd also appreciate feedback on the plan from anyone with experience fuzzing or domain knowledge of a particular thing we plan on fuzzing.

Goals

Find bugs!
- Bugs that we wouldn't otherwise find until our users hit them.
- Bugs that are hard to manually write test cases for, or that you wouldn't even think of testing for.
Make bugs (fuzzer-found or otherwise) easier to debug via automatic test case reduction.

Strategy

Breadth not Depth

At least initially, let's build out a few different fuzzing approaches enough that they start identifying bugs, but not spend a ton of time building bespoke tools tailored for exactly the problems we have at hand.

My assumptions are that

we have low-hanging fruit available, since we haven't done a ton of fuzzing for a bunch of corners yet, and
different fuzzing approaches tend to uncover different sets of bugs.

Therefore, by making a bunch of different just-good-enough fuzzers, we will repeatedly discover new, unique low-hanging fruit bugs.

Additionally, this gives us a nice foundation that we can spring board off of in the future when we decide to go deeper in any particular direction.

Decouple Generators and Oracles

A generator creates test cases (usually given an RNG or a random byte stream input). An oracle determines if executing a test case uncovered a bug. In general, it is good software engineering to separate concerns, but separating these two parts specifically allows us to:

reuse oracles during automatic test case reduction (a la creduce), and
swap out existing, off-the-shelf generators with more intelligent, custom generators in the future.
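A minimal sketch of what this separation could look like in Rust. The trait names, `ByteListGenerator`, `NoOverflowOracle`, and `run_one` are all hypothetical and chosen for illustration; nothing here is an existing Wasmtime API.

```rust
/// A generator turns raw input bytes into a test case without executing it.
pub trait Generator {
    type TestCase;
    fn generate(&self, input: &[u8]) -> Option<Self::TestCase>;
}

/// An oracle executes a test case and reports whether it exposed a bug.
pub trait Oracle<T> {
    fn check(&self, test_case: &T) -> Result<(), String>;
}

/// Hypothetical generator: the test case is simply the non-empty input.
pub struct ByteListGenerator;

impl Generator for ByteListGenerator {
    type TestCase = Vec<u8>;

    fn generate(&self, input: &[u8]) -> Option<Vec<u8>> {
        if input.is_empty() {
            None
        } else {
            Some(input.to_vec())
        }
    }
}

/// Hypothetical oracle: the "bug" is any test case whose bytes sum past 255.
pub struct NoOverflowOracle;

impl Oracle<Vec<u8>> for NoOverflowOracle {
    fn check(&self, test_case: &Vec<u8>) -> Result<(), String> {
        let mut sum: u8 = 0;
        for b in test_case {
            sum = sum
                .checked_add(*b)
                .ok_or_else(|| "sum overflowed".to_string())?;
        }
        Ok(())
    }
}

/// Glue that works for any compatible generator/oracle pair.
pub fn run_one<G, O>(generator: &G, oracle: &O, input: &[u8]) -> Result<(), String>
where
    G: Generator,
    O: Oracle<G::TestCase>,
{
    match generator.generate(input) {
        Some(test_case) => oracle.check(&test_case),
        None => Ok(()), // the input did not describe a test case
    }
}
```

Because `run_one` only depends on the two traits, the same oracle can later be handed to a test case reducer, and the generator can be replaced without touching the oracle.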

Implementation

In general, I recommend that we use libFuzzer to drive our fuzzing. It is coverage-guided, which means it can find interesting code paths more quickly than testing purely random inputs will. It also has a nice Rust interface in the form of cargo-fuzz.

Any custom generators we create should take libFuzzer-provided input bytes and then re-interpret that as a sequence of random values to drive choices inside the generator. This lets us combine the benefits of smart, structure-aware generators with those of coverage-guided fuzzing. We can implement this by implementing our custom generators in terms of the arbitrary crate's Arbitrary trait.
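To illustrate the idea of re-interpreting fuzzer bytes as a stream of choices, here is a small std-only sketch. `ByteSource` is a hand-rolled stand-in for the concept, not the `arbitrary` crate's actual API, and the `Instr` enum is a hypothetical toy instruction set rather than real Wasm.

```rust
/// Cursor over fuzzer-provided bytes that hands out "random" choices.
pub struct ByteSource<'a> {
    data: &'a [u8],
}

impl<'a> ByteSource<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        ByteSource { data }
    }

    /// Take the next byte, or 0 once the input is exhausted.
    pub fn next_byte(&mut self) -> u8 {
        match self.data.split_first() {
            Some((first, rest)) => {
                self.data = rest;
                *first
            }
            None => 0,
        }
    }

    /// Pick an index in `0..n` (requires `n > 0`).
    pub fn choose(&mut self, n: usize) -> usize {
        assert!(n > 0);
        self.next_byte() as usize % n
    }
}

/// Hypothetical toy stack-machine instructions.
#[derive(Debug, Clone, PartialEq)]
pub enum Instr {
    Const(u8),
    Add,
    Drop,
}

/// Generate a sequence that never underflows the stack: `Add` is only
/// emitted with two values available, `Drop` only with one.
pub fn generate(input: &[u8]) -> Vec<Instr> {
    let mut src = ByteSource::new(input);
    let len = src.choose(16);
    let mut depth = 0usize;
    let mut out = Vec::with_capacity(len);
    for _ in 0..len {
        let instr = match src.choose(3) {
            1 if depth >= 2 => {
                depth -= 1;
                Instr::Add
            }
            2 if depth >= 1 => {
                depth -= 1;
                Instr::Drop
            }
            _ => {
                depth += 1;
                Instr::Const(src.next_byte())
            }
        };
        out.push(instr);
    }
    out
}
```

The generator is a pure function of its input bytes, so a coverage-guided fuzzer that mutates those bytes is indirectly mutating the structured test case, and any crashing input replays to the same test case.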

As far as test case reduction goes, when a generator is creating Wasm files, it should be relatively easy to use binaryen's wasm-reduce on the Wasm file, or use creduce on the WAT disassembly. We can, however, do some small things to make the process turnkey:

[ ] Write glue scripts for running wasm-reduce and/or creduce on a Wasm test case with any of our various oracles

For generators that are creating custom in-memory data structures by implementing the Arbitrary trait, test case reduction requires we implement some custom logic. The Arbitrary trait supports defining a custom shrink method that takes &self and returns an iterator of smaller instances of Self. We can use this to create custom test case reduction for each of our custom test case generators.
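A rough sketch of how a shrink-style iterator can drive reduction. The free function `shrink` below only plays the role of such a method for a `Vec<u32>` test case; it does not reproduce the trait's real signature, and `reduce` is a hypothetical greedy driver.

```rust
/// Candidate reductions of a test case: drop one element at a time.
pub fn shrink(case: &[u32]) -> impl Iterator<Item = Vec<u32>> + '_ {
    (0..case.len()).map(move |i| {
        let mut smaller = case.to_vec();
        smaller.remove(i);
        smaller
    })
}

/// Greedy reduction: repeatedly move to the first smaller candidate that
/// the oracle still flags, and stop when no candidate fails any more.
pub fn reduce(mut case: Vec<u32>, still_fails: impl Fn(&[u32]) -> bool) -> Vec<u32> {
    loop {
        let next = shrink(&case).find(|c| still_fails(c.as_slice()));
        match next {
            Some(smaller) => case = smaller,
            None => return case,
        }
    }
}
```

The `still_fails` closure is where an oracle plugs in, which is one concrete payoff of keeping oracles independent of generators.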

Finally, any custom generator we create (and any generator we wrap that supports turning the generation of individual test case features on/off) should support swarm testing. Swarm testing is where we randomly turn on/off the generation of various test case features (such as, should a generator create Wasm test cases that use call_indirect or not?) so that we are more likely to generate pathological test cases where bugs are more likely to be found. This is relatively easy to implement and should yield
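One possible shape for a swarm configuration, sketched with std only. The `SwarmConfig` struct and its three feature switches are hypothetical; a real generator would have many more switches and would consult them while emitting instructions.

```rust
/// Hypothetical feature switches for a Wasm generator.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SwarmConfig {
    pub allow_call_indirect: bool,
    pub allow_memory_ops: bool,
    pub allow_loops: bool,
}

impl SwarmConfig {
    /// Derive the switches from one fuzzer-provided byte, one bit each.
    pub fn from_byte(b: u8) -> Self {
        SwarmConfig {
            allow_call_indirect: (b & 0b001) != 0,
            allow_memory_ops: (b & 0b010) != 0,
            allow_loops: (b & 0b100) != 0,
        }
    }

    /// Names of the features this configuration lets the generator emit.
    pub fn enabled(&self) -> Vec<&'static str> {
        let mut names = Vec::new();
        if self.allow_call_indirect {
            names.push("call_indirect");
        }
        if self.allow_memory_ops {
            names.push("memory");
        }
        if self.allow_loops {
            names.push("loop");
        }
        names
    }
}
```

Taking the switches from the fuzzer input itself means each test case is generated under its own random subset of features, while staying reproducible from the input bytes.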

Fuzzing Wasmtime's Embedding API

This is a case where, unfortunately, we can't really use existing off-the-shelf solutions.

Generators

[x] Build a custom generator that creates a sequence of API calls. It shouldn't perform the calls, just describe them. This generator should have some smarts about knowing how to generate valid API calls.

Oracles

[x] Interpret API call descriptions and perform the actual API call. Find unexpected panics, assertion failures, and segfaults.
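The describe-then-interpret split can be sketched as follows. The `ApiCall` variants and the toy interpreter are hypothetical and far simpler than the real embedding API; the interpreter runs against an in-memory model instead of calling Wasmtime.

```rust
use std::collections::HashSet;

/// A description of an API call; constructing one performs nothing.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ApiCall {
    CreateStore { id: usize },
    Instantiate { store: usize },
    DropStore { id: usize },
}

/// Generator: only refers to stores that are still alive, so every
/// generated sequence is valid by construction.
pub fn generate_calls(input: &[u8]) -> Vec<ApiCall> {
    let mut calls = Vec::new();
    let mut live: Vec<usize> = Vec::new();
    let mut next_id = 0usize;
    for &b in input {
        match (b % 3, live.last().copied()) {
            (1, Some(store)) => calls.push(ApiCall::Instantiate { store }),
            (2, Some(id)) => {
                live.pop();
                calls.push(ApiCall::DropStore { id });
            }
            _ => {
                live.push(next_id);
                calls.push(ApiCall::CreateStore { id: next_id });
                next_id += 1;
            }
        }
    }
    calls
}

/// Oracle: perform the described calls against a toy model and report
/// misuse. Returns the number of instantiations on success.
pub fn interpret(calls: &[ApiCall]) -> Result<usize, String> {
    let mut stores: HashSet<usize> = HashSet::new();
    let mut instances = 0usize;
    for call in calls {
        match *call {
            ApiCall::CreateStore { id } => {
                stores.insert(id);
            }
            ApiCall::Instantiate { store } => {
                if !stores.contains(&store) {
                    return Err(format!("instantiate in dead store {}", store));
                }
                instances += 1;
            }
            ApiCall::DropStore { id } => {
                if !stores.remove(&id) {
                    return Err(format!("drop of unknown store {}", id));
                }
            }
        }
    }
    Ok(instances)
}
```

Since the call sequence is plain data, a failing sequence can be logged, replayed, and shrunk independently of the code that performs the calls.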

Wasm Execution Fuzzing

We should fuzz our execution of Wasm. Yes, Cranelift has some fuzzing in SpiderMonkey, but we should also make sure that all of our Wasmtime-specific JIT'ing machinery is well fuzzed, as well as our WASI implementation and sandboxing.

Generators

[x] Use wasm-opt -ttf to generate random, valid Wasm files.
[ ] Write a custom generator that creates Wasm files that make sequences of WASI syscalls.

Oracles

[ ] Execute the file and ensure Wasmtime doesn't panic, fail any assert!(..)s, or segfault, regardless of whether executing the Wasm traps.
[ ] strace the process or something and ensure it doesn't do any syscalls outside the preopened directory given to the WASI sandbox or something?
[x] Differential fuzzing where we compare the observable results of execution between:
- [x] Cranelift without optimizations
- [x] Cranelift with opt=speed
- [x] Cranelift with opt=size
- [x] Cranelift with opt=speed_and_size
- [ ] Cranelift with a warm code cache
- [ ] Cranelift with a cold code cache
- [x] Lightbeam
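A differential oracle of this kind might be sketched like this. The three "engines" are hypothetical stand-in functions, not Cranelift or Lightbeam, and `Outcome` models only a single returned integer or a trap.

```rust
/// Observable result of executing one test case.
#[derive(Debug, Clone, PartialEq)]
pub enum Outcome {
    Returned(i64),
    Trapped,
}

/// One named way of executing the same test case.
pub type Engine = (&'static str, fn(i64) -> Outcome);

/// Run every engine on the same input and fail on the first disagreement
/// with the first engine's result.
pub fn differential(input: i64, engines: &[Engine]) -> Result<Outcome, String> {
    let mut iter = engines.iter();
    let (base_name, base_run) = match iter.next() {
        Some(&(name, run)) => (name, run),
        None => return Err("no engines given".to_string()),
    };
    let expected = base_run(input);
    for &(name, run) in iter {
        let got = run(input);
        if got != expected {
            return Err(format!(
                "{} and {} disagree: {:?} vs {:?}",
                base_name, name, expected, got
            ));
        }
    }
    Ok(expected)
}

// Hypothetical stand-ins for differently configured compilers.

pub fn baseline(x: i64) -> Outcome {
    match x.checked_mul(2) {
        Some(v) => Outcome::Returned(v),
        None => Outcome::Trapped,
    }
}

pub fn optimized(x: i64) -> Outcome {
    match x.checked_add(x) {
        Some(v) => Outcome::Returned(v),
        None => Outcome::Trapped,
    }
}

pub fn miscompiled(x: i64) -> Outcome {
    // Deliberate bug: wraps on overflow instead of trapping.
    Outcome::Returned(x.wrapping_mul(2))
}
```

Note that a trap is treated as just another observable outcome: two configurations that both trap agree, while one trapping and the other returning a value is reported as a bug.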

More Stuff to Explore in the Future

Add support for code-coverage in Cranelift and leverage it to build equivalence-module-inputs testing and coverage-guided fuzzing for Wasmtime
- Alternatively, we could MacGyver some custom code coverage scheme via instrumenting Wasm files with Walrus instead of doing this inside Cranelift at the clif level.
Create test case generators and oracles for our Wasm interface types support? What would be involved here is not super clear to me yet.

Questions

Should the fuzzing corpus be committed into the git repo? Or perhaps should it be a separate repo that we include as a git submodule?
What work here should we prioritize?
- In particular, what variants would be most valuable to compare / most likely to uncover high-priority bugs in differential fuzzing of Wasm execution?
Is there anything here you think we should not implement?
Are there any other WASI-targeted oracles we can create? The strace idea is pretty half-baked right now. I'd appreciate some more ideas from folks more involved in the WASI side of things than I am...

acfoltzer commented 4 years ago

In Lucet, I wrote a simple fuzzing script that uses Csmith-generated C programs: https://github.com/bytecodealliance/lucet/blob/master/lucet-wasi-fuzz/src/main.rs

The approach is to run each program via Lucet on WASI:

.c -[wasm32-wasi-clang]-> .wasm -[lucetc]-> .so -[lucet-wasi]-> stdout

Then compare the stdout against a native oracle:

.c -[i686-linux-clang]-> a.out -[exec]-> stdout

It's pretty bare-bones, other than the ability to run a creduce loop when a failure is found, but it should be possible to hook it up to libfuzzer and wasmtime.

kubkon commented 4 years ago

@fitzgen I haven't done a lot of fuzzing in the past, but I'll be more than happy to learn on the job and help out any way I can in testing out our WASI implementation. @acfoltzer lemme know if you need any help in potentially reusing your Lucet fuzzing harness in Wasmtime!

alexcrichton commented 4 years ago

For interface types specifically, I suspect that the generator won't be too different from whatever wasm generator we might have, unless we heavily base it on wasm-opt, in which case we'd have to write our own fuzz case generator.

For an oracle I think our best bet will be to have someone entirely disconnected from the wasmtime interface types work to write an interpreter, and then we'd compare the two implementations against each other. I suspect we'd discover bugs in both, but I don't think we have much of an oracle otherwise right now.

acfoltzer commented 4 years ago

@acfoltzer lemme know if you need any help in potentially reusing your Lucet fuzzing harness in Wasmtime!

Thanks, @kubkon! I'm actually not going to have time to work on this for a few weeks at least, so if you're feeling eager, don't worry about jumping in and pinging me if you need any support.

sunfishcode commented 4 years ago

On the topic of oracles, the strace idea is appealing, as it doesn't require admin privileges and doesn't depend on cooperation from the VM. Ideally we'd write our own ptrace utility rather than literally using strace, so that we can catch sandbox violations when they happen, which protects the host system better and gives fuzzers a better picture of what's happening.

Another option is to use LD_PRELOAD to interpose between the application and libc, which ought to be faster than ptrace, and simpler to implement, though it would depend on applications being dynamically linked to libc.

fitzgen commented 4 years ago

Good point that there are a few different tools we have at our disposal to observe syscalls. There are probably some eBPF APIs and perf tools we could use too.

I would lean towards whatever is both

easy to implement, and
doesn't require us to blacklist each individual syscall, but instead lets us whitelist things we don't care about (that is, the default should be that we are checking things, without us having to do O(N) work to observe N different kinds of syscalls)

Unless I'm mistaken, LD_PRELOAD wouldn't work well for the latter, since we would have to manually implement overwriting a symbol for every libc API we wanted to observe.
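To make the "checked by default" preference concrete, here is a small sketch of such a policy. The `Observed` event type is a hypothetical stand-in for whatever a tracer (ptrace, eBPF, or similar) would report; how events are actually collected is outside this sketch, and the path check is deliberately simplistic.

```rust
use std::path::{Component, Path};

/// One syscall as reported by a hypothetical tracer.
pub struct Observed<'a> {
    pub name: &'a str,
    /// Path argument, if the syscall takes one.
    pub path: Option<&'a str>,
}

/// Default-deny policy: every syscall is checked unless explicitly ignored.
pub struct Policy<'a> {
    /// Syscalls we do not care about (the short, explicit list).
    pub ignored: &'a [&'a str],
    /// The preopened directory given to the WASI sandbox.
    pub preopen: &'a str,
}

impl Policy<'_> {
    pub fn check(&self, event: &Observed<'_>) -> Result<(), String> {
        if self.ignored.iter().any(|name| *name == event.name) {
            return Ok(());
        }
        // Anything else must name a path inside the preopened directory,
        // with no `..` components that could climb back out of it.
        let inside = event.path.map_or(false, |p| {
            let p = Path::new(p);
            p.starts_with(self.preopen)
                && !p.components().any(|c| c == Component::ParentDir)
        });
        if inside {
            Ok(())
        } else {
            Err(format!(
                "unexpected syscall {} with path {:?}",
                event.name, event.path
            ))
        }
    }
}
```

A syscall the policy has never heard of falls through to the check and is reported, so covering N kinds of syscalls does not require N pieces of per-syscall code.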

jfoote commented 4 years ago

Use wasm-opt -ttf to generate random, valid Wasm files.

This is a good idea. We did something similar for a cranelift fuzz target. One downside to this approach is that the fuzz target cannot be seeded with a distilled corpus of valid-ish Wasm modules (since the input is a bitstring). Likewise, corpuses that are accumulated as fuzzers run will not be readily recyclable between generators that consume bitstrings (IIUC).

These are not good reasons not to take this approach, but something to consider for future work. Overall this looks great. I like the equivalence checking idea.

fitzgen commented 4 years ago

Should the fuzzing corpus be committed into the git repo? Or perhaps should it be a separate repo that we include as a git submodule?

FYI, some discussion about this over here: https://github.com/rust-fuzz/cargo-fuzz/issues/194

pventuzelo commented 4 years ago

Hi guys, I'm planning to do some fuzzing on Lightbeam in the next few weeks ;)

Just to give you a bit of context about me, I'm the guy behind webassembly-security.com and I'm teaching WebAssembly security and Rust security. I'm focused on fuzzing and vulnerability research on both WebAssembly (module & VM) and Rust code, so don't hesitate to ping me if needed ;)

I agree with @jfoote regarding using binaryen's translate_to_fuzz for fuzzing. The main issue will be crash replay, because I think (this needs to be verified) binaryen is not consistent on wasm generation (i.e. the same input can generate 2 different modules). Also, the generated wasm sections are often the same in the final wasm module, meaning some parts of the VM/parser will be difficult to reach.

@fitzgen Regarding where to store the fuzzing corpus, I would suggest a dedicated repo or server not linked to this one, to prevent users from downloading all those files accidentally. Also, the corpus needs to be minimized before being pushed to this storage repo.

In general, you should have one fuzz target per API and per backend, since the corpus will evolve differently depending on the code triggered.

fitzgen commented 4 years ago

binaryen is not consistent on wasm generation (i.e. the same input can generate 2 different modules).

wasm-opt -ttf will generate the same output given the same input; it is deterministic.

fitzgen commented 4 years ago

I've set up a repo for the libFuzzer corpora here: https://github.com/bytecodealliance/wasmtime-libfuzzer-corpus

pventuzelo commented 4 years ago

binaryen is not consistent on wasm generation (i.e. the same input can generate 2 different modules).

wasm-opt -ttf will generate the same output given the same input; it is deterministic.

Right ;)

Regarding the libfuzzer corpus, have you evaluated the actual code coverage?

fitzgen commented 4 years ago

Right ;)

Right.

I've never seen the same input to wasm-opt -ttf generate different outputs. There may be bugs somewhere, but I've never hit them. If you know of bugs, I'm sure that they would love to have bug reports.

Regarding the libfuzzer corpus, have you evaluated the actual code coverage?

I have not. So far, I haven't been focused on doing the fuzzing itself so much as setting up the infrastructure, implementing oracles, etc.

jfoote commented 4 years ago

Hello all. I looked into using oss-fuzz for continuous fuzzing of libFuzzer/cargo fuzz/libfuzzer-sys fuzz targets. oss-fuzz is appealing since it supplies significant free computational resources for fuzzing open source projects, supplies a private bug tracker and surrounding policy/process for coordination, provides a useful source code coverage mapping web UI, etc.

Here are my notes on a few basic gaps that would need to be addressed to integrate with oss-fuzz as-is:

builds should use oss-fuzz-supplied feedback-coverage instrumentation flags
- oss-fuzz supplies CC/CXX coverage instrumentation for building fuzz targets via CFLAGS/CXXFLAGS, e.g. -fsanitize=fuzzer and -fsanitize=fuzzer-no-link (ref)
- cargo fuzz uses a statically defined set of flags. These may be compatible with the set used by oss-fuzz/-fsanitize=fuzzer today, I did not check
builds must use oss-fuzz-supplied sanitizer instrumentation
- cargo fuzz and libfuzzer-sys support the ASAN and UBSAN, the default sanitizers used by oss-fuzz. The syntax for passing them via the command line varies ofc.
builds should statically link oss-fuzz's version of libfuzzer (i.e. libFuzzingEngine) into the fuzz target
- libfuzzer-sys uses a vendored copy of libfuzzer that is updated periodically by the maintainers
builds must statically link the fuzz target binary to copy out to clusterfuzz
- also, the binary must support libfuzzer-compatible command flags
- cargo fuzz already builds a standalone fuzz target binary as part of cargo fuzz run, but building does not appear to be exposed as a standalone step
builds should support clang coverage builds
- oss-fuzz uses clang source-based coverage to generate precise coverage data for its source code coverage-mapping UI
- there is discussion of supporting an analogous feature in Rust, with some recent activity

Last year there was some discussion in the oss-fuzz project of supporting Rust targets directly, where a maintainer (kcc) mentioned deviating from the norm and not supporting coverage builds, etc. If we want to pursue oss-fuzz for fuzzing Rust targets directly we could engage with the team to see if we might be able to do something less than ideal to get started, or if they are planning to change the interface to support cargo fuzz/libfuzzer-sys fuzz targets. There are alternatives to oss-fuzz available as well.

This seemed like the right place to share and discuss this; if I am off-topic here just let me know (and please pardon me!).

fitzgen commented 4 years ago

Thanks for looking into this @jfoote!!

There are a couple projects already that use cargo in their build.sh so I suspect that we can make something work here.

builds should statically link oss-fuzz's version of libfuzzer (i.e. libFuzzingEngine) into the fuzz target

libfuzzer-sys uses a vendored copy of libfuzzer that is updated periodically by the maintainers

I think we can work around this in the build.sh via

export CUSTOM_LIBFUZZER_PATH="$LIB_FUZZING_ENGINE"

See https://github.com/rust-fuzz/libfuzzer-sys/blob/master/build.rs#L2 for details.

cargo fuzz already builds a standalone fuzz target binary as part of cargo fuzz run, but building does not appear to be exposed as a standalone step

Yep, we should fix this issue by adding a new build subcommand to cargo fuzz. In fact, it is something that's been asked for before: https://github.com/rust-fuzz/cargo-fuzz/issues/175

Overall, for our next steps, I think it makes sense to

add a build subcommand to cargo fuzz (see above), and then
get a docker image, project.yaml, and build.sh set up for oss-fuzz that works but maybe doesn't exactly check all the boxes due to them not having a lot of Rust projects, and finally
open a PR to oss-fuzz with a disclaimer of what boxes aren't fully checked and why, opening the discussion up with them.

Sound like a plan?

I can take the first bullet point, and also continue working on the other bits mentioned in this issue. Can you take on the last two bullet points @jfoote?

jfoote commented 4 years ago

There are a couple projects already that use cargo in their build.sh so I suspect that we can make something work here.

At first blush my sense was these projects might be using cargo to build non-instrumented dependencies that are linked into the fuzz targets. I didn't dive into them though.

I think we can work around this in the build.sh via export CUSTOM_LIBFUZZER_PATH

Excellent, TIL.

Sound like a plan?

SGTM. Even if the compile/instrumentation flags are not passed as expected I think a basic PR will be a good way to get the conversation started with the oss-fuzz team.

Can you take on the last two bullet points @jfoote?

Sure thing. I am in a pre-US-holiday crunch right now so there might be a little delay, but I will get to this ASAP.

fitzgen commented 4 years ago

Great -- thanks! I don't think there is any giant rush here, so if this gets bumped to after the holidays, that seems 100% OK with me :)

fitzgen commented 4 years ago

Yep, we should fix this issue by adding a new build subcommand to cargo fuzz. In fact, it is something that's been asked for before: rust-fuzz/cargo-fuzz#175

This is done, and part of the new cargo fuzz 0.6.0 release.

jfoote commented 4 years ago

Quick update here: I was able to link the oss-fuzz build environment libfuzzer library (libFuzzingEngine.a) into the wasmtime/fuzz compile fuzz target after patching rust-fuzz/libfuzzer to select c++ std lib based on an env var. Executing the binary for a few seconds yields the expected results; it seems to work.

Building with asan (the default) is OK, but specifying sanitizer=memory yields a linking error. I fiddled with the bug a little and suspect an incompatibility in the instrumenting/linking used in libFuzzingEngine.a and what rustc/libfuzzer-sys are using, but I did not root-cause it.

The other sanitizer that oss-fuzz can optionally build with is ubsan, but it is not supported by our toolchain here at this time AFAIK.

My recommendation (and plan at this point, unless directed otherwise) is to ignore the sanitizer flag supplied by oss-fuzz, set the fuzz target configs to use only asan for good measure, and proceed to write a build script for the wasmtime/fuzz targets. I'll then make a PR to oss-fuzz after https://github.com/rust-fuzz/libfuzzer/pull/56 is merged to get the conversation started.

jfoote commented 4 years ago

Hello @fitzgen! I have the strawman PR for the wasmtime oss-fuzz integration staged. Before we move forward with that, can you take a look at the project acceptance PR diff (https://github.com/jfoote/oss-fuzz/commit/06542db5f3f4e8a37807652dc17f62dba05b2d82) and see if it looks OK to you?

Basically I set myself as the maintainer for now and added an email alias for you as well as security@bytecodealliance.org. Those addresses are used to get notifications when the fuzzers find something or the build breaks. Note that if aliases listed there have associated google accounts they will get access to the oss-fuzz dashboard and bug tracker. Should we add anyone else initially?

I have the strawman integration PR WIP staged here: https://github.com/jfoote/oss-fuzz/commit/c1ae8eafb4e6067b7d9660cd200e7a2b44b6657c
- Once https://github.com/bytecodealliance/wasmtime/pull/840 lands I'll change the wasmtime clone back to upstream
And here is a draft of the text I plan to include with the initial project acceptance PR once we have it settled (note for onlookers that I may delete this gist later/after we submit the PR)

fitzgen commented 4 years ago

@jfoote looks great! :+1: I left a couple comments on the draft text. Everything else looks ready to go!

jfoote commented 4 years ago

Quick update for posterity and onlookers: we've successfully integrated the wasmtime fuzz targets with oss-fuzz, with the caveats outlined in the comments and referenced PRs above. Thanks to @fitzgen and @alexcrichton for making this happen!

Hyperion101010 commented 4 years ago

@fitzgen sir, this was a GSoC 2020 project idea; I worked on it during the application period and submitted a proposal. Given the time I had at hand, I wasn't able to get a complete idea of the different vulnerabilities like ABI abstractions and heap and stack safety. I want to voluntarily contribute to the idea, but can't do so before I clear up some doubts. I would like to start understanding the fuzzing process more closely and contribute, perhaps by writing fuzzers. During the application process I wrote emails asking for the project details, but I never got a reply, which is completely fine given the situation we have now. Is there any way we can have a conversation about the doubts I have? I see that there used to be an IRC channel for wasmtime a year ago, but it has since migrated to Matrix, which unfortunately doesn't have any such channel. If you are available on any channel of Mozilla (or another open source org), please let me know. Good day!

bjorn3 commented 4 years ago

https://bytecodealliance.zulipchat.com/ is the primary discussion channel.

bjorn3 commented 3 years ago

I think this can be closed.
