blt / bughunt-rust

QuickCheck bug hunting in Rust standard library data structures
MIT License
160 stars 8 forks source link

Thoughts on README in no particular order #2

Open Shnatsel opened 6 years ago

Shnatsel commented 6 years ago

Opening an issue because github doesn't provide a better communication mechanism. Hopefully some of these points are actionable.

Downside is, the std data structures we're testing don't have any sanitizers turned on etc on account of the project is run against the usual Rust releases.

libdislocator can catch many (but not all) memory errors for black-box binaries, assuming they use the system allocator.

Rust sanitizer integration lists steps to rebuild stdlib with sanitizer support, but calls it "better backtraces". We might want to get in touch and check how exactly sanitizers can be enabled for stdlib primitives.

This project will happily take fuzz code. Or, if someone could contrive to combine a fuzzer with a shrinking step this project will jump on that in a heartbeat.

Both AFL and libfuzzer shrink the inputs as they go along. They have Rust integration as afl.rs and cargo-fuzz respectively. Both have a one-shot testcase minimization command, which doesn't usually do a great job. However, AFL also comes with a crash exploration mode lets you explore alternative crashing inputs and minimize the testcase to your heart's content.

Also, Rust QuickCheck README mentions proptest as an alternative to QuickCheck that improves on shrinking and links to some comparisons of the two.

Eh2406 commented 6 years ago

I was not sure how to give this feedback, but now that we have a issue open for random feedback... A big +1 for proptest!

Shnatsel commented 6 years ago

@Eh2406 could you elaborate on what advantages it has over QuickCheck?

blt commented 6 years ago

Opening an issue because github doesn't provide a better communication mechanism.

@Shnatsel This works for me.

libdislocator can catch many (but not all) memory errors for black-box binaries, assuming they use the system allocator.

Ah, now there's a thought. I'm not sure what allocator is built-in to the published releases but, at the same time, I'm also not sure how far this project will get without making custom builds of the standard library itself.

Rust sanitizer integration lists steps to rebuild stdlib with sanitizer support, but calls it "better backtraces". We might want to get in touch and check how exactly sanitizers can be enabled for stdlib primitives.

Agreed. I'll put together some better labels for the project and make an 'investigation' ticket out of this.

This project will happily take fuzz code. Or, if someone could contrive to combine a fuzzer with a shrinking step this project will jump on that in a heartbeat.

Both AFL and libfuzzer shrink the inputs as they go along. They have Rust integration as afl.rs and cargo-fuzz respectively. Both have a one-shot testcase minimization command, which doesn't usually do a great job. However, AFL also comes with a crash exploration mode lets you explore alternative crashing inputs and minimize the testcase to your heart's content.

I dashed this comment off over lunch and should have put more thought into it. What I had in mind is a hybrid of fuzzing and QuickCheck. In the 'exploration' phase the tool acts as a fuzzer, exploring the branch space et al of the system under test. On error, the tool enters a 'shrink' phase, using the shrink on any generated Arbitrary instances to find a smaller example set. In 'exploration' phase the tool works in terms of a big byte blob, in 'shrink' phase it works in terms of types.

Speaking of which, I need to define shrink for Op<K,V> in the hashmap model.

Also, Rust QuickCheck README mentions proptest as an alternative to QuickCheck that improves on shrinking and links to some comparisons of the two.

You know, I've never experimented much with proptest in Rust. I have used hypothesis in Python to good effect. Effective shrinking has been less of a problem than fast generation, at least in my experience QC'ing Rust projects. That said, @Eh2406 I'd be very interested to see some work done in proptest, if you had time to donate. I'm not married to QuickCheck.

Somewhat as an aside, these are features I think would be extremely valuable in a QC tool for this project:

Eh2406 commented 6 years ago

So I am not experienced enough with this stuff to really speak from authority. The linked discussion express the soundbites I have heard about QuickCheck/Proptest better than I can. My enthusiasm comes from having repeatedly read the documentation for both and having some sense of how to get started with Proptest. To quote @AltSysrq "To be honest, I've always had a "Am I holding it wrong?" sort of feeling with quickcheck" Edit: I here that the interop betean the two is getting better soon. So if QuickCheck makes more sense to you, then we may soon be able to mix and match.

collection of property violations on-disk, allowing for all-night runs without stopping

proptest has the saving to disk part. https://docs.rs/proptest/*/proptest/#failure-persistence The not stopping could probably be done with a wrapper script or a PR would probably be considered.

Shnatsel commented 6 years ago

I've experimented with this a bit. Turns out the allocator used for stdlib Vec is whatever your binary is using. My experiments were geared towards detecting uses of uninitialized memory, but they should be informative for this project as well.

https://gist.github.com/Shnatsel/0c024a51b64c6e0b6c6e66f991904816 - this is a trivial testcase with obvious use of uninitialized memory.

I have tested three configurations, all in release mode: jemalloc (i.e. Rust's default), system allocator, and system allocator + libdislocator injected via LD_PRELOAD.

Turns out in release mode with system allocator the values inside the vector vary between runs of the entire process, but are always the same until you restart the process. With jemalloc and libdislocator they do not vary at all.

I have also made a patch for libdislocator that makes it clobber every newly allocated buffer with a different value. This allows detecting use of uninitialized memory simply by running an operation twice in the same process and comparing the output.

Shnatsel commented 6 years ago

For running a job for a specific time there's the Unix command timeout. Just set an absurdly high number of iterations and run timeout 8h cargo test to run QuickCheck for 8 hours.

Eh2406 commented 6 years ago

Just FYI, proptest is working on new functionality for testing data structures: https://github.com/AltSysrq/proptest/pull/84

blt commented 6 years ago

For running a job for a specific time there's the Unix command timeout. Just set an absurdly high number of iterations and run timeout 8h cargo test to run QuickCheck for 8 hours.

Hey now, that's useful!

blt commented 6 years ago

Just FYI, proptest is working on new functionality for testing data structures: AltSysrq/proptest#84

@Eh2406 interesting. We're using a stateful approach already, building up a vector of operations and interpreting them against the data structure and its model. I'd be real interested to see a re-implementation of what we have in hash_map.rs compared to the stateful work in proptest.

Shnatsel commented 6 years ago

The way I see a hybrid of a feedback-driven fuzzer and property testing is that the feedback-driven fuzzer is used to generate lots of inputs really fast and records the ones that have resulted in new execution paths. AFL achieves at about a billion executions a day on a single machine for reasonably fast targets (such as decoding ~1kb images or audio files). The resulting corpus of interesting inputs can be then fed to actual property testing to discover correctness issues. Absence of memory errors could be also checked by the fuzzer seeded with the generated corpus and combined with address sanitizer or libdislocate.

The only tricky part I can think of is representing series of operations in a fuzzer-accessible way and making that dense enough to not interfere with useful mutations and simple/stupid enough not to be a source of bugs by itself. Which actually should not be that hard.

I might or might not get around to experimenting with this, so feel free to run with the idea.

blt commented 6 years ago

The way I see a hybrid of a feedback-driven fuzzer and property testing is that the feedback-driven fuzzer is used to generate lots of inputs really fast and records the ones that have resulted in new execution paths. AFL achieves at about a billion executions a day on a single machine for reasonably fast targets (such as decoding ~1kb images or audio files). The resulting corpus of interesting inputs can be then fed to actual property testing to discover correctness issues. Absence of memory errors could be also checked by the fuzzer seeded with the generated corpus and combined with address sanitizer or libdislocate.

This is pretty well what I have in mind, which is a good sign for the idea, I think.

The only tricky part I can think of is representing series of operations in a fuzzer-accessible way and making that dense enough to not interfere with useful mutations and simple/stupid enough not to be a source of bugs by itself. Which actually should not be that hard.

Agreed. This fuzz target implements my best thoughts on this idea right now. The Arbitrary instance is generated from the fuzzer's blob directly. That is, the fuzzer side of things only has to understand execution paths and the QuickCheck side of things generating instances of Arbitrary, performing shrinking when directed.

I might or might not get around to experimenting with this, so feel free to run with the idea.

It's a toy that's on deep back-burner for me. Please feel free.

gilescope commented 6 years ago

I think these kind of efforts are great to test HashMap, but wouldn't it be awesome if we could test every Map?

The eclectic crate has traits for collections (as num_traits does for numbers) and I've been playing around with getting the standard library tests to run against the traits (E.g. https://github.com/gilescope/eclectic/blob/master/src/tests.rs ). (It's the idea not the crummy implementation of trait_tests that matters)

There's a lot of if / but / maybies, but I would really like a world where I could rock up with my own implementation of a hashmap and be able to run all the standard (and security) tests for free.

Wouldn't that be a safer world?

blt commented 6 years ago

@gilescope eclectic is an exciting project. Some of the QC generators will be a little odd with an aim to trigger known bad states but I for sure see a lot of utility in eclectic shipping with property testing.

Shnatsel commented 6 years ago

https://www.reddit.com/r/rust/comments/9a6qf5/blog_easy_proc_macro_derives_with_synstructure/ this lets you derive Arbitrary for custom structures, might be relevant to this project. Nevermind, that's #6

blt commented 5 years ago

I'm going to close this conversation out. Thanks everyone!

Eh2406 commented 5 years ago

hybrid of fuzzing and QuickCheck.

proptest just added the ability to pass through a source of randomness. I think this would allow generators written for proptest to be run via an external fuzzer. I don't know if it is useful but if it is not, I'd love to here why not or how it could be improved.

blt commented 5 years ago

Oh neat, cool. I’ll take a look at this and be back in touch in a few days.

On Mon, Feb 11, 2019, at 8:03 AM, Jacob Finkelman wrote:

hybrid of fuzzing and QuickCheck.

proptest just added https://github.com/AltSysrq/proptest/issues/117 the ability to pass through a source of randomness. I think this would allow generators written for proptest to be run via an external fuzzer. I don't know if it is useful but if it is not, I'd love to here why not or how it could be improved.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/blt/bughunt-rust/issues/2#issuecomment-462383251, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAN-1TpK90lnXgqewP1YNM2WTeI7LC5ks5vMZQygaJpZM4WGxIV.

Shnatsel commented 5 years ago

Thanks for pointing this out! I have emulated this with QuickCheck before, see https://gist.github.com/Shnatsel/4a907d44d6429de93d63d6e7c4d1361e

I've also prototyped a way to generate such fuzzing harnesses automatically based on parsing Rust source code. I've got a pass that parses Rust source code from a given file and extracts function names and types into an array of structs. It was trivial to implement based on syn visitor example. I don't have the generator part yet, but that also looks dead simple.

Sadly my employer would not let me release the code without a CLA attached, and I'm not thrilled about the prospect. If someone else kicks off the project and writes the same 20 LoC parser independently, I will be able to join in afterwards and we won't have to deal with copyright assignment.

Eh2406 commented 5 years ago

I think that the TestRunner will still support shrinking, even when made with pass through mode.

How much progress does someone need to make before you can contribute? Did you have a name in mind for the project?

Shnatsel commented 5 years ago

Here's what the code I've written at the company does:

  1. Extract function and method definitions from code parsed with syn based on syn visitor example, and put names and types from them into a collection of custom structs
  2. Pretty-print custom structs by implementing Display for them. This was mostly for debugging and was not producing code that would actually compile.

These are the only parts I've written and can't release without a CLA, so once that's squared away I should be able to contribute freely.

No clue about the name, sorry.

blt commented 5 years ago

@Eh2406 being able to pass through randomness from the fuzzer is extremely valuable. I'd be interested to see what happens when we adapt an existing test. I bet it will go well. I look to March to have time to do this myself.

There's a few tricks to running under a fuzzer. Most fuzzers aren't just a dumb pool of random bytes, they're also trying to intelligently manipulate that pool. So, if proptest is in charge of shrinking that might give the fuzzer an aberrant signal. What I was doing with blt/rqc was experiment with a fuzzer+QC where the fuzzer was responsible for shrinking by shrinking the byte pool, leaving the QC aspect a framework for coercing bytes and reporting on the result of said coercion. Please be aware that blt/rqc is extremely bare-bones, more a research spike than anything.

I don't know if proptest does this or not, but quickcheck will quit a test as soon as a failure has been found (and shrunk). For long runs, especially once #14 is a thing, we'll want to collect tests over the specified test run time. This is something that Quviq's EQC does and it's very valuable. Failure collection, categorization is common to fuzz tools and I was going to have the 'fuzz' side of rqc manage collection, categorization.

We need to be exit(0) safe. Because allocation failure results in panics -- modulo some recent work -- we prefer the use of blt/bh_alloc in our tests to cause an exit(0) on allocation failure, which requires a fuzzer that doesn't mind if the executable cleanly exits under it. I don't know how this interacts with proptest but I thought it worth calling out.

The Angora paper gave me the sense that a fuzz+QC integration would work very well if the QC side of things were responsible solely for coercing bytes into a set of test data, signaling if this failed, and the fuzzer were responsible for interpreting signals up from the QC side of things, shrinking / mutating bytes and collecting errors. You can see that notion embedded here in the way the QC test signals back to the runner what's happened for a given byte payload.

Eh2406 commented 5 years ago

So I'd imagine that we would disable shrinking, when running under a intelligent fuzz harness. Then have some kind of file persistence, ether proptest, or a hand coded, or one provided by the fuzzer. Then shrinking can be enabled only after the bug has been found.

As to the other issues, we are way past my experience.

blt commented 5 years ago

It's also worth noting that this project should be a collection of two kinds of tests:

I've spent most of my time so far on the last variety because there are, to my mind, some interesting questions on how to achieve that as an engineering effort. The first point was limited by infrastructure, neatly solved by clusterfuzz being open-sourced. Come early March I'm going to start focusing on increasing the scope for straight-ahead fuzzing, standing up a project clusterfuzz. @Shnatsel I'll take a crack at producing the harness code you've described at that time, if someone hasn't got there before me, at that time.

Shnatsel commented 5 years ago

Could you describe what you want to achieve with straight-ahead fuzz tests and how is it different from the current state of affairs? I'm not sure I follow.

Also, the project for automatically generating fuzzing harnesses from function types got kicked off at https://github.com/Eh2406/auto-fuzz-test and has independently reimplemented everything that I've coded while at the company, so I'm now able to contribute to it. This should get us fuzz coverage of libraries with large API surfaces for cheap.

blt commented 5 years ago

Could you describe what you want to achieve with straight-ahead fuzz tests and how is it different from the current state of affairs? I'm not sure I follow.

Sure thing. I did a talk at work about this very subject; I'll try and get it cleaned up and published here in the next couple of weeks. But, my basic argument is that the model-checking quickcheck tests are suitable for determining for some inputs:

Building the model and running it against the SUT is not cheap. If you're solely interested in the first point that's where a "straight-ahead" fuzz test shines; they don't spend any CPU time setting up a model and feeding it input. The hash map test aligns with my notion of a model-checking test, the str::repeat test aligns with my notion of a straight-ahead fuzz test (though there is a little bit of logic checking).

I've begun to think that there's probably more utility to be wrung out of establishing a base of fuzz tests and the infrastructure to run them well, then moving on to model-checking.

Shnatsel commented 5 years ago

https://github.com/rust-fuzz/targets provides a number of fuzzing targets already, although some are probably outdated. Also many projects have fuzzing harnesses in-tree, but not running on CI. https://github.com/Eh2406/auto-fuzz-test would also be of use.