Ekleog opened this issue 2 years ago
To expand on the motivation a bit: our typed value is WebAssembly, and we use the arbitrary + wasm-smith combo to generate it. This basically means we take the unstructured `Vec<u8>` from bolero and use it as a sort of RNG to produce syntactically valid wasm.
With the current infrastructure, bolero wants to persist this seed as a failing test case. It would be cool if we could save the actual wasm instead.
Just a bit more brain-storming about the potential benefits of this feature:
For use-cases where you don't want to check in a corpus, you could seed the corpus with specific concrete values from code. This could allow people to build a "stable corpus" of interesting inputs, resilient to changes in how the bytes get interpreted.
Again, for seeding a corpus while checking in even less, if you have interesting type-directed strategies for generating inputs (for property testing), you could maybe seed a fuzzing corpus from these. It would be interesting to see if this worked, actually... and I wonder if there's any prior art here
Another thought on the potential benefits of this feature:
In the situation where you're running a 24/7 fuzzing server, you want to be able to pause fuzzing, update to a new commit, and resume. But the way Arbitrary/Generators tend to work here, updating is "fragile": the existing corpus of inputs is likely to fail to parse, or to parse differently (losing value).
With this feature, you could pause fuzzing, "reverse-generate" the corpus to a "stable corpus", update to the new commit for the software under test, "forward-generate" the stable corpus back to a fuzzing corpus, and resume fuzzing without losing the progress previously made.
I agree; I think this would be a really powerful feature to have. If someone wants to write up a proposal of how it would fit with everything, that's probably the next step here.
The more I think about it, the harder I think this is. For instance, when using an RNG driver, this feature is essentially impossible to implement.
Basically, the best idea I could think of would be to compile a fuzzer that runs `assert_ne!(input, WantedValue { val: 0 })` until it crashes, and then the crash lets one know which input bytes were responsible.
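Mechanically, that trick amounts to searching the seed space until the forward generator happens to produce the wanted value. A toy brute-force version (with a made-up one-byte generator, not bolero's actual driver) illustrates why this is expensive: the seed is only recovered by running the generator forward many times.

```rust
// Toy model of the assert_ne! trick: try seeds until the generated
// value equals the wanted one. A real fuzzer would mutate seeds rather
// than enumerate them, but the cost profile is the same.

#[derive(Debug, PartialEq)]
struct WantedValue {
    val: u8,
}

// Stand-in generator: interprets a one-byte "seed" as the value.
fn generate(seed: &[u8]) -> WantedValue {
    WantedValue { val: seed.first().copied().unwrap_or(0) }
}

fn find_seed_for(wanted: &WantedValue) -> Option<Vec<u8>> {
    for b in 0u8..=255 {
        let seed = vec![b];
        // Moral equivalent of the fuzzer "crashing" on assert_ne!:
        // we stop as soon as the generated value matches.
        if &generate(&seed) == wanted {
            return Some(seed);
        }
    }
    None
}

fn main() {
    let seed = find_seed_for(&WantedValue { val: 42 }).unwrap();
    assert_eq!(generate(&seed), WantedValue { val: 42 });
}
```

With a realistic generator the seed space is astronomically larger than 256 entries, which is exactly why doing this per corpus entry would be prohibitive.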
But well… especially for migrating whole corpuses, this would be pretty bad, as it'd mean running the fuzzer for a long time just to recover the inputs that were generated before.
The only approach I think could make sense would be to preserve crashes rather than the whole corpus: this way, regression tests stay alive. But it's probably not worth the effort of writing all the infrastructure to make that work. For this specific scenario, it'd probably be better to have a cargo-bolero command that automatically generates a regression test from the crash's Debug output or similar. (Yes, such a solution would often fail, but actually running the fuzzer to find data that regenerates the same input as before an update would hit the same issues.)
> For instance, when using an RNG driver, this feature is essentially impossible to implement.
I don't quite follow. Certainly it could be difficult to reverse a generator if you think of generators as black boxes, but the point here is that they aren't black boxes. In fact, many of them are automatically derived with `derive` clauses. (And I think this is an easy way to de-risk your concern: if we can automatically generate reverse-generators for every type supported by the `derive` clause, we're probably golden for a lot of use cases, even if there do exist generators out there that can't easily be inverted.)
For example, if a generator says (in vague rust-y pseudo-code):

```rust
if bytes.pop() <= 127 { None } else { Some(generate()) }
```

Then we know how to write a reverse:

```rust
if let Some(x) = input { bytes.push(128); antigenerate(x) } else { bytes.push(0); }
```
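To make the pseudo-code above concrete, here is a self-contained round-trip for an `Option<u8>` generator. The "driver" is just a `Vec<u8>` consumed from the back; none of this is bolero's real API, and the function names are illustrative only.

```rust
// Toy forward generator: pop a control byte; <= 127 means None,
// otherwise pop one more byte as the inner value.
fn generate_option(bytes: &mut Vec<u8>) -> Option<u8> {
    match bytes.pop() {
        Some(b) if b > 127 => bytes.pop(),
        _ => None,
    }
}

// Toy reverse generator: push in the opposite order of consumption
// (inner value first, control byte last), so that popping replays
// the exact choices the forward generator would make.
fn antigenerate_option(input: &Option<u8>, bytes: &mut Vec<u8>) {
    match input {
        Some(x) => {
            bytes.push(*x);
            bytes.push(128); // any value > 127 selects the Some branch
        }
        None => bytes.push(0), // any value <= 127 selects the None branch
    }
}

fn main() {
    for wanted in [None, Some(7u8), Some(255)] {
        let mut bytes = Vec::new();
        antigenerate_option(&wanted, &mut bytes);
        // Round-trip: anti-generate, then generate the bytes back.
        assert_eq!(generate_option(&mut bytes), wanted);
    }
}
```

Note that the reverse has to emit bytes in the reverse of the order the generator consumes them, which is one reason hand-writing these for nested types gets fiddly quickly.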
I wanted to write up what I think this feature might look like from the perspective of the use case I have in mind.
A normal bolero harness looks something like this (from the readme):
```rust
#[test]
fn fuzz_add() {
    bolero::check!()
        .with_type()
        .cloned()
        .for_each(|(a, b)| buggy_add(a, b) == a.wrapping_add(b));
}
```
I’m imagining this feature adding something where you can do:
```rust
#[test]
fn fuzz_add() {
    bolero::check!()
        .with_type()
        .cloned()
        .with_examples(&[(1, 1), (0, u32::MAX)]) // <== This line
        .for_each(|(a, b)| buggy_add(a, b) == a.wrapping_add(b));
}
```
(Or, to use the property-test generators instead of concrete values, a `.with_generated_examples()`. Or perhaps `.with_generated_examples(bound)`. Calling both methods, and/or calling `with_examples` multiple times, should be OK.)
The semantics would be to take each example, backwards-generate bytes from it, and emit each of those examples as a file in the corpus, as a step before invoking the fuzzing engine. (For `with_examples`, we could also check each one during property testing.)
The problem this would solve is one faced by structural fuzz harnesses (i.e. ones accepting an input of a generated type, not just flat bytes). Because the "parsing" step is not stable, we can't really rely on having a corpus that remains useful. If anything about the type changes (a new field, a new enum variant, or any other code change), then the "old" bytes become useless: they no longer parse, or they don't parse as the thing you wanted (to get the coverage you wanted from the fuzzer).
With this feature, you can have the stable corpus written as examples in code, so it's maintained normally, and this "reverse" feature emits them as files into the corpus before starting the fuzzer. If the "parsing" changes, the reversed bytes change in tandem, and things stay stable. (Or maybe you don't even need to maintain a corpus, because the property-test generator can generate a seed corpus for you!)
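Mechanically, the "emit examples into the corpus" step could look roughly like the sketch below. Everything here is hypothetical: `seed_corpus` and the closure standing in for the eventual anti-generator API are made up for illustration, and the `(u32, u32)` encoding is just a toy.

```rust
use std::fs;
use std::path::Path;

// Hypothetical pre-fuzzing step: reverse-generate each example to
// bytes and drop those bytes into the fuzzer's corpus directory.
fn seed_corpus(
    examples: &[(u32, u32)],
    anti_generate: impl Fn(&(u32, u32)) -> Vec<u8>,
    corpus_dir: &Path,
) -> std::io::Result<()> {
    fs::create_dir_all(corpus_dir)?;
    for (i, example) in examples.iter().enumerate() {
        let bytes = anti_generate(example);
        // File names only need to be unique; real fuzzers usually
        // hash the contents instead of numbering the files.
        fs::write(corpus_dir.join(format!("example-{i}")), &bytes)?;
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    // Toy anti-generator: little-endian encoding of both integers,
    // matching a hypothetical generator that reads two u32s in order.
    let anti = |&(a, b): &(u32, u32)| {
        let mut v = a.to_le_bytes().to_vec();
        v.extend_from_slice(&b.to_le_bytes());
        v
    };
    seed_corpus(&[(1, 1), (0, u32::MAX)], anti, Path::new("corpus-demo"))
}
```

The key property is that the files are regenerated from the in-code examples on every run, so a change to the generator's byte interpretation automatically refreshes the corpus.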
I'm completely with you about the advantages of having an anti-generator, as it's actually the reason why I opened this feature idea :sweat_smile: But there are three fundamental issues with the idea of an anti-generator.
First, it just cannot work with the RNG-based drivers (used by the proptest generator). (It can't work with Kani either, but I guess that's expected.) Maybe not too big a deal, as the proptest generator could just run the examples first and then start actually generating values, like it does with the fuzzer corpus.
Second, and deeper: even for the byte-slice driver used by fuzzers, it is very possible that the value for which you're trying to generate a byte slice just cannot be produced by the driver. For instance, if a Vec gets generated with length in `0..100`, what should happen when you try to anti-generate a value of length 101? There is literally no result to find. So a choice needs to be made here: for a value passed via `.with_examples`, it should probably refuse to run the fuzzer?

And then there is the third and maybe most important thing. What should happen for the things that are just hard to anti-generate? Anything that is currently derived without attributes can probably be anti-generated, with enough work put into the macros (right now it seems relatively OK for `DriverMode::Direct` `ByteSliceDriver`s, but the code is missing depth handling, which breaks generating recursive types). But not everything can be derived: we already have things we need to generate from `arbitrary`, because `bolero::TypeGenerator` is not popular enough yet to be implemented in upstream crates (thinking of chrono types in particular, as I'm dealing with them). What should the API look like for anti-generation? How can users write manual implementations of AntiGenerator? What should the behavior be when using a `gen_with()` attribute?
It's a huge amount of work, even if ignoring the issues of anything that relies on a non-bolero generator; hence my previous message. Now if you feel ready to try tackling this problem, it'd be awesome!
(Also, FWIW, in practice adding a variant to an enum does not seem to usually break the corpus in my experience, though I have not tried actually verifying it)
> First, it just cannot work with the RNG-based drivers (used by the proptest generator). Maybe not too big a deal, as the proptest generator could just run the examples first and then start actually generating values, like it does with the fuzzer corpus.
Ah, we were maybe talking past each other, because I think we agree on this. I don't see this as a problem, because anti-generation seems like a fuzzing-only (or at least "dealing with corpuses"-only) feature to me.
> it is very possible that the value just cannot be generated by the driver
Agreed, failure is an option. I think failure can be usefully and effectively "contained" though.
For instance, if the feature fails whenever you have to use e.g. `Arbitrary` instances in your generator, that seems like an acceptable failure mode. (I fully understand that might not be acceptable for you, of course, if that's the use case you have at hand!) As long as it works for everything `derive` would work on, I think it's already useful for many purposes.
For constraints like the length limit, I think that'll generally not be a problem. You typically wouldn't get a value that violates those constraints in the first place. (e.g. the fuzzer wouldn't ever generate one)
If the user manually writes a value that violates constraints, I'd suggest the most useful (and safest initial) semantics is actually that it'd error out eagerly (e.g. `cargo test` would complain immediately).
(I do wonder if the failure error type might be able to return a `BlackBoxProblem` result though, if you wanted to try something fancy in those cases.)
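One way to contain that failure is simply to make anti-generation fallible, so out-of-range examples are rejected before the fuzzer ever starts. The sketch below is a toy, not a proposed bolero API; it encodes a Vec whose forward generator only draws lengths from `0..100`.

```rust
// Toy fallible anti-generator for a Vec whose generator draws lengths
// from 0..100: values outside that range error out eagerly instead of
// silently producing bytes the forward generator could never consume
// into the wanted value.

#[derive(Debug, PartialEq)]
enum AntiGenError {
    OutOfRange { len: usize, max_exclusive: usize },
}

const LEN_BOUND: usize = 100; // mirrors a forward generator using 0..100

fn anti_generate_vec(input: &[u8]) -> Result<Vec<u8>, AntiGenError> {
    if input.len() >= LEN_BOUND {
        // A value of length 101 has no byte encoding at all under
        // this generator, so fail now rather than at fuzz time.
        return Err(AntiGenError::OutOfRange {
            len: input.len(),
            max_exclusive: LEN_BOUND,
        });
    }
    // Toy encoding: a one-byte length prefix followed by the elements.
    let mut bytes = vec![input.len() as u8];
    bytes.extend_from_slice(input);
    Ok(bytes)
}

fn main() {
    assert!(anti_generate_vec(&[1, 2, 3]).is_ok());
    assert!(anti_generate_vec(&vec![0u8; 101]).is_err());
}
```

An error type like this is also where something like the `BlackBoxProblem` idea could slot in, as an additional variant for generators that can't be inverted at all.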
> What should the API look like for anti-generation? How can users write manual implementations of AntiGenerator? What should the behavior be when using a `gen_with()` attribute?
All good questions. I agree we'd want an anti-generator to be just as adaptable as generators are at dealing with ad-hoc additions. It might be unpleasant (and perhaps unstable) to figure out how to write an anti-generator for a type from another crate, but if that capability doesn't presently exist, then someone's got to build it...
Or perhaps the capability will appear in Arbitrary someday as well: https://github.com/rust-fuzz/arbitrary/issues/44
That'd be cool too! FWIW, the issues listed in https://github.com/rust-fuzz/arbitrary/pull/94 could be helped by https://github.com/camshaft/bolero/issues/90, as such a generator might be much easier to write anti-generators for than TLV-based ones, because the API encodes much more information :)
But now #90 also has its own questions of API (feel free to comment there too!), so that's not a silver bullet (yet) :sweat_smile: (this, plus implementing this feature even after #90 is done will still be a quite complex project)
Overall, for anti-generation, I think that as soon as the call graph of `generate()` implementations depends on one of the driver outputs, it becomes basically impossible to anti-generate. What #90 brings is that, hopefully, the call graph won't need to change, for derive macros at least.
Actually, this makes me think: maybe the whole `rng` API could be refactored so that independence from driver outputs is guaranteed, without reducing the scope of what's possible? With such an API, maybe we could automatically implement AntiGenerator for literally every generator. But then the API itself might be quite complex… basically, it'd need to define a graph of what should go where, and then the implementations of Generator and AntiGenerator could just walk that graph. But then not every input format remains possible (e.g. TLV sequences are no longer possible), so it probably couldn't replace Generator, and probably wouldn't be worth it.
Oh, and another problem that just came to mind: if a generator can `mutate()` a value into one that cannot be `generate()`d, then we have a problem too. But it's probably not a problem with derived impls anyway.
This may be a far-fetched idea, but I'm just dropping it here in case it makes sense: what would you think about adding an "anti-generator" function, say `generate_test_case`, to the generator traits? One could then use this `generate_test_case` function to build a crashing fuzzer test case from a known-bad value of that type.

TBH I'm not sure it's actually worth the hassle (as theoretically, just running the fuzzer should find the input automagically), but I wanted to drop the idea so we could discuss it :)
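For discussion's sake, a trait extension along those lines might be sketched as below. This is purely illustrative: `AntiGenerator` and `generate_test_case` are not real bolero APIs, the byte "driver" is just a `Vec<u8>`, and a real design would still have to answer the `gen_with()` and upstream-crate questions above.

```rust
// Illustrative-only sketch of an anti-generation trait extension.

trait AntiGenerator {
    type Value;

    // Append bytes to `out` such that replaying them through the
    // forward generator reproduces `value`. Returns None when `value`
    // is outside the generator's range (e.g. a too-long Vec).
    fn anti_generate(value: &Self::Value, out: &mut Vec<u8>) -> Option<()>;

    // Build a ready-to-run fuzzer input (a "crash test case") from a
    // known-bad value of this type.
    fn generate_test_case(value: &Self::Value) -> Option<Vec<u8>> {
        let mut out = Vec::new();
        Self::anti_generate(value, &mut out)?;
        Some(out)
    }
}

struct BoolGen;

impl AntiGenerator for BoolGen {
    type Value = bool;

    fn anti_generate(value: &bool, out: &mut Vec<u8>) -> Option<()> {
        // Mirror of a forward generator that reads one byte:
        // 0 => false, anything else => true.
        out.push(if *value { 1 } else { 0 });
        Some(())
    }
}

fn main() {
    assert_eq!(BoolGen::generate_test_case(&true), Some(vec![1]));
    assert_eq!(BoolGen::generate_test_case(&false), Some(vec![0]));
}
```

The default `generate_test_case` would let a cargo-bolero subcommand turn any known-bad value into a persisted corpus entry, which is the "crash fuzzer test from a known-bad value" workflow described above.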