Open mwhicks1 opened 1 year ago
The with_arbitrary
method is gated on the arbitrary
feature: https://github.com/camshaft/bolero/blob/b1b519e47b53c0a55d42c629b6d7b18f5b096726/bolero/src/lib.rs#L261-L262
We'll need to make it show up in docs and that it requires that feature.
But you can do that in your Cargo.toml:
bolero = { version = "0.9", features = ["arbitrary"] }
and then in the harness
#[derive(Arbitrary)]
struct MyType {
a: u8,
b: u8,
c: u8,
}
#[test]
fn my_test() {
bolero::check!()
.with_arbitrary::<MyType>()
.for_each(|value| { ... });
}
Thanks! I got that to work.
But: using Arbitrary
for my recursive data type seems to be way slower than using a TypeGenerator
. Support for the latter was added with the fix for #121, which shows a simple version of my example. Here's what I get with a derived Arbitrary
(2 exec/s):
INFO: seed corpus: files: 808 min: 1b max: 2706b total: 171738b rss: 52Mb
#8 pulse cov: 65 ft: 65 corp: 1/1b exec/s: 2 rss: 52Mb
#16 pulse cov: 65 ft: 65 corp: 1/1b exec/s: 2 rss: 52Mb
#32 pulse cov: 65 ft: 65 corp: 1/1b exec/s: 2 rss: 53Mb
#64 pulse cov: 65 ft: 65 corp: 1/1b exec/s: 2 rss: 53Mb
compared to this, with the derived TypeGenerator
(46k exec/s):
INFO: seed corpus: files: 662 min: 1b max: 2670b total: 127250b rss: 52Mb
#818 INITED cov: 281 ft: 1376 corp: 295/48Kb exec/s: 0 rss: 82Mb
#906 REDUCE cov: 281 ft: 1376 corp: 295/48Kb lim: 2670 exec/s: 0 rss: 82Mb L: 7/2670 MS: 1 EraseBytes-
#1086 NEW cov: 281 ft: 1377 corp: 296/51Kb lim: 2670 exec/s: 0 rss: 84Mb L: 2463/2670 MS: 4 ShuffleBytes-ChangeByte-InsertRepeatedBytes-InsertRepeatedBytes-
...
#184606 REDUCE cov: 285 ft: 1431 corp: 320/58Kb lim: 3858 exec/s: 46151 rss: 665Mb L: 2188/2706 MS: 4 PersAutoDict-CMP-CopyPart-EraseBytes- DE: "\020\000\000\000"-"\373\016\004\000"-
#185035 NEW cov: 285 ft: 1434 corp: 321/59Kb lim: 3858 exec/s: 46258 rss: 665Mb L: 1280/2706 MS: 3 CrossOver-CopyPart-ChangeBinInt-
^C==69109== libFuzzer: run interrupted; exiting
I can file another issue if that helps, but maybe it's expected for some reason?
Yeah the initial pass on the arbitrary integration is pretty unoptimal... https://github.com/camshaft/bolero/blob/b1b519e47b53c0a55d42c629b6d7b18f5b096726/bolero-generator/src/driver/macros.rs#L99.
It does a bunch of allocation and copying between the slices so it could be improved. That being said, 2 exec/s seems extremely low even with those things so there may be other issues.
Can you share what your recursive data type looks like and I can take a look to see if I can fix it.
After some digging it looks like we're running into https://github.com/rust-fuzz/arbitrary/issues/144. The size_hint is taking forever to compute. In #151 I have a change that caches the hint, which speeds things up from 2 to about 4000 iterations per second.
In #151, I've also added a depth guard to bolero-generator so we don't blow up the stack on recursive types. That does about 70,000 iterations per second, but looking at the flamegraph we're spending a lot more time inside libfuzzer. This likely means we're bailing a lot on inputs and libfuzzer isn't finding many that satisfy the constraints. I'm still digging in on this to confirm, though.
Ok so the difference between the two seems to be coming from the way arbitrary does depth guarding and how I implemented it in #151. Arbitrary only guards depth once the byte slice is empty (see https://github.com/rust-fuzz/arbitrary/pull/111) whereas I am guarding it all the time. As a result, the arbitrary-generated trees are much deeper. I think the "always guard" approach is a bit more predictable since it will always limit the tree depth to the configured limit in the harness. With the current approach in arbitrary, I was seeing trees with +150 levels, which may not be desirable for all harnesses.
In the case of both arbitrary and bolero-generator I am seeing memory grow and eventually hitting a OOM at 2GB. I need to figure out if this is bolero's issue or something to do with libfuzzer/ASAN.
Thanks for looking into this! Recursive generators are hard.
I think the "always guard" approach is a bit more predictable since it will always limit the tree depth to the configured limit in the harness.
Where in the harness is this depth configured? IIRC, size-bounding is the best practice for generating recursive structures. I.e., you should generate a random structure up to a particular size, and that size can be bounded by the user or generated itself. I.e., "let list_len = rand(); return list with list_len random elements". Does the "size_hint
" act as the length bound here? Apologies for not digging a lot into the code to figure it out myself ...
I saw that PR #108 that includes
Arbitrary
generator support, but I didn't see corresponding instructions for how to leverage it.In particular, I'd like to use values generated by an
Arbitrary::<Type>
rather than aTypeGenerator::<Type>
. I'm not seeing docs on that. Issue #31 and the comments in PR #108 suggest usingwith_arbitrary...
rather thanwith_type...
but I don't see that function in the docs. I wondered if perhaps you needed to#derive
aTypeGenerator
and could point to anArbitrary
as the source, but I don't see that in the docs either.