Closed mds1 closed 1 year ago
There's also https://github.com/rust-fuzz/arbitrary.

We could also explore making our own `Strategy` implementations for proptest.
Consider using SMT solvers like z3 directly to generate random valid inputs, instead of hoping some generic fuzzer will support smart contract specific needs. Can elaborate more on this if you want, in my previous life I once built a randomized testing tool with z3 from scratch.
@mds1 @gakonst Great to hear that there are other efforts to get more users hooked on fuzzing! :) You might want to take a look at our Harvey fuzzer; see https://mariachris.github.io/Pubs/FSE-2020-Harvey.pdf for ideas on what features make a big difference in practice. Harvey is used both in Diligence's MythX service and our dedicated Diligence fuzzing solution.
> Consider using SMT solvers like z3 directly to generate random valid inputs, instead of hoping some generic fuzzer will support smart contract specific needs. Can elaborate more on this if you want, in my previous life I once built a randomized testing tool with z3 from scratch.
Hey @yaronvel—I'm not too familiar with how you'd implement this in practice, and foundry doesn't yet have any symbolic execution capabilities, so this probably won't be the first path we go down. However I've always thought seeding the fuzzer with solver outputs would be valuable, so would love if you could elaborate more!
> Great to hear that there are other efforts to get more users hooked on fuzzing! :) You might want to take a look at our Harvey fuzzer; see https://mariachris.github.io/Pubs/FSE-2020-Harvey.pdf for ideas on what features make a big difference in practice. Harvey is used both in Diligence's MythX service and our dedicated Diligence fuzzing solution.
Thanks @wuestholz this looks great, planning to read this soon. Only question until then is: looks like the paper is from November 2020, so is there anything you've learned since then that isn't covered in the paper, or anything you'd change if writing a similar paper today? It's only about a year old, but a year is a long time in the smart contract world! 😁
> Thanks @wuestholz this looks great, planning to read this soon. Only question until then is: looks like the paper is from November 2020, so is there anything you've learned since then that isn't covered in the paper, or anything you'd change if writing a similar paper today? It's only about a year old, but a year is a long time in the smart contract world! 😁
😁 We're continuously improving Harvey based on feedback from auditors and clients. We're planning to write up one of the novel techniques we added soon (spoiler alert: up to 30% coverage increase for some of our benchmarks). I think all the lessons in the paper still apply today, but we've definitely been optimizing Harvey more for property checking (for instance, in combination with our Scribble specification language). We've also found several cool use-cases for the input prediction technique we presented in our FSE paper.
How much of the specification/property writing do you think we can overload into Solidity syntax, vs having a separate language?
@gakonst I'm hoping a lot! 😁 We're trying to keep Scribble as simple as possible. Ideally, a Solidity developer can understand most specifications "out-of-the-box"; for instance, we allow regular Solidity expressions whenever possible, and only extend the syntax slightly when needed (for instance, `old`-expressions in postconditions).
From my perspective, the Solidity team can "steal" as much of Scribble as possible. 😉 We partly started Scribble to explore the language design space and are more than happy to evolve the language with feedback from the community.
Here's a concrete example of the fuzzer missing multiple bugs that other fuzzers (e.g. dapptools) would detect easily:
```solidity
function testEquality(uint256 x, uint256 y) public {
    uint256 xy;
    unchecked {
        xy = x * y;
    }
    // skip inputs where x * y overflows
    if (x != 0 && xy / x != y) return;
    assertEq(((xy - 1) / 1e18) + 1, (xy - 1) / (1e18 + 1));
}
```
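Note that the assertion is only exercised when `x * y` does not overflow, and two uniformly random 256-bit words almost never satisfy that, so a fuzzer drawing uniform inputs nearly always exits at the early return. Here's a dependency-free sketch of the same effect at 64 bits (illustrative only, not foundry code; all names are mine):

```rust
// Illustrative sketch: count how often the product of two uniformly random
// machine words avoids overflow. The same effect, at 256 bits, is why a
// uniform fuzzer almost never reaches the assertion in the test above.

// Tiny xorshift64 PRNG so the sketch has no external dependencies.
// The seed/state must be nonzero.
fn xorshift64(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

fn overflow_free_fraction(trials: u32, seed: u64) -> f64 {
    let mut state = seed;
    let mut ok = 0u32;
    for _ in 0..trials {
        let x = xorshift64(&mut state);
        let y = xorshift64(&mut state);
        // checked_mul returns None when x * y overflows u64
        if x.checked_mul(y).is_some() {
            ok += 1;
        }
    }
    f64::from(ok) / f64::from(trials)
}
```

A fuzzer that biases toward small values or mines constants reaches the assertion easily, which is presumably why dapptools catches this while uniform sampling does not.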
The primary fuzzing technique we should focus on right now is building the best dictionary we can, then selecting inputs from either the dictionary or a randomly generated value, with a user-defined weight specifying the frequency of each.
To implement this we need a weighted union of two strategies:

- `select`, which creates a strategy that uniformly selects one value from `values`
- `Union::new_weighted`, which creates a strategy that selects from the above two strategies with the specified weights
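As a dependency-free illustration of the weighted-union idea (the real implementation would use proptest's `sample::select` and `Union::new_weighted`), here's a sketch that draws from a dictionary with one weight and from a uniform generator with another; all names here are mine, not proptest's:

```rust
// Illustrative sketch of a weighted union of a dictionary strategy and a
// uniform-random strategy. Not proptest's API; proptest would combine
// `sample::select` with `strategy::Union::new_weighted`.

struct WeightedFuzzer {
    dictionary: Vec<u64>, // "interesting" values, e.g. mined constants
    dict_weight: u32,
    random_weight: u32,
    state: u64, // xorshift64 state, a stand-in for a real RNG (must be nonzero)
}

impl WeightedFuzzer {
    fn next_raw(&mut self) -> u64 {
        self.state ^= self.state << 13;
        self.state ^= self.state >> 7;
        self.state ^= self.state << 17;
        self.state
    }

    /// Draw one input: from the dictionary with probability
    /// dict_weight / (dict_weight + random_weight), otherwise uniform.
    /// Assumes at least one weight is nonzero and the dictionary is non-empty.
    fn next_input(&mut self) -> u64 {
        let total = u64::from(self.dict_weight + self.random_weight);
        if self.next_raw() % total < u64::from(self.dict_weight) {
            let idx = (self.next_raw() % self.dictionary.len() as u64) as usize;
            self.dictionary[idx]
        } else {
            self.next_raw()
        }
    }
}
```

With weights (2, 3), roughly 40% of draws would come from the dictionary; proptest's `Union::new_weighted` does the equivalent bookkeeping over arbitrary strategies.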
`new_weighted` requires weights to be specified; this is equivalent to a weight of 1 for the dictionary and 1.5 for random values.

Closing this since the fuzzer has improved since this was created, so this is now stale. Will create a new issue to track future fuzz improvements.
The current state of fuzzing is naive (we think, more on this below): it uses `proptest` strategies to generate random, uniformly distributed inputs based on Solidity types. This is an OK start, but isn't great; there are lots of improvements that can be made to the fuzzer to increase the chances of finding bugs.

This issue is to spec out (1) what fuzzing features and functionality we want to have, and then (2) how to approach that, e.g. are custom proptest strategies sufficient, should we switch to a different crate, etc.
I'm certainly no fuzzing expert, but as a jumping-off point here's my assessment of the functionality we'd want to have, as well as a list of other Rust fuzzing crates to consider.
Fuzzing Features
These are just ones that come to mind based on my limited knowledge of fuzzing and my use of foundry so far
Bias random values towards extremes
Based on the proptest docs and this open proptest issue, it appears proptest does not bias towards extreme values and edge cases such as 0, max, min, etc.
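One cheap way to get this bias (a sketch, not proptest's API; the names, edge list, and ratios are all mine) is to post-process a uniform draw so that a fixed fraction of draws snaps to an edge case:

```rust
// Illustrative sketch: bias a uniform u64 draw toward edge cases. `raw` is a
// uniform random word; `selector` is another (reduced mod 4 here). One in four
// draws becomes an exact edge case, one in four a "small" value near zero,
// one in four a value near u64::MAX, and the rest pass through unchanged.
const EDGES: [u64; 4] = [0, 1, u64::MAX - 1, u64::MAX];

fn bias_toward_extremes(raw: u64, selector: u64) -> u64 {
    match selector % 4 {
        0 => EDGES[(raw % EDGES.len() as u64) as usize], // exact edge case
        1 => raw % 256,                                  // small value
        2 => u64::MAX - (raw % 256),                     // near the max
        _ => raw,                                        // unchanged (uniform)
    }
}
```

The same trick generalizes per type: for `int` types the edge list would include the most negative value, for `address` the zero address, and so on.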
"Synchronize" the fuzzed inputs
Some bugs require multiple inputs to be in sync, e.g. all zeros or empty arrays. If each input is generated independently and uniformly, this is unlikely to happen. I think this is the current behavior of proptest, but I may be wrong, because the fuzzer did find a failure case with a counterexample of `[], [], [], 0x, 0x` for inputs of types `(address,uint256[],uint256[],uint256[],bytes,bytes)`. I believe this specific example came up on each fuzz run, suggesting proptest may be smarter than myself and the docs give it credit for.

Some degree of control over array and bytes lengths
Currently the max size of inputs for arrays and bytes is bounded, but some bugs may only surface with larger inputs. Always allowing very large inputs might slow down tests, so it'd be useful to either allow them to be unbounded (or have larger bounds) via a flag (e.g. for use in CI), or to bias the fuzzer so that large inputs are only used occasionally.
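A sketch of what biased length selection could look like; the names, caps, and the 1-in-8 ratio are all arbitrary choices of mine, not an existing API:

```rust
// Illustrative sketch: choose an array/bytes length that is usually small but
// occasionally large. `raw` is a uniform random word; `allow_large` would be
// driven by a config flag (e.g. enabled in CI). Ratios and caps are arbitrary.
fn gen_length(raw: u64, allow_large: bool) -> usize {
    const SMALL_MAX: u64 = 64;
    const LARGE_MAX: u64 = 65_536;
    // ~1 in 8 draws goes large (when the flag allows it)
    let large_draw = allow_large && raw % 8 == 0;
    if large_draw {
        ((raw / 8) % LARGE_MAX) as usize
    } else {
        ((raw / 8) % SMALL_MAX) as usize
    }
}
```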
Constant mining
Details and motivation can be found here. The summary is that echidna uses constant mining to extract return values from other methods and uses those as fuzzer inputs. As shown in the link, this helps find bugs that would not be found otherwise
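To make this concrete, here's a naive sketch of one flavor of constant mining: scanning EVM bytecode for PUSH immediates to seed the dictionary. (The echidna approach described above also draws on method return values; this sketch covers only bytecode constants, and a real miner would also need to avoid misreading data sections as opcodes.)

```rust
// Illustrative sketch of bytecode-level constant mining: collect the immediate
// operands of EVM PUSH1..PUSH32 opcodes (0x60..=0x7f) to seed a fuzzer
// dictionary. Deliberately naive: it does not distinguish code from data.
fn mine_push_constants(code: &[u8]) -> Vec<Vec<u8>> {
    let mut constants = Vec::new();
    let mut i = 0;
    while i < code.len() {
        let op = code[i];
        if (0x60..=0x7f).contains(&op) {
            let n = (op - 0x60 + 1) as usize; // PUSH1 carries 1 byte, ..., PUSH32 carries 32
            let end = (i + 1 + n).min(code.len());
            constants.push(code[i + 1..end].to_vec());
            i = end; // skip past the immediate bytes
        } else {
            i += 1; // every non-PUSH opcode is 1 byte
        }
    }
    constants
}
```

Values mined this way (token addresses, magic thresholds, fee constants) are exactly the kind of inputs a uniform generator will never stumble on.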
`assume` functionality

Allow inputs that don't meet certain conditions, such as ignoring the zero address or ensuring that `x + y < z`, to be discarded and have a new set of inputs generated. Discarded fuzz runs are not counted toward the number of executed runs, so new inputs would be generated and the test re-run.

Bounding inputs to a range
It's debatable whether this should be part of the fuzzer, or if users should just manage it manually by bounding the fuzzer's inputs, but a way to control the range of inputs is often useful to avoid reverts due to overflows, etc.
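The two features above (`assume`-style discarding and range bounding) can be sketched as follows; the function names and signatures are mine, not an existing foundry API:

```rust
// Illustrative sketches of `assume`-style filtering and range bounding.

/// Run `runs` accepted cases, discarding inputs that fail the precondition
/// without counting them toward `runs`, and bail out if too many inputs
/// are rejected (a sign the precondition is too strict).
fn fuzz_with_assume(
    runs: usize,
    max_discards: usize,
    mut gen: impl FnMut() -> u64,
    pre: impl Fn(u64) -> bool,
) -> Result<Vec<u64>, &'static str> {
    let mut accepted = Vec::new();
    let mut discards = 0;
    while accepted.len() < runs {
        let x = gen();
        if pre(x) {
            accepted.push(x); // counts as an executed run
        } else {
            discards += 1; // does not count toward `runs`
            if discards > max_discards {
                return Err("too many inputs rejected by assume");
            }
        }
    }
    Ok(accepted)
}

/// Modulo-style bounding of a raw fuzzed value into [lo, hi] (inclusive).
/// Slightly skews the distribution, but avoids discarding any runs.
fn bound(x: u64, lo: u64, hi: u64) -> u64 {
    assert!(lo <= hi);
    lo + x % (hi - lo + 1)
}
```

Bounding and `assume` trade off against each other: bounding never wastes a run but distorts the distribution, while `assume` keeps the distribution intact at the cost of discarded runs.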
Symbolically execute to seed fuzzer inputs
See https://github.com/gakonst/foundry/issues/387#issuecomment-1006626020 below:
h/t @yaronvel
Rust Fuzzing Crates
There's probably others, but here's a few I've come across so far