meta: forge fuzzer improvements

mds1 commented 2 years ago

The current state of fuzzing is naive (we think, more on this below)—it uses proptest strategies to generate random, uniformly distributed inputs based on Solidity types. This is an ok start, but isn't great. There's lots of improvements to the fuzzer that can be made to increase the chances of finding bugs.

This issue is spec out (1) what fuzzing features and functionality we want to have, and then (2) how to approach that, e.g. are custom proptest strategies sufficient, should we switch to a different crate, etc.

I'm certainly no fuzzing expert, but as a jumping off point here's my assessment of functionality we'd want to have, as well as a list of other rust crates for fuzzing to consider

Fuzzing Features

These are just ones that come to mind based on my limited knowledge of fuzzing and my use of foundry so far

Bias random values towards extremes

Based on the proptest docs and this open proptest issue, it appears proptest does not bias towards extreme values and edge cases such as 0, max, min, etc.

"Synchronize" the fuzzed inputs

Some bugs require multiple inputs to be in sync, e.g. all zeros or empty arrays. If each input is generated independently and uniformly, this is unlikely to happen. I think this is the current behavior of proptest, but I may be wrong because the fuzzer did find a failure case with a counter example of [], [], [], 0x, 0x for inputs of types (address,uint256[],uint256[],uint256[],bytes,bytes). I believe this specific example came up on each fuzz run, suggesting proptest may be smarter than myself and the docs give it credit for

Some degree of control over array and bytes lengths

Currently the max size of inputs for arrays and bytes are bounded, but some bugs may only surface with larger inputs. Always allowing very large inputs might slow down tests, so it'd be useful to either allow them be unbound (or have larger bounds) via a flag (e.g. for use in CI), or to bias the fuzzer to have large inputs only be used occasionally

Constant mining

Details and motivation can be found here. The summary is that echidna uses constant mining to extract return values from other methods and uses those as fuzzer inputs. As shown in the link, this helps find bugs that would not be found otherwise

`assume` functionality

Allow inputs that don't meet certain conditions, such as ignoring the zero address or ensuring that x + y < z, to be discarded and have a new set of inputs generated. Discarded fuzz runs are not counted toward the number of executed runs, so new inputs would be generated and the test re-ran

Bounding inputs to a range

It's debatable whether this should be part of the fuzzer, or if users should just manage it manually by bounding the fuzzer's inputs, but a way to control the range of inputs is often useful to avoid reverts due to overflows, etc.

Symbolically execute to seed fuzzer inputs

See https://github.com/gakonst/foundry/issues/387#issuecomment-1006626020 below:

Consider using SMT solvers like z3 directly to generate random valid inputs, instead of hoping some generic fuzzer will support smart contract specific needs.

h/t @yaronvel

Rust Fuzzing Crates

There's probably others, but here's a few I've come across so far

proptest is what we currently use
quickcheck — QuickCheck for Haskell is what dapptools uses
afl.rs allows fuzzing with AFL
cargo-fuzz allows fuzzing with libFuzzer
arbitrary is intended to be combined with a fuzzer like libFuzzer and cargo-fuzz or AFL, and to help you turn the raw, untyped byte buffers that they produce into well-typed, valid, structured values (h/t @mattsse)

mattsse commented 2 years ago

there's also https://github.com/rust-fuzz/arbitrary

gakonst commented 2 years ago

We could also explore making our own Strategy implementations for proptest.

yaronvel commented 2 years ago

Consider using SMT solvers like z3 directly to generate random valid inputs, instead of hoping some generic fuzzer will support smart contract specific needs. Can elaborate more on this if you want, in my previous life I once built a randomized testing tool with z3 from scratch.

wuestholz commented 2 years ago

@mds1 @gakonst Great to hear that there are other efforts to get more users hooked on fuzzing! :) You might want to take a look at our Harvey fuzzer; see https://mariachris.github.io/Pubs/FSE-2020-Harvey.pdf for ideas on what features make a big difference in practice. Harvey is used both in Diligence's MythX service and our dedicated Diligence fuzzing solution.

mds1 commented 2 years ago

Consider using SMT solvers like z3 directly to generate random valid inputs, instead of hoping some generic fuzzer will support smart contract specific needs. Can elaborate more on this if you want, in my previous life I once built a randomized testing tool with z3 from scratch.

Hey @yaronvel—I'm not too familiar with how you'd implement this in practice, and foundry doesn't yet have any symbolic execution capabilities, so this probably won't be the first path we go down. However I've always thought seeding the fuzzer with solver outputs would be valuable, so would love if you could elaborate more!

mds1 commented 2 years ago

Great to hear that there are other efforts to get more users hooked on fuzzing! :) You might want to take a look at our Harvey fuzzer; see https://mariachris.github.io/Pubs/FSE-2020-Harvey.pdf for ideas on what features make a big difference in practice. Harvey is used both in Diligence's MythX service and our dedicated Diligence fuzzing solution.

Thanks @wuestholz this looks great, planning to read this soon. Only question until then is: looks like the paper is from November 2020, so is there anything you've learned since then that isn't covered in the paper, or anything you'd change if writing a similar paper today? It's only about a year old, but a year is a long time in the smart contract world! 😁

wuestholz commented 2 years ago

Great to hear that there are other efforts to get more users hooked on fuzzing! :) You might want to take a look at our Harvey fuzzer; see https://mariachris.github.io/Pubs/FSE-2020-Harvey.pdf for ideas on what features make a big difference in practice. Harvey is used both in Diligence's MythX service and our dedicated Diligence fuzzing solution.

Thanks @wuestholz this looks great, planning to read this soon. Only question until then is: looks like the paper is from November 2020, so is there anything you've learned since then that isn't covered in the paper, or anything you'd change if writing a similar paper today? It's only about a year old, but a year is a long time in the smart contract world! 😁

😁 We're continuously improving Harvey based on feedback from auditors and clients. We're planning to write up one of the novel techniques we added soon (spoiler alert: up to 30% coverage increase for some of our benchmarks). I think all the lessons in the paper still apply today, but we've definitely been optimizing Harvey more for property checking (for instance, in combination with our Scribble specification language). We've also found several cool use-cases for the input prediction technique we presented in our FSE paper.

gakonst commented 2 years ago

How much of the specification/property writing do you think we can overload into Solidity syntax, vs having a separate language?

wuestholz commented 2 years ago

@gakonst I'm hoping a lot! 😁 We're trying to keep Scribble as simple as possible. Ideally, a Solidity developer can understand most specifications "out-of-the-box"; for instance, we allow regular Solidity expressions whenever possible, and only extend the syntax slightly when needed (for instance, old-expressions in postconditions).

From my perspective, the Solidity team can "steal" as much of Scribble as possible. 😉 We partly started Scribble to explore the language design space and are more than happy to evolve the language with feedback from the community.

transmissions11 commented 2 years ago

here's a concrete example of the fuzzer missing multiple bugs that other fuzzers (dapptools) would detect easily:

function testEquality(uint256 x, uint256 y) public {
        uint256 xy;

        unchecked {
            xy = x * y;
        }

        if ((x != 0 && xy / x != y)) return;

        assertEq(((xy - 1) / 1e18) + 1, (xy - 1) / (1e18 + 1));
}

mds1 commented 2 years ago

Primary fuzzing technique we should focus on right now is building the best dictionary, and selecting inputs from both the dictionary and from a randomly generated value, with a user-defined weight specifying the frequency of each.

Biasing towards extremes can be handled by adding values to the dictionary
Constant mining can be handled by dumping the EVM state into the dictionary
Can also add values by parsing solc output or using slither
Instead of coverage guidance, add a flag to output a coverage report, so user knows what wasn't hit
A cheatcode to add values to dictionary can help improve coverage, based on coverage report

To implement this we need a weighted union of two strategies:

Strategy 1: Randomly generated values, which we currently have
Strategy 2: The dictionary, using proptest's select, which creates a strategy that uniformly selects one value from values:
Union: Use new_weighted function to create a strategy that selects from the above two strategies with the specified weights
Echidna's default dictionary frequency is 40%, so we should use that value also. This means random values are used 1.5 times more often than dictionary values, and based on how new_weighted requires wights to be specified, this is equivalent to a weight of 1 for dictionary and 1.5 for random values

mds1 commented 1 year ago

Closing this since the fuzzer has improved since this was created, so this is now stale. Will create a new issue to track future fuzz improvements

foundry-rs / foundry