Construct and use a 'fuzzing dictionary'

In fuzzing, a "dictionary" is a corpus of known-interesting fragments (boundary values, HTML tags, etc.) that can be mixed in with randomly generated or mutated data to increase the chance of stumbling across interesting bugs.
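To make "mixed in with mutated data" concrete, here is a minimal sketch of a dictionary-aware mutator. It is not how Hypothesis or HypoFuzz does it; the `DICTIONARY` contents, the `mutate` name, and the 50% splice probability are all made up for illustration.

```python
import random

# Hypothetical dictionary of "known-interesting" fragments (assumed values).
DICTIONARY = [b"<script>", b"\x00", b"-1", b"2147483647", b"%s"]


def mutate(data: bytes, rng: random.Random, dictionary=DICTIONARY) -> bytes:
    """Return a copy of `data` with one fragment inserted at a random position."""
    pos = rng.randint(0, len(data))  # inclusive on both ends, so appending is possible
    if dictionary and rng.random() < 0.5:
        # Dictionary mutation: splice in a known-interesting fragment.
        token = rng.choice(dictionary)
    else:
        # Plain random mutation: insert a single random byte.
        token = bytes([rng.randrange(256)])
    return data[:pos] + token + data[pos:]
```

Passing a seeded `random.Random` keeps the mutation reproducible, which matters for replaying failures.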

We kinda support this in Hypothesis for some types already; it's how we boost the chances of boundary integers and "interesting" floats. However, there's currently no mechanism for adding to the pool at runtime, and adding one will take some care to ensure that we can still replay failing examples without that runtime pool. See also https://github.com/HypothesisWorks/hypothesis/issues/3086 and https://github.com/HypothesisWorks/hypothesis/issues/3127#issuecomment-983314619.

Once we've got that, the standard easy way to get a dictionary is to run the strings utility on your binary. The natural equivalent for us is to grab the Python source code and collect all the ast.Constant values (perhaps excluding long strings, which are likely docstrings).
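A rough sketch of the ast.Constant idea, using only the standard library. The `collect_constants` name and the 40-character cutoff for "long strings" are assumptions for illustration, not anything HypoFuzz defines.

```python
import ast


def collect_constants(source: str, max_str_len: int = 40) -> list:
    """Collect int, float, str, and bytes literals from Python source code."""
    found = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Constant):
            continue
        value = node.value
        # Skip bool, None, Ellipsis, and complex: rarely useful dictionary entries.
        if isinstance(value, bool) or not isinstance(value, (int, float, str, bytes)):
            continue
        # Skip long strings, which are likely docstrings.
        if isinstance(value, (str, bytes)) and len(value) > max_str_len:
            continue
        # Key on (type, value) so that 1 and 1.0 are kept as separate entries.
        found[(type(value), value)] = value
    return list(found.values())
```

For example, `collect_constants(open(path).read())` on a module under test (where `path` is whatever file you care about) gives a starting dictionary. Note that a literal like `-1` parses as a unary minus applied to `Constant(1)`, so this sketch collects `1` rather than `-1`.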

A more advanced trick, shading into a full research project, would be to investigate Redqueen-style tracking. For example, "a string in the input matched against this regex pattern in the code, so try generating strings matching that pattern".

Zac-HD/hypofuzz issue #8