Open roddux opened 3 years ago
Looking at this again, I think this also addresses #174.
Overall I am a fan of scripting expert smartness and making it available to all users out of the box, rather than shifting the hard work onto every user. We could do better static analysis as you noted, intercept byte/string comparisons at runtime to build a dynamic dictionary, etc. But as Josh noted, the simplicity of this change is tempting, so I guess I don't mind.
I'm torn on the format; I like how it's simple, but it's not hard to imagine newline characters being useful in literals. One sloppy option is to stay line-oriented, but apply strconv.Unquote if possible and if not, accept as-is. Then you can use a quoted string to get any literal you want in (including a literal that looks like a quoted string), while still having a simple form for everything else. What do you think?
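To make the "unquote if possible" idea concrete, here is a minimal sketch. It is not code from this PR: parseDict is a hypothetical helper name, and skipping blank lines is an assumption of the sketch. Each line goes through strconv.Unquote, and the raw line is kept whenever unquoting fails.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseDict is a hypothetical helper, not part of go-fuzz. It reads one
// token per line; a line that parses as a Go string literal is unquoted,
// and any other line is taken as-is. Blank lines are skipped (an
// assumption of this sketch).
func parseDict(input string) []string {
	var tokens []string
	sc := bufio.NewScanner(strings.NewReader(input))
	for sc.Scan() {
		line := sc.Text()
		if line == "" {
			continue
		}
		// strconv.Unquote accepts double-quoted, backquoted, and
		// single-quoted (one character) Go literals.
		if s, err := strconv.Unquote(line); err == nil {
			line = s
		}
		tokens = append(tokens, line)
	}
	return tokens
}

func main() {
	input := strings.Join([]string{
		`aaa`,         // not a literal: kept as-is
		`"foo"`,       // a literal: the quotes are stripped
		`"\"foo\""`,   // how to get foo with its quotes
		`"line1\nl2"`, // an embedded newline
	}, "\n")
	for _, tok := range parseDict(input) {
		fmt.Printf("%q\n", tok)
	}
}
```

One thing to keep in mind with this approach: because strconv.Unquote also accepts single-quoted and backquoted literals, a line such as 'a' would be turned into a as well.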
Good point. Strictly speaking, the input format may be binary and one may want to include some magic binary sequences. Opportunistically trying strconv.Unquote may lead to some surprises, e.g.:
aaa
bbb
"foo"
where I literally want foo with quotes, but they will be silently stripped with no feedback...
I can think of using strconv.Unquote always (somewhat cumbersome for users), or supporting either the current format or a JSON-encoded []string for better control. Is there any prior art in other fuzzers (AFL, libFuzzer, honggfuzz)?
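For comparison, a rough sketch of the JSON-encoded []string option. parseJSONDict is a hypothetical name, and this only shows how such a file could be decoded with encoding/json, not how go-fuzz does or would do it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseJSONDict is a hypothetical helper that decodes a dictionary file
// written as a JSON array of strings.
func parseJSONDict(data []byte) ([]string, error) {
	var tokens []string
	if err := json.Unmarshal(data, &tokens); err != nil {
		return nil, fmt.Errorf("dictionary is not a JSON array of strings: %w", err)
	}
	return tokens, nil
}

func main() {
	// Quotes and newlines are unambiguous here because every token is
	// always quoted and escaped the same way.
	tokens, err := parseJSONDict([]byte(`["aaa", "bbb", "\"foo\"", "line1\nline2"]`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, tok := range tokens {
		fmt.Printf("%q\n", tok)
	}
}
```

A caveat for the binary case: JSON strings are Unicode text, so an escape like \u00ff decodes to the two-byte UTF-8 encoding of U+00FF rather than the single byte 0xFF. Arbitrary magic byte sequences would need some other encoding on top.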
Thank you both for the feedback! I haven't forgotten about this PR - I'll find the time to work on this soon (hopefully within the next couple weeks).
Take all the time you need. :)
In recent testing I've found that the ROData.strLits list of literals can fill with useless noise: strings collected from places such as error messages.

This list of literals is used directly by go-fuzz in the mutation logic, i.e.: https://github.com/dvyukov/go-fuzz/blob/6a8e9d1f2415cf672ddbe864c2d4092287b33a21/go-fuzz/mutator.go#L346-L367

Having lots of noise in strLits can therefore result in some fairly useless test cases, particularly for syntax-aware programs.

I propose this small change to add a -dict option, so that the user can manually supply a list of useful tokens to go-fuzz. This replaces the ROData.strLits tokens (built from the list in the metadata file) with a high-signal list that the user supplies.

Other thoughts
The signal of the built-in token list could perhaps be improved by modifying the code to skip string literals passed as messages to functions such as log.Fatal or fmt.Print, etc. https://github.com/dvyukov/go-fuzz/blob/6a8e9d1f2415cf672ddbe864c2d4092287b33a21/go-fuzz-build/cover.go#L394
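As a rough illustration of that filtering idea (a standalone sketch using go/ast, not the actual go-fuzz-build code): collectLiterals and the messageSinks list are hypothetical, and matching calls by the pkg.Func name is a heuristic that shadowed or renamed imports would fool.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strconv"
)

// messageSinks is a hypothetical, incomplete list of functions whose
// string arguments are most likely human-readable messages.
var messageSinks = map[string]bool{
	"log.Fatal": true, "log.Fatalf": true, "log.Printf": true,
	"fmt.Print": true, "fmt.Printf": true, "fmt.Println": true,
	"fmt.Errorf": true,
}

// collectLiterals returns the string literals in src, skipping import
// paths and anything inside a call to one of the messageSinks.
func collectLiterals(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "src.go", src, 0)
	if err != nil {
		return nil, err
	}
	var lits []string
	ast.Inspect(f, func(n ast.Node) bool {
		switch n := n.(type) {
		case *ast.ImportSpec:
			return false // import paths are not useful tokens
		case *ast.CallExpr:
			if sel, ok := n.Fun.(*ast.SelectorExpr); ok {
				if pkg, ok := sel.X.(*ast.Ident); ok && messageSinks[pkg.Name+"."+sel.Sel.Name] {
					return false // do not descend into the call's arguments
				}
			}
		case *ast.BasicLit:
			if n.Kind == token.STRING {
				if s, err := strconv.Unquote(n.Value); err == nil {
					lits = append(lits, s)
				}
			}
		}
		return true
	})
	return lits, nil
}

const sample = `package p

import (
	"fmt"
	"log"
)

func f(s string) {
	if s == "SELECT" {
		fmt.Println("got select")
	}
	if s != "FROM" {
		log.Fatalf("unexpected token %q", s)
	}
}
`

func main() {
	lits, err := collectLiterals(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n", lits)
}
```

In this sample only the comparison operands SELECT and FROM are kept, while the two message strings are dropped. Whether that trade-off holds up on real targets would need testing.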