dvyukov / go-fuzz

Randomized testing for Go
Apache License 2.0
4.75k stars 276 forks source link

Better minimization #100

Open dvyukov opened 9 years ago

dvyukov commented 9 years ago

Currently go-fuzz has 3 minimization strategies:

Here was suggested another strategy: https://github.com/gogo/protobuf/issues/91#issuecomment-140676334 Namely:

Also what I hit several times when inputs are programs:

awalterschulze commented 9 years ago

sweet :)

dgryski commented 9 years ago

http://resources.sei.cmu.edu/library/asset-view.cfm?assetid=28043

Not sure if that algorithm is applicable since we don't always have a pristine seed file.

dgryski commented 8 years ago

It appears the standard algorithm that applies here is https://www.st.cs.uni-saarland.de/papers/tse2002/tse2002.pdf

awalterschulze commented 8 years ago

My favourite part: Simplification is not equivalent to isolation. Isolation shows the smallest difference between a failing and a passing case. Simplification shows the smallest failing case.

dgryski commented 8 years ago

I implemented the ddmin minimization algorithm: https://github.com/dgryski/go-ddmin ; would be interesting to compare the results of this to what go-fuzz's techniques reduce it to.

dvyukov commented 8 years ago

@dgryski I am pretty sure we can improve the current minimization algorithm, so feel free to send a pull request if/when you do the comparison. What is time complexity of the algorithm? Current O(N^2) is somewhat unpleasant. Current tail stripping is quite efficient for real inputs, but if ddmin has good time complexity, then probably we can just do the generic thing. Also, there can be some useful minimization techniques that mutate the data itself (like current replacement with 0). For example, there can be an identifier in the input, but you need to alter (shorten) all occurrences of it in the input. Or, there can be a length-value pair, if you shorten the value, you also need to change length accordingly.

dgryski commented 8 years ago

ddmin is also O(n^2) time complexity. It also doesn't have any heuristics; it's just an systematic brute force approach to removing pieces of input. I would use it instead of the current first two steps.

I'm not sure when I'll have time to look at this though.

tv42 commented 8 years ago

Another ddmin implementation, seemingly written around the same time: https://godoc.org/github.com/dominikh/go-quickcheck

dgryski commented 8 years ago

The ddmin algorithm there is mine, just generalized away from []byte.

dgryski commented 6 years ago

Current minimization code is https://github.com/dvyukov/go-fuzz/blob/34dea346b8c14573ea065f2a3654a7d85e1a8b0c/go-fuzz/worker.go#L322