Open dvyukov opened 9 years ago
sweet :)
http://resources.sei.cmu.edu/library/asset-view.cfm?assetid=28043
Not sure if that algorithm is applicable since we don't always have a pristine seed file.
It appears the standard algorithm that applies here is https://www.st.cs.uni-saarland.de/papers/tse2002/tse2002.pdf
My favourite part: Simplification is not equivalent to isolation. Isolation shows the smallest difference between a failing and a passing case. Simplification shows the smallest failing case.
I implemented the ddmin minimization algorithm: https://github.com/dgryski/go-ddmin ; would be interesting to compare the results of this to what go-fuzz's techniques reduce it to.
@dgryski I am pretty sure we can improve the current minimization algorithm, so feel free to send a pull request if/when you do the comparison. What is time complexity of the algorithm? Current O(N^2) is somewhat unpleasant. Current tail stripping is quite efficient for real inputs, but if ddmin has good time complexity, then probably we can just do the generic thing. Also, there can be some useful minimization techniques that mutate the data itself (like current replacement with 0). For example, there can be an identifier in the input, but you need to alter (shorten) all occurrences of it in the input. Or, there can be a length-value pair, if you shorten the value, you also need to change length accordingly.
ddmin is also O(n^2) time complexity. It also doesn't have any heuristics; it's just an systematic brute force approach to removing pieces of input. I would use it instead of the current first two steps.
I'm not sure when I'll have time to look at this though.
Another ddmin implementation, seemingly written around the same time: https://godoc.org/github.com/dominikh/go-quickcheck
The ddmin algorithm there is mine, just generalized away from []byte.
Current minimization code is https://github.com/dvyukov/go-fuzz/blob/34dea346b8c14573ea065f2a3654a7d85e1a8b0c/go-fuzz/worker.go#L322
Currently go-fuzz has 3 minimization strategies:
Here was suggested another strategy: https://github.com/gogo/protobuf/issues/91#issuecomment-140676334 Namely:
Also what I hit several times when inputs are programs: