MozillaSecurity / lithium

Line-based testcase reducer
Mozilla Public License 2.0

Lithium should not try reductions which have already been tested. #52

Open jschwartzentruber opened 7 years ago

jschwartzentruber commented 7 years ago

If we have a testcase containing abc, and bc is the reduced result (--char mode, strategy=minimize), the run will look like this:

bc ➡️ interesting
c ➡️ not interesting
b ➡️ not interesting
Starting another round with chunk size 1
c ➡️ not interesting
b ➡️ not interesting

This final round at chunk size 1 will always be unnecessary as long as the only successful reduction in the previous round was at the beginning. I think this holds for all atom types (line/char/symbol) and strategies.

nth10sd commented 7 years ago

For the following, I assume abcde is the testcase and bde is the reduced result.

A long time ago, it was supposed to work like this:

bcde ➡️ interesting
cde ➡️ not interesting
bde ➡️ interesting
be ➡️ not interesting
bd ➡️ not interesting
Starting another round with chunk size 1
de ➡️ not interesting
(Lithium detects that the rest of the combinations were already tested, and reduction ends)

Some years ago, the optimization was dropped:

bcde ➡️ interesting
cde ➡️ not interesting
bde ➡️ interesting
be ➡️ not interesting
bd ➡️ not interesting
Starting another round with chunk size 1
de ➡️ not interesting
be ➡️ not interesting (duplicated)
bd ➡️ not interesting (duplicated)

Thus, it is not a matter of a duplicated round; rather, we lost the ability to detect that certain individual reductions have already been tried, and therefore to skip re-testing them.

jschwartzentruber commented 6 years ago

Yep, you're right. The safest way is to introduce (or re-introduce?) a cache of what has already been tried. This would also address other redundant-work issues like #57, although that one has already been fixed another way.
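
A minimal sketch of what such a cache could look like (a hypothetical wrapper, not Lithium's actual API), assuming the interestingness test receives the candidate as a sequence of atoms (lines or characters):

```python
import hashlib


class CachedInteresting:
    """Wrap an interestingness test with an "already tried" cache.

    Candidates are keyed by a hash of their contents, so any candidate that
    has already been evaluated (in this round or an earlier one) is answered
    from the cache instead of being re-run.
    """

    def __init__(self, interesting):
        self._interesting = interesting
        self._results = {}  # candidate hash -> bool

    def __call__(self, atoms):
        key = hashlib.sha256("".join(atoms).encode("utf-8")).hexdigest()
        if key not in self._results:
            self._results[key] = self._interesting(atoms)
        return self._results[key]
```

Caching the actual result (rather than just marking the candidate as seen) keeps the wrapper transparent to whichever strategy calls it, and hashing keeps memory bounded even for large testcases.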

choller commented 6 years ago

I can confirm that adding a cache is what is typically done for these types of algorithms. The DD algorithm has similar duplication and is therefore, in practice, always implemented with a minchunk-based (e.g. line-based) cache.
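
For illustration, a minchunk-keyed variant might look like the following (again a hypothetical sketch, not code from Lithium or any particular DD implementation): the cache key is the set of minimal-chunk indices a candidate keeps, so two removal orders that end up with the same subset of lines are only tested once.

```python
class MinChunkCache:
    """Cache interestingness results keyed by the retained minimal chunks."""

    def __init__(self, all_chunks, interesting):
        self._all_chunks = all_chunks  # e.g. the testcase split into lines
        self._interesting = interesting
        self._results = {}             # frozenset of kept indices -> bool

    def test(self, kept_indices):
        key = frozenset(kept_indices)
        if key not in self._results:
            candidate = [self._all_chunks[i] for i in sorted(key)]
            self._results[key] = self._interesting(candidate)
        return self._results[key]
```

Keying on indices rather than text avoids rehashing the candidate on every try, and the cache only grows with the number of distinct subsets actually attempted.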