Shrinkers could remember the best possible failing shrink for each distinct classification label set

It's a bit of a mouthful, so let me first give the original context and then explain.

Origin: https://github.com/ianmackenzie/elm-random-test/blob/b565db4926962e374f098d9d8b317d531f2b9e72/README.md


So, this is about the generator finding an input that exposes bug A, and the shrinker then collapsing it to an input that exposes a different bug B.

Example:

floatsAreInOrder =
    Test.fuzz2
        (Random.float 0 10)
        (Random.float 0 10)
        "Silly test"
        (\x y -> x |> Expect.lessThan y)

The generator would first find (1,0), and the shrinker would collapse it down to (0,0). These are possibly two different situations we could care about (first GT, then EQ), that is, two different bugs!

It's hard to detect when this is happening (I feel the best solution would be to compare code coverage "ticks" for each input), but we could approximate it using classification labels, if the user provides them. The test runner would remember and report the best shrink for each label set it has encountered. Label sets are computed from the input value using user-provided a -> Bool functions, as in:

        reportDistribution
            [ ( "fizz", \n -> (n |> modBy 3) == 0 )
            , ( "buzz", \n -> (n |> modBy 5) == 0 )
            , ( "even", \n -> (n |> modBy 2) == 0 )
            , ( "odd", \n -> (n |> modBy 2) == 1 )
            ]

(There are 2^4 possible subsets of these four labels, so up to 2^4 different bugs to report if our generator+shrinker is lucky enough!)
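To make the bookkeeping concrete, here is a minimal sketch of what "remember the best shrink per label set" could look like. Everything in it is hypothetical and not part of any existing test runner API: the `BestShrinks` alias, the `labelsFor` and `record` functions, and the `isSimpler` ordering, which I assume the shrinker could supply.

```elm
module BestShrinks exposing (BestShrinks, labelsFor, record)

import Dict exposing (Dict)


{-| Hypothetical bookkeeping: the simplest failing input seen so far,
keyed by the sorted list of labels that the input satisfies.
-}
type alias BestShrinks a =
    Dict (List String) a


{-| Compute the label set of a value from (label, predicate) pairs.
Sorting makes the key independent of the order of the classifiers.
-}
labelsFor : List ( String, a -> Bool ) -> a -> List String
labelsFor classifiers value =
    classifiers
        |> List.filter (\( _, predicate ) -> predicate value)
        |> List.map Tuple.first
        |> List.sort


{-| Record a failing input. It replaces the stored one for its label set
only if `isSimpler` (an assumed ordering, not a real API) says it is simpler.
-}
record :
    (a -> a -> Bool)
    -> List ( String, a -> Bool )
    -> a
    -> BestShrinks a
    -> BestShrinks a
record isSimpler classifiers failing best =
    Dict.update (labelsFor classifiers failing)
        (\existing ->
            case existing of
                Nothing ->
                    Just failing

                Just previous ->
                    if isSimpler failing previous then
                        Just failing

                    else
                        Just previous
        )
        best
```

For the float example above, with classifiers like `( "x > y", \( x, y ) -> x > y )` and `( "x == y", \( x, y ) -> x == y )`, the inputs (1,0) and (0,0) would land under different keys, so both would be kept and reported instead of (0,0) replacing (1,0).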

Issue #218 in elm-explorations/test