Closed: jwoudenberg closed this issue 6 years ago
I think we generate one seed per test, sequentially, so removing/adding a test before the one you're interested in will change your seed. I'd suggest we use something like hash(masterSeed++testname) as the seed for each test, or simply reuse the masterSeed for all tests.
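For illustration, a minimal sketch of the hash(masterSeed ++ testname) idea; `seedFor` is a hypothetical helper, not elm-test's actual implementation:

```elm
module SeedSketch exposing (seedFor)

-- Hypothetical sketch: fold the characters of the test name into the master
-- seed so each test gets a stable seed of its own, independent of how many
-- sibling tests run before it.
seedFor : Int -> String -> Int
seedFor masterSeed testName =
    String.foldl (\char acc -> acc * 33 + Char.toCode char) masterSeed testName
```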
Meanwhile, can you check out the old commit in git and see if you can reproduce it at all, and maybe share a smaller version of the flaky test for us to learn from?
Sure! What should the example illustrate? A combination of a fuzz test and a seed that fails when the fuzz test is run alone versus when it is run as part of a suite containing another test?
Or are you interested in why this particular fuzz test flaked in the first place? Happy to show that, but it's more of a logic error on my part than a problem with elm-test :).
I'd like to see the flaky test, by itself :) Having a list of bad tests that users managed to write helps us figure out how to write fuzzers so that we can catch more of these bugs in the future.
I think we generate one seed per test, sequentially, so removing/adding a test before the one you're interested in will change your seed. I'd suggest we use something like hash(masterSeed++testname) as the seed for each test, or simply reuse the masterSeed for all tests.
I'm so into this idea that I already did it back in July. 😸
Adding new sibling tests shouldn't affect the seeds. However, changing test description strings, or their ancestor describe labels, can cause the seeds to change...but I don't think it's possible for any implementation to avoid that.
Is it possible that changing test descriptions was the cause here?
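To make the describe-path dependence concrete, here is a standard elm-test suite (the module and test names are made up for illustration); renaming either the describe label or the fuzz test's description would change the seed that test receives, even with the same master seed:

```elm
module Example exposing (suite)

import Expect
import Fuzz
import Test exposing (Test, describe, fuzz)


suite : Test
suite =
    -- The seed this fuzz test receives depends on the full path of labels:
    -- "List.reverse" / "reversing twice returns the original list".
    describe "List.reverse"
        [ fuzz (Fuzz.list Fuzz.int) "reversing twice returns the original list" <|
            \xs ->
                xs
                    |> List.reverse
                    |> List.reverse
                    |> Expect.equal xs
        ]
```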
I'd like to see the flaky test, by itself :) Having a list of bad tests that users managed to write helps us figure out how to write fuzzers so that we can catch more of these bugs in the future.
The test was fine; it found a bug in the code!
Is it possible that changing test descriptions was the cause here?
I'll investigate this tomorrow.
Must have been a test description change. I tried to reproduce the error with a smaller example and couldn't. Thanks for all the replies, and sorry for the noise!
What I meant was: if you tried to write a fuzz test and it incorrectly passed most of the time, maybe we can make the fuzzers smarter so that they find the input that would make your test fail?
We had the following scenario play out in our codebase:
My explanation for the test now passing is that in those past weeks our test suite has grown to include additional tests. We still run our test suite using the same master seed, but because the distribution of that seed has changed on account of the extra tests, our individual failing test now receives a different seed.
My takeaway is that the 'reproduce' line elm-test prints when a test fails is currently useful only in the short term: while adding some tests you see a failing case go by and can use the reproduce information to handle that situation right then and there. I think it would be great if the 'reproduce' line remained valid as long as the test doesn't change.
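For reference, re-running the whole suite with a pinned master seed looks something like this (assuming elm-test's --seed and --fuzz flags; the seed value below is just a placeholder):

```
elm-test --seed 12345 --fuzz 100
```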
I think this would be my ideal situation: Each failing test gets a separate reproduce line that reruns just that one test with the same seed. That line should remain valid for as long as the test it is meant to reproduce doesn't change.