Closed: jwoudenberg closed this issue 6 years ago
I think we generate one seed per test, sequentially, so removing/adding a test before the one you're interested in will change your seed. I'd suggest we use something like hash(masterSeed++testname) as the seed for each test, or simply reuse the masterSeed for all tests.
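For illustration, a minimal sketch of the hash(masterSeed ++ testname) idea; `seedFor` is a hypothetical helper, not elm-test's actual implementation:

```elm
module SeedSketch exposing (seedFor)

-- Hypothetical sketch: fold the characters of the test name into the master
-- seed so each test gets a stable seed of its own, independent of how many
-- sibling tests run before it.
seedFor : Int -> String -> Int
seedFor masterSeed testName =
    String.foldl (\char acc -> acc * 33 + Char.toCode char) masterSeed testName
```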
Meanwhile, can you check out the old commit in git and see if you can reproduce it at all, and maybe share a smaller version of the flaky test for us to learn from?
Sure! What should the example illustrate? A combination of a fuzz test and a seed that fails when the fuzz test is run alone versus when it is run as part of a suite containing another test?
Or are you interested in why this particular fuzz test flaked in the first place? Happy to show that, but it's more of a logic error on my part than a problem with elm-test :).
I'd like to see the flaky test, by itself :) Having a list of bad tests that users managed to write helps us figure out how to write fuzzers so that we can catch more of these bugs in the future.
I think we generate one seed per test, sequentially, so removing/adding a test before the one you're interested in will change your seed. I'd suggest we use something like hash(masterSeed++testname) as the seed for each test, or simply reuse the masterSeed for all tests.
I'm so into this idea that I already did it back in July. 😸
Adding new sibling tests shouldn't affect the seeds. However, changing test description strings, or their ancestor describe labels, can cause the seeds to change...but I don't think it's possible for any implementation to avoid that.
Is it possible that changing test descriptions was the cause here?
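To make the describe-path dependence concrete, here is a standard elm-test suite (the module and test names are made up for illustration); renaming either the describe label or the fuzz test's description would change the seed that test receives, even with the same master seed:

```elm
module Example exposing (suite)

import Expect
import Fuzz
import Test exposing (Test, describe, fuzz)


suite : Test
suite =
    -- The seed this fuzz test receives depends on the full path of labels:
    -- "List.reverse" / "reversing twice returns the original list".
    describe "List.reverse"
        [ fuzz (Fuzz.list Fuzz.int) "reversing twice returns the original list" <|
            \xs ->
                xs
                    |> List.reverse
                    |> List.reverse
                    |> Expect.equal xs
        ]
```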
I'd like to see the flaky test, by itself :) Having a list of bad tests that users managed to write helps us figure out how to write fuzzers so that we can catch more of these bugs in the future.
The test was fine; it found a bug in the code!
Is it possible that changing test descriptions was the cause here?
I'll investigate this tomorrow.
Must have been a test description change. I tried to reproduce the error with a smaller example and couldn't. Thanks for all the replies, and sorry for the noise!
What I meant was: if you tried to write a fuzz test and it incorrectly passed most of the time, maybe we can make the fuzzers smarter so that they find the input that would make your test fail?
We had the following scenario play out in our codebase:
My explanation for the test now passing is that in those past weeks our test suite has grown to include additional tests. We still run our test suite using the same master seed, but because the distribution of that seed has changed on account of the extra tests, our individual failing test now receives a different seed.
My takeaway is that the 'reproduce' line elm-test prints when a test fails is currently useful only in the short term: while adding some tests you see a failing case go by and can use the reproduce information to handle that situation right then and there. I think it would be great if the 'reproduce' line remained valid as long as the test doesn't change.
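For reference, re-running the whole suite with a pinned master seed looks something like this (assuming elm-test's --seed and --fuzz flags; the seed value below is just a placeholder):

```
elm-test --seed 12345 --fuzz 100
```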
I think this would be my ideal situation: Each failing test gets a separate reproduce line that reruns just that one test with the same seed. That line should remain valid for as long as the test it is meant to reproduce doesn't change.