Janiczek opened 7 years ago
Another run option possibility, I think also available in Erlang QuickCheck: run for x seconds
This is interesting!
To set expectations, I don't think we should work on this until after the 4.0.0
release. That release will be breaking and is already hefty in scope, so I'd rather not consider any more large features for it.
As an aside, this seems like it's more scoped to test runners, so I might open an issue on https://github.com/rtfeldman/node-test-runner and/or other runners instead of here.
I think it's more of a runner issue. Being able to say elm test --run-time=10s and have each fuzzer run for 10s makes more sense than having to change source code when you want to run tests for longer.
I say it's mostly a test runner issue, but it probably requires some changes here as well. I'd prefer a timeout instead of infinity. I don't think developers should have to change their source code for this. You might want different settings at different places for the same code, like running tests for 5 minutes before merging into master or before releasing, but only a few seconds locally.
There are some considerations when writing a fuzzer that runs 100 seconds rather than 100 times though. First one is determinism; a timeout-based implementation will not be deterministic. This might not seem like a problem at first, but debugging failed fuzz test runs in a non-deterministic environment is a real pain. If the failed test case outputs the seed used to generate the failing example so that we can run that test with the same input again (as long as you don't change fuzzers), this wouldn't be a problem anymore.
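To make the determinism point concrete, here is a minimal sketch (in Python, not elm-test's actual implementation) of seed-based reproducibility: every generated input derives from one recorded seed, so re-running with the reported seed regenerates the exact failing input.

```python
import random

def fuzz_run(test, seed=None):
    """Run `test` against 100 random ints; report the seed on failure.

    Illustrative only: the point is that because all inputs derive from
    one seed, passing that seed back in replays the same failure.
    """
    seed = seed if seed is not None else random.randrange(2**32)
    rng = random.Random(seed)
    for _ in range(100):
        value = rng.randint(-1000, 1000)
        if not test(value):
            return "failed on {}; re-run with seed {} to reproduce".format(value, seed)
    return "passed"

# Each fresh run picks a new seed (non-deterministic across runs),
# but re-running with a reported seed repeats the exact failure.
msg = fuzz_run(lambda x: x >= 0)
```

The same idea works for a timeout-based runner: the wall-clock budget decides when to *stop*, but the seed alone decides *which* inputs were generated up to that point.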
Second one is whether you want a global timeout or a per-test timeout. Global timeout means less and less time spent testing new code, and per-test timeout means ever increasing delay waiting for tests to run, or having to reconfigure Travis every now and then, assuming iteration speed is important to you.
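The trade-off between the two timeout styles can be stated in two lines (a back-of-the-envelope sketch, with made-up numbers):

```python
# Global timeout: each test's share shrinks as the suite grows.
def per_test_budget(global_timeout_s, n_tests):
    return global_timeout_s / n_tests

# Per-test timeout: total wall time grows with the suite instead.
def total_wall_time(per_test_timeout_s, n_tests):
    return per_test_timeout_s * n_tests
```

So a 5-minute global budget over 100 tests leaves 3 seconds per test, while a fixed 5 seconds per test costs over 8 minutes of wall time for the same suite.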
The non-determinism can be solved by printing a "run this command to reproduce this failure", which could be parsed and replaced by each test runner into their syntax. It could just output a json object and require that runners parse and replace it in the output.
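A sketch of what that could look like: elm-test emits a JSON record, and each runner parses it and renders the reproduction command in its own syntax. The field names below are hypothetical, not an actual elm-test format; only the --seed flag itself is real.

```python
import json

# Hypothetical failure record a test run could emit
# (field names are illustrative, not an actual elm-test format).
failure = {"event": "fuzzFail", "seed": 123456, "test": "List.reverse round-trips"}

def to_repro_command(record_json):
    """A runner parses the JSON record and renders its own CLI syntax."""
    record = json.loads(record_json)
    return "elm-test --seed {}".format(record["seed"])

print(to_repro_command(json.dumps(failure)))  # prints: elm-test --seed 123456
```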
If we want to go all in, allowing the test runners to tell elm-test how many iterations or how much time should be spent in each test allows test runners to store a list of known bad inputs for automated regression testing, and it allows the test runner to focus on newer and larger tests. It even allows for distributed testing, so you can run some tests (maybe each test file) on a different machine. Hypothesis for Python has a database of seeds for all inputs that failed a test previously, and (some of) these are run before the fuzzing starts. I imagine it's a pain to implement even the elm-test part, though.
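The Hypothesis-style database idea boils down to a two-phase loop, sketched below (illustrative, not elm-test's or Hypothesis's actual API): replay previously failing inputs first, then spend the remaining budget on fresh random inputs, remembering any new failure for next time.

```python
import random

def run_with_database(test, database, fresh_runs=100, seed=0):
    # Regression phase: replay known-bad inputs before fuzzing.
    for value in database:
        if not test(value):
            return ("regression", value)
    # Fuzzing phase: explore new inputs with the remaining budget.
    rng = random.Random(seed)
    for _ in range(fresh_runs):
        value = rng.randint(-10**6, 10**6)
        if not test(value):
            database.append(value)  # remember it for the next run
            return ("new failure", value)
    return ("passed", None)
```

A runner that owns this loop could also shard the fresh-runs phase across machines, since each shard only needs its own seed.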
> First one is determinism; a timeout-based implementation will not be deterministic. This might not seem like a problem at first, but debugging failed fuzz test runs in a non-deterministic environment is a real pain. If the failed test case outputs the seed used to generate the failing example so that we can run that test with the same input again (as long as you don't change fuzzers), this wouldn't be a problem anymore.
The current Node runner already gives you what you need for this. At the start of each run it prints something like "run elm-test --seed 123456 to reproduce these results" (which are deterministic: all individual test seeds are derived from that initial randomly generated one, so if you pass it back in, you'll get the same test results). Plus, if a fuzz test fails, it prints the exact fuzz inputs that caused the failure, so you could copy/paste those into a unit test to reproduce as well.
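The "derive everything from one seed" scheme can be sketched like this (an assumed scheme for illustration, not node-test-runner's actual derivation):

```python
import random

def per_test_seeds(initial_seed, test_names):
    """Derive one seed per test from the single initial seed."""
    rng = random.Random(initial_seed)
    return {name: rng.randrange(2**32) for name in test_names}

# Same initial seed in -> identical per-test seeds out, which is why
# re-running with the printed seed reproduces every individual test.
seeds_a = per_test_seeds(123456, ["reverse", "sort", "merge"])
seeds_b = per_test_seeds(123456, ["reverse", "sort", "merge"])
assert seeds_a == seeds_b
```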
As to these design questions, I think the answer to all of them is "revisit once someone has a specific motivating use case."
Until then this is interesting, but not something I think we should work on implementing. 🙂
The other thing is that if we want to have huge test runs, we might want a better way to sample fuzz values, so that we can verify we're actually testing more values and not just testing the same values over and over again. (I have seen this with JSCheck.)
Haskell QuickCheck shows the first few most-used fuzz values plus their percentage. Might be good for a sanity check, yeah.
@zkessin yes, Python Hypothesis spends about 30% of its time testing the same ~30 floats over and over again, and elm-test spends 7% of its time testing the float 0. This only makes sense if you're not sure you'll cover all evil inputs in a single test run, such as when using 2 or more float fuzzers; for very long test runs, this becomes very inefficient.
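The sanity check being discussed here is easy to sketch: measure what fraction of generated values are repeats. The generator below is a simple random int as a stand-in for a real fuzzer.

```python
import random
from collections import Counter

def repeat_fraction(draws):
    """Fraction of draws that duplicate an earlier draw."""
    counts = Counter(draws)
    repeats = sum(n - 1 for n in counts.values())
    return repeats / len(draws)

rng = random.Random(1)
narrow = [rng.randint(0, 5) for _ in range(1000)]    # tiny domain: mostly repeats
wide = [rng.randint(0, 10**9) for _ in range(1000)]  # huge domain: few repeats
```

For a short run, repeated small values are cheap insurance against missing an edge case; for an overnight run, a repeat fraction near 1 means most of the budget is wasted.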
Erlang QuickCheck has functions that let you take a fuzzer and show sample output, as well as a function that takes a fuzzer and a value and shrinks it. Not something I use very often, but from time to time they are very useful.
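Those two helpers might look something like this (names and behavior are illustrative, not Erlang QuickCheck's actual API; the shrinker here just halves an integer toward zero while it still fails):

```python
import random

def sample(generator, n=5, seed=0):
    """Show a few values a generator would produce."""
    rng = random.Random(seed)
    return [generator(rng) for _ in range(n)]

def shrink(test, value):
    """Halve a failing int toward 0 while the halved value still fails."""
    while value != 0 and not test(value // 2):
        value //= 2
    return value

ints = lambda rng: rng.randint(-100, 100)
```

For example, shrinking 800 against the property "x < 50" walks 800 -> 400 -> 200 -> 100 -> 50 and stops, since 25 satisfies the property.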
JSCheck on JavaScript, if you ask for an array, will basically top out at length 6 or so.
infinite number of runs

I wonder whether this is a good idea. :)

My vision is that a user would do something like:

and the test runner would fuzz and fuzz until killed (Ctrl+C or something similar), possibly being left to run overnight.

quit on first failure
An idea from the other end of the spectrum is: stop immediately after finding and shrinking a failure. I believe this is what Erlang QuickCheck does? Would this be a good idea?