czan / stateful-check

Stateful generative testing in clojure
MIT License
117 stars 11 forks source link

Print out failing seed? #8

Closed bhb closed 5 years ago

bhb commented 6 years ago

Thanks a lot for your work on this library, it's very useful.

One thing I noticed is that when a test fails, the seed doesn't appear to be printed, so if I want to reproduce the test run (after, say, applying a fix for the bug), my only choice is to run a large number of tests and hope the same sequence is generated.

Is there a way to do this?

czan commented 6 years ago

At the moment, no, there isn't. It wouldn't be hard to add, and I have even had it do so in the past, but I am hesitant about adding it back in for two reasons:

  1. For testing parallel test cases, the tests themselves are non-deterministic, so running with the same seed won't necessarily yield the same outcome anyway. I think it would be bad to give a false assurance that "the same test" was run.

  2. When you find a failing test case, you should codify it as a normal unit test, rather than continuing to run it as a generative test. That way you have a regression test for that failure, and you don't waste CPU time generating and testing all of the intermediate cases on the way to the failure. This also safeguards against changes to your model changing that specific test case. It should be fairly mechanical to write the unit test, because the test output should provide all the information you need.

I'm still open to printing the seed, though. Do you think it would be worthwhile, given the above?

bhb commented 6 years ago

@czan Those are all excellent points. Now I'm not so sure about the value of printing the seed. Although I largely agree with what you've said above, I do have a few thoughts.

Overall, I agree using a seed has limitations, but it seems somewhat confusing to me that specification-correct? takes this param at all. I'm not sure what the use case is for this param. Can you speak more about this?

When you find a failing test case, you should codify it as a normal unit test, rather than continuing to run it as a generative test.

In my experience, this depends on the context in which the bug is found. Certainly I agree with you if failures are rare e.g. if I have run the test for some time to find the failure.

However, in the early stages of writing a generative test, failures happen quite rapidly in my experience. In this case, it may be handy to just quickly be able to reproduce a failure while I'm working on a fix. I am not worried about keeping this test around forever because I expect my model will quickly find any regressions.

If the model will quickly run into a bug, why use a seed at all? A few reasons:

  1. The specifics of the repro may change slightly between runs, and it's time consuming to ensure I'm looking at the same bug
  2. In the early stages of building out a test, I may actually have multiple bugs, so when I fix bug A, my next run may find bug B, so I'm not sure if bug A has really been fixed or not.

Re: parallel tests, what do you think about documenting the limitations of using a seed with parallel tests? I may have missed it, but I didn't see any notes in the docstring or the README. Or, perhaps even a warning if someone uses the seed with more than one thread? I understand a warning is a bit invasive, so I'm not sure about this, but it's something to consider.

bhb commented 6 years ago

One other thought: although it's usually easy to come up with an example based test if the bug is with the code under test, it's less simple to verify a bug in the model itself. In this case, using a seed can give rapid feedback on a model bug.

Anyway, something to ponder. I understand this is a tradeoff since there are certainly cases where printing the seed may lead devs down the wrong path.

czan commented 6 years ago

Overall, I agree using a seed has limitations, but it seems somewhat confusing to me that specification-correct? takes this param at all. I'm not sure what the use case is for this param. Can you speak more about this?

Honestly, it's only there because quick-check takes a seed as an argument, and I never thought to remove it. I've been thinking about this over the past few days, and I have some thoughts. Unfortunately, when I started typing them out I found that I couldn't even convince myself that they were the best solution.

I'm going to think about this a bit more, to see if I can think of a way to solve this that also nicely solves the use-cases you mentioned. I think there is potential for mistakes if I provide the seed, but I don't want to have to engineer a more complex solution for your use-cases just to avoid printing the seed. I'd much rather document it as a pitfall, and assume that developers are competent.

My two ideas were:

  1. To have a function to "re-run the last test case".
  2. To have an option to generate the test trace as Clojure code, to easily turn it into an example-based test. This would only work if :command keys are specified as vars, or something. I haven't worked through all the details.

It's probably better to just print the seed, though.

bhb commented 6 years ago

I'm going to think about this a bit more

I think this is prudent. I'm not entirely convinced myself, given the good points you raised above. There's no rush from my perspective: although having a way to reproduce failures (for sequential tests) would be handy, it's not blocking me.

czan commented 6 years ago

After thinking about this more, I've decided to just print the seed. My reasoning is:

  1. It's the simplest way to re-run a known test case.
  2. It's well understood, and matches how people already using test.check will expect things to work.
  3. Any other (reasonable) option for re-running a known failure easily wouldn't survive a restart of the JVM, which could end up being a significant limitation.

If you run a parallel test case it will print the seed, as well as a short note explaining that using the seed doesn't mean it will be deterministic. I have another thing I want to add before I do a release, but I've just pushed 0.4.1-SNAPSHOT if you want to use this immediately.

bhb commented 6 years ago

This sounds like a good solution. Thanks for taking the time to change this!

I don't have an immediate need for this (the project I'm using stateful-check on is paused for the time being, but I'll get back to it soonish), so the next release is fine.

czan commented 5 years ago

I finally got around to cutting a release with this in it today. Sorry that it's taken over nine months! 0.4.1 should be on Clojars now.