Closed: piccolbo closed this issue 6 years ago
I learned about `reproduce_failure`. Investigating.
I can reproduce it in pdb now. The test fails exactly as I expected, that is, x == y in the simplified version above. The questions remain: why now and not before? And why did I need to use `reproduce_failure` when the example database was available?
Solved with `assume`. The number of invalid examples remained high, though. Then I changed something unrelated elsewhere and it went back down to almost zero.
`attrs` for each test case and then doing a fairly cheap call to check validation.

Unrelated tip: it looks like you're decorating all your tests with the same settings - check out the profiles mechanism, or just set the attribute on the global `settings.default` object!
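For example, the profiles mechanism mentioned above could be used like this (a sketch; the profile name and the particular settings are placeholders, not anything from the autosig project):

```python
from hypothesis import settings

# Register a named profile once, e.g. in conftest.py, instead of
# repeating the same @settings(...) decorator on every test.
settings.register_profile("dev", max_examples=200, deadline=None)

# Activate it; from here on, settings.default reflects this profile.
settings.load_profile("dev")
```
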
With specific notes for the action items, I'm going to close this issue - but remain happy to hear and help with any updates!
Fantastic input @Zac-HD, thanks! The one thing I still don't understand is why I could get the failure with `make test`, then in ipython, in the same virtual env, I did a `%load` of the test file, ran the failing test, and it succeeded, seconds later. Shouldn't the bug witness be in the database at that point? What are the conditions that require using `reproduce_failure` on the same machine, in the same virtual env, with the example database available?
Two comments:
It should indeed happen automatically, but there are too many rare-but-possible subtle problems to offer a confident diagnosis. My personal guess would be that some non-determinism was introduced by e.g. hash randomization causing non-reproducible iteration order somewhere, but that's just a guess, based on it having bitten me before.
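To illustrate the kind of non-determinism meant here (a generic sketch, unrelated to the autosig code): with hash randomization enabled, string hashes, and therefore set/dict iteration order, can differ between interpreter runs, while pinning `PYTHONHASHSEED` makes them stable across runs:

```python
import subprocess
import sys

def str_hashes(seed):
    # Start a fresh interpreter with PYTHONHASHSEED pinned and report
    # the hashes that drive set/dict iteration order for strings.
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('a'), hash('b'), hash('c'))"],
        env={"PYTHONHASHSEED": seed},
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout

# With the seed pinned, two separate interpreter runs agree:
assert str_hashes("0") == str_hashes("0")
```
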
I am a bit puzzled by this irreproducible situation. This test passes on Travis (https://travis-ci.org/piccolbo/autosig) and passed locally yesterday, but now it doesn't anymore. It fails under the normal `make test`, but passes if I run the test interactively to try and debug it, as if the witness weren't in the example database. I rolled back to the commit that's on Travis, to no avail. Now the test is funny, and I think I can fix it, but I feel like that would be a missed learning opportunity. The test is somewhat complicated, but it looks a little like this
Since the range generated by the strategy is huge and I think like a statistician, this should pass on a meager 100 runs with probability 1. But as @DRMacIver himself explained to me in another issue, that's the wrong way to think about strategies: any assumption of independence or uniformity on the strategies is going to lead to pain. Indeed, it looks like the witness has x == y, even if what is printed is not the full value. I fully accept that, and I think a well-placed `assume` in there will solve the problem. Nonetheless, the test was passing yesterday on the same commit. I am running the test in a virtual env, the same one Travis uses. Besides the failures, I noticed today an exorbitant number of invalid examples, only on the failing test. The strategies are the same between the three tests. No `assume` or `unique` that I used explicitly. The complete file is here: https://github.com/piccolbo/autosig/blob/master/tests/test_.py. Any suggestion to get to the bottom of this would be appreciated. Hypothesis 3.69.12 (I downgraded to 3.30.0 to give it a try, but got the same results).
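The `assume` fix described above can be sketched like this (again with invented names, not the actual autosig code):

```python
from hypothesis import assume, given, strategies as st

@given(x=st.integers(), y=st.integers())
def test_sigs_differ(x, y):
    # Discard the rare x == y draws instead of failing on them;
    # Hypothesis counts discarded draws as invalid examples.
    assume(x != y)
    assert ("sig", x) != ("sig", y)
```
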