Open janosg opened 7 months ago
The test case `test_shapley_batch_size[1-PermutationSampler-beta_coefficient_w-5-test_game0]` sometimes fails due to a precision problem. The failure seems to occur randomly (on the same machine) with a relatively low probability.
Yes, the tests for MC methods are flaky. We have at best (ε, δ) bounds on sample complexity, assuming deterministic utilities, so tests will fail from time to time.
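For concreteness, here is the standard form such a bound takes (a textbook Hoeffding bound, assuming utilities bounded in an interval of length $R$; this is an illustration, not pyDVL's actual analysis). The MC average $\hat{v}_n$ of $n$ i.i.d. utility samples satisfies

$$
\Pr\left(|\hat{v}_n - v| \geq \varepsilon\right) \leq 2\exp\left(-\frac{2n\varepsilon^2}{R^2}\right),
$$

so pushing the failure probability below $\delta$ requires $n \geq \frac{R^2}{2\varepsilon^2}\log\frac{2}{\delta}$ samples, and no finite $n$ brings it to zero.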
We have a fixture that allows a fraction of tests to fail, but we are not really using it. One reason is that the fixture implements "ensure that results are within ε precision, with probability 1−δ", so you need to run the test a bunch of times, which is obviously costly. Another is that even that is a probabilistic statement, so we would only reduce the number of failures, never eliminate them.
I guess a more pragmatic approach would be to allow one retry, ignoring the sample complexity bounds altogether. This would catch 99% of the failures and be good enough for regression tests.
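For illustration, a minimal sketch of the one-retry idea using the `pytest-rerunfailures` plugin (assuming that plugin is acceptable as a test dependency; the test body is a toy stand-in, not an actual pyDVL test):

```python
# Sketch: retry a flaky MC test once before reporting a failure.
# Requires the pytest-rerunfailures plugin (an assumption, not
# necessarily a current pyDVL dependency).
import random

import pytest


@pytest.mark.flaky(reruns=1)  # one retry, ignoring sample complexity bounds
def test_mc_estimate_within_eps():
    # Toy stand-in for an MC value estimate: the mean of noisy samples.
    n = 100
    estimate = sum(random.gauss(1.0, 0.5) for _ in range(n)) / n
    assert estimate == pytest.approx(1.0, abs=0.15)  # eps precision
```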
This is really important if you ask me. This test is not the only one that fails randomly; there are a few others as well. It is really annoying having to rerun failing tests, so we need to fix this. I agree with @mdbenito that we should add a retry fixture to all tests that involve some kind of randomness.
Note that we have a `tolerate` fixture already that can be used for this right away. (Related: https://github.com/aai-institute/pyDVL/issues/539)
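For anyone who hasn't seen it, here is a minimal sketch of how a tolerance fixture along these lines can work (names and API are hypothetical, not pyDVL's actual `tolerate` implementation):

```python
# Hypothetical sketch of a failure-tolerance fixture: run a stochastic
# check several times and only fail if too many repetitions miss.
import random

import pytest


@pytest.fixture
def tolerate():
    def _tolerate(checks, max_failures: int) -> None:
        # checks: iterable of booleans, one per repetition of the test body.
        failures = sum(not ok for ok in checks)
        if failures > max_failures:
            pytest.fail(f"{failures} failures, at most {max_failures} allowed")

    return _tolerate


def test_mc_estimate_mostly_within_eps(tolerate):
    def one_run() -> bool:
        n = 100
        estimate = sum(random.gauss(0.0, 0.5) for _ in range(n)) / n
        return abs(estimate) < 0.15  # eps precision for a single run

    # Each run is within eps with probability 1 - delta; allow a few misses.
    tolerate((one_run() for _ in range(20)), max_failures=2)
```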