acerbilab / pyvbmc

PyVBMC: Variational Bayesian Monte Carlo algorithm for posterior and model inference in Python
https://acerbilab.github.io/pyvbmc/
BSD 3-Clause "New" or "Revised" License

`test_minimize_adam_matyas_with_noise` intermittently failing #135

Closed: matt-graham closed this issue 1 year ago

matt-graham commented 1 year ago

Raising issue as part of JOSS review openjournals/joss-reviews/issues/5428

When running the tests (platform linux, Python 3.11.0, pytest-7.3.1, pluggy-1.0.0), the test pyvbmc/testing/vbmc/test_minimize_adam.py::test_minimize_adam_matyas_with_noise intermittently fails with an AssertionError. For example, two failing cases were:

>       assert np.all(np.abs(x) < 1.0)
E       AssertionError: assert False
E        +  where False = <function all at 0x7fd42c38c860>(array([1.43098764, 1.41719139]) < 1.0)
E        +    where <function all at 0x7fd42c38c860> = np.all
E        +    and   array([1.43098764, 1.41719139]) = <ufunc 'absolute'>(array([1.43098764, 1.41719139]))
E        +      where <ufunc 'absolute'> = np.abs

pyvbmc/testing/vbmc/test_minimize_adam.py:70: AssertionError
>       assert np.all(np.abs(x) < 1.0)
E       AssertionError: assert False
E        +  where False = <function all at 0x7f3cda609bc0>(array([1.08179515, 1.15884769]) < 1.0)
E        +    where <function all at 0x7f3cda609bc0> = np.all
E        +    and   array([1.08179515, 1.15884769]) = <ufunc 'absolute'>(array([-1.08179515, -1.15884769]))
E        +      where <ufunc 'absolute'> = np.abs

pyvbmc/testing/vbmc/test_minimize_adam.py:70: AssertionError

but on other runs the test passes. The difference in the reported value of the x array between the runs in which failures occurred suggests the test is non-deterministic, and looking at the code it appears that random variates are being generated from the module-level NumPy random number generator rather than from a separately seeded Generator instance:

https://github.com/acerbilab/pyvbmc/blob/b7c6e381be75dd508ac9663825fd77ed8885816d/pyvbmc/testing/vbmc/test_minimize_adam.py#L54-L71
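
Schematically (this is an illustration of the pattern, not the exact test code), the gradient noise is currently drawn from NumPy's shared global state with something like np.random.normal(scale=3, size=(2,)), so every run sees different draws unless np.random.seed is called globally. Drawing from an explicitly seeded Generator instead, e.g.

import numpy as np

rng = np.random.default_rng(12345)      # explicitly seeded Generator
noise = rng.normal(scale=3, size=(2,))  # identical draws on every run for a fixed seed

makes the draws, and hence the test outcome, reproducible for a fixed seed.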

It looks like the threshold being tested here might be a bit tight, as the probability of the test failing seems to be non-negligible. More generally, I would recommend using seeded random number generators in all tests, both to avoid intermittent failures like this and to make it easier to reproduce and debug failing tests. You could, for example, create a shared rng fixture in a conftest.py file and parametrize it with one or more seeds passed in as a command-line argument:

import pytest
import numpy

def pytest_addoption(parser):
    # Allow one or more seeds to be passed on the command line, e.g. --seed 0 1 2
    parser.addoption("--seed", type=int, nargs="*", default=[28375632879561])

def pytest_generate_tests(metafunc):
    # Parametrize any test that (directly or via a fixture) requests "seed"
    # over all of the seeds supplied on the command line
    if "seed" in metafunc.fixturenames:
        metafunc.parametrize("seed", metafunc.config.getoption("seed"))

@pytest.fixture(scope="function")
def rng(seed):
    # Fresh, explicitly seeded generator for each test
    return numpy.random.default_rng(seed)

The above test would then be changed to something like

def test_minimize_adam_matyas_with_noise(rng):
    def f(x_):
        val = 0.26 * (x_[0] ** 2 + x_[1] ** 2) - 0.48 * x_[0] * x_[1]
        grad_1 = 0.52 * x_[0] - 0.48 * x_[1]
        grad_2 = 0.52 * x_[1] - 0.48 * x_[0]
        grad = np.array([grad_1, grad_2]) + rng.normal(scale=3, size=(2,))
        return val, grad
    ...

This would mean the default behaviour is to run with a single fixed seed, while still allowing all tests that use the rng fixture to be run across multiple seeds (to check robustness to different random draws) by passing values to the --seed option of the pytest command.
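
For example, assuming the conftest.py sketch above (the specific seed values here are arbitrary), running

pytest --seed 0 20 400

would run every test that requests the rng fixture once per seed, whereas a plain pytest invocation would use only the single default seed.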

Bobby-Huggins commented 1 year ago

Hi @matt-graham, thanks for opening the issue!

A lot of our tests are stochastic, but this one seems to have a particularly narrow tolerance. For this reason, we include pytest-rerunfailures in the dependencies and run, e.g., our automated workflow tests with pytest --reruns=5 to retry any failed test up to 5 times (3 should already be plenty; 5 is extra-safe). We mention this on the developer instructions page of the docs, but since you bring it up we should probably add a note about it directed at end-users as well, and maybe even have pytest default to this behavior.
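
Making reruns the default could be as simple as adding --reruns to the addopts line of our pytest configuration; alternatively, pytest-rerunfailures also provides a flaky marker, so we could target just the known-flaky tests. A sketch of the marker approach (the test body is elided and this is illustrative, not a committed change):

import pytest

@pytest.mark.flaky(reruns=3)  # retry this test up to 3 times before reporting a failure
def test_minimize_adam_matyas_with_noise():
    ...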

I do think we should look into the solution you propose, if for no other reason than to conform to best practices. I'm not certain at the moment how quickly we could migrate all the tests that have a stochastic component: there are some internal (non-test) changes we'd need to make too in order to have the exact behavior fixed by the pytest seed. In the meantime, would you mind trying with (say) --reruns=3, just to confirm that it works as expected?

matt-graham commented 1 year ago

Thanks for the explanation @Bobby-Huggins, I had indeed missed the note about this in the developer documentation.

All tests do pass when rerunning with --reruns=3. If you are able to make the test behaviour deterministic, though, I would say it is worth doing, as otherwise it can be difficult to be sure whether an intermittently failing test is failing because of non-determinism from an unseeded random number generator or for some other reason.

matt-graham commented 1 year ago

Closing this, as the specific issue here is resolved by using the --reruns option as already documented, but I have created a new issue #138 to track the suggestion of generally avoiding use of top-level numpy.random functions in favour of explicitly passing a random number generator instance into functions.
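
For reference, the pattern suggested there is to have functions that need randomness accept a Generator explicitly rather than calling numpy.random directly; a generic sketch of the idea (not PyVBMC's actual API):

import numpy as np

def noisy_objective(x, rng):
    # All randomness comes from the Generator supplied by the caller,
    # so results are reproducible whenever the caller seeds it.
    return float(np.sum(x ** 2)) + rng.normal(scale=0.1)

value = noisy_objective(np.array([1.0, 2.0]), rng=np.random.default_rng(0))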