jrapin commented 3 years ago

#965 fixes tests that started being flaky recently (December 2020) without any related change.

This is probably caused by a dependency update with a wide impact on random states, but I am not sure.

On 17 December 2020, running this on master was flaky:

pytest nevergrad/benchmark/test_xpbase.py::test_noisy_artificial_function_loss --count=100 --exitfirst

However, this test had been stable before (never a single failure for months or years). Still, using branches dating back to June 2020, I could see the same error. This is why I expect a dependency update to be at play.

From a recent, seemingly stable build in December:

absl-py==0.11.0
alabaster==0.7.12
appdirs==1.4.4
astroid==2.4.2
astunparse==1.6.3
attrs==20.3.0
autodocsumm==0.2.2
Babel==2.9.0
backcall==0.2.0
bayesian-optimization==1.2.0
beautifulsoup4==4.9.3
black==20.8b1
bleach==3.2.1
Brotli==1.0.9
cachetools==4.2.0
certifi==2020.12.5
cffi==1.14.4
chardet==3.0.4
click==7.1.2
cloudpickle==1.6.0
cma==3.0.3
colorama==0.4.4
commonmark==0.9.1
coverage==5.3
cryptography==3.3.1
cycler==0.10.0
decorator==4.4.2
docutils==0.16
fcmaes==1.2.2
Flask==1.1.2
Flask-Compress==1.8.0
future==0.18.2
gast==0.3.3
google-auth==1.23.0
google-auth-oauthlib==0.4.2
google-pasta==0.2.0
grpcio==1.34.0
gym==0.17.3
h5py==2.10.0
hiplot==0.1.21
hyperopt==0.2.5
idna==2.10
imageio==2.9.0
imagesize==1.2.0
importlib-metadata==3.1.1
iniconfig==1.1.1
IOHexperimenter==0.2.8.7
ipython==7.19.0
ipython-genutils==0.2.0
isort==5.6.4
itsdangerous==1.1.0
jedi==0.17.2
jeepney==0.6.0
Jinja2==2.11.2
joblib==0.17.0
Keras-Preprocessing==1.1.2
keyring==21.5.0
kiwisolver==1.3.1
koncept==0.2.2
kuti==0.9.6
lazy-object-proxy==1.4.3
Markdown==3.3.3
MarkupSafe==1.1.1
matplotlib==3.3.3
mccabe==0.6.1
mixsimulator==0.2.9.5
munch==2.5.0
mypy==0.790
mypy-extensions==0.4.3
networkx==2.5
-e git+git@github.com:facebookresearch/nevergrad.git@8946daf63207b24792da8a775ab9c58e42a05445#egg=nevergrad
nose==1.3.7
numpy==1.18.5
numpy-stubs @ git+https://github.com/numpy/numpy-stubs@2c8d8f26adb1eef54571ccc85ef8ad4f2fb90c4c
oauthlib==3.1.0
opencv-python==4.4.0.46
opt-einsum==3.3.0
packaging==20.7
pandas==1.1.5
parso==0.7.1
pathspec==0.8.1
pexpect==4.8.0
pickleshare==0.7.5
Pillow==7.2.0
pkginfo==1.6.1
pluggy==0.13.1
ply==3.11
prompt-toolkit==3.0.8
protobuf==3.14.0
ptyprocess==0.6.0
py==1.9.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
pyglet==1.5.0
Pygments==2.7.3
pylint==2.6.0
Pyomo==5.7.1
pyparsing==2.4.7
pyproj==3.0.0.post1
pytest==6.1.2
pytest-cov==2.10.1
python-dateutil==2.8.1
pytz==2020.4
PyUtilib==6.0.0
PyWavelets==1.1.1
readme-renderer==28.0
recommonmark==0.6.0
regex==2020.11.13
requests==2.25.0
requests-oauthlib==1.3.0
requests-toolbelt==0.9.1
rfc3986==1.4.0
rsa==4.6
scikit-image==0.16.2
scikit-learn==0.22.2.post1
scipy==1.5.4
SecretStorage==3.3.0
six==1.15.0
snowballstemmer==2.0.0
soupsieve==2.1
Sphinx==3.3.1
sphinx-rtd-theme==0.5.0
sphinxcontrib-applehelp==1.0.2
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==1.0.3
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.4
tensorboard==2.4.0
tensorboard-plugin-wit==1.7.0
tensorflow==2.3.1
tensorflow-estimator==2.3.0
termcolor==1.1.0
threadpoolctl==2.1.0
toml==0.10.2
torch==1.7.1
torchvision==0.8.2
tqdm==4.54.1
traitlets==5.0.5
twine==3.2.0
typed-ast==1.4.1
typing-extensions==3.7.4.3
urllib3==1.26.2
wcwidth==0.2.5
webencodings==0.5.1
Werkzeug==1.0.1
wrapt==1.12.1
xlrd==2.0.1
xlwt==1.3.0
zipp==3.4.0

On a PR where the issue appeared:

absl-py==0.11.0
alabaster==0.7.12
appdirs==1.4.4
astroid==2.4.2
astunparse==1.6.3
attrs==20.3.0
autodocsumm==0.2.2
Babel==2.9.0
backcall==0.2.0
bayesian-optimization==1.2.0
beautifulsoup4==4.9.3
black==20.8b1
bleach==3.2.1
Brotli==1.0.9
cachetools==4.2.0
certifi==2020.12.5
cffi==1.14.4
chardet==4.0.0
click==7.1.2
cloudpickle==1.6.0
cma==3.0.3
colorama==0.4.4
commonmark==0.9.1
coverage==5.3
cryptography==3.3.1
cycler==0.10.0
decorator==4.4.2
docutils==0.16
fcmaes==1.2.2
Flask==1.1.2
Flask-Compress==1.8.0
future==0.18.2
gast==0.3.3
google-auth==1.24.0
google-auth-oauthlib==0.4.2
google-pasta==0.2.0
grpcio==1.34.0
gym==0.17.3
h5py==2.10.0
hiplot==0.1.21
hyperopt==0.2.5
idna==2.10
image-quality==1.2.6
imageio==2.9.0
imagesize==1.2.0
importlib-metadata==3.3.0
iniconfig==1.1.1
IOHexperimenter==0.2.8.7
ipython==7.19.0
ipython-genutils==0.2.0
isort==5.6.4
itsdangerous==1.1.0
jedi==0.17.2
jeepney==0.6.0
Jinja2==2.11.2
joblib==0.17.0
Keras==2.2.4
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.2
keyring==21.5.0
kiwisolver==1.3.1
koncept==0.2.2
kuti==0.9.6
lazy-object-proxy==1.4.3
libsvm==3.23.0.4
lpips==0.1.3
Markdown==3.3.3
MarkupSafe==1.1.1
matplotlib==3.3.3
mccabe==0.6.1
mixsimulator==0.2.9.5
munch==2.5.0
mypy==0.790
mypy-extensions==0.4.3
networkx==2.5
-e git+git@github.com:facebookresearch/nevergrad.git@1a81b1445b3ed86b5cdc480f9e92532f4099ffef#egg=nevergrad
nose==1.3.7
numpy==1.18.5
numpy-stubs @ git+https://github.com/numpy/numpy-stubs@2c8d8f26adb1eef54571ccc85ef8ad4f2fb90c4c
oauthlib==3.1.0
opencv-python==4.1.2.30
opt-einsum==3.3.0
packaging==20.8
pandas==1.1.5
parso==0.7.1
pathspec==0.8.1
pexpect==4.8.0
pickleshare==0.7.5
Pillow==7.2.0
pkginfo==1.6.1
pluggy==0.13.1
ply==3.11
prompt-toolkit==3.0.8
protobuf==3.14.0
ptyprocess==0.6.0
py==1.10.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
pyglet==1.5.0
Pygments==2.7.3
pylint==2.6.0
Pyomo==5.7.1
pyparsing==2.4.7
pyproj==3.0.0.post1
pytest==6.2.1
pytest-cov==2.10.1
python-dateutil==2.8.1
pytz==2020.4
PyUtilib==6.0.0
PyWavelets==1.1.1
PyYAML==5.3.1
readme-renderer==28.0
recommonmark==0.6.0
regex==2020.11.13
requests==2.25.1
requests-oauthlib==1.3.0
requests-toolbelt==0.9.1
rfc3986==1.4.0
rsa==4.6
scikit-image==0.16.2
scikit-learn==0.22.2.post1
scipy==1.5.4
SecretStorage==3.3.0
six==1.15.0
snowballstemmer==2.0.0
soupsieve==2.1
Sphinx==3.3.1
sphinx-rtd-theme==0.5.0
sphinxcontrib-applehelp==1.0.2
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==1.0.3
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.4
tensorboard==2.4.0
tensorboard-plugin-wit==1.7.0
tensorflow==2.3.1
tensorflow-estimator==2.3.0
termcolor==1.1.0
threadpoolctl==2.1.0
toml==0.10.2
torch==1.7.1
torchvision==0.8.2
tqdm==4.54.1
traitlets==5.0.5
twine==3.2.0
typed-ast==1.4.1
typing-extensions==3.7.4.3
urllib3==1.26.2
wcwidth==0.2.5
webencodings==0.5.1
Werkzeug==1.0.1
wrapt==1.12.1
xlrd==2.0.1
xlwt==1.3.0
zipp==3.4.0
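
Comparing the two listings by eye is error-prone, so here is a minimal sketch of how they could be diffed. The helper names `parse_freeze` and `diff_freeze` are made up for illustration, and only a handful of lines from the listings above are pasted in; the full outputs would be used in practice.

```python
def parse_freeze(text):
    """Map package name -> version for lines of the form name==version."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, editable installs ("-e ...") and direct references ("name @ url").
        if "==" not in line or line.startswith("-e "):
            continue
        name, version = line.split("==", 1)
        versions[name.lower()] = version
    return versions


def diff_freeze(before, after):
    """Return (changed, added, removed) between two parsed listings."""
    common = before.keys() & after.keys()
    changed = {n: (before[n], after[n]) for n in common if before[n] != after[n]}
    added = {n: after[n] for n in after.keys() - before.keys()}
    removed = {n: before[n] for n in before.keys() - after.keys()}
    return changed, added, removed


# A few lines excerpted from the two listings above.
stable = """
chardet==3.0.4
numpy==1.18.5
opencv-python==4.4.0.46
pytest==6.1.2
"""
flaky = """
chardet==4.0.0
Keras==2.2.4
numpy==1.18.5
opencv-python==4.1.2.30
pytest==6.2.1
"""

changed, added, removed = diff_freeze(parse_freeze(stable), parse_freeze(flaky))
for name in sorted(changed):
    print(f"{name}: {changed[name][0]} -> {changed[name][1]}")
for name in sorted(added):
    print(f"{name}: added ({added[name]})")
for name in sorted(removed):
    print(f"{name}: removed ({removed[name]})")
```

A diff like this only narrows down the candidates; it does not say which of the changed packages, if any, affects the random state.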

pacowong commented 3 years ago

Do you mean some libraries use numpy.random.seed and reseed the random number generator?

Why not share a single random generator in Nevergrad, since numpy.random.seed is a legacy function anyway?
How about injecting inspect.getouterframes into the numpy.random.seed function to locate the problem?
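
A rough sketch of the tracing idea, under some assumptions: `trace_callers` and `suspicious_library_code` are hypothetical names, and the demo patches the standard library's `random.seed` so that it runs without extra dependencies. Passing the `numpy.random` module and `"seed"` instead should behave the same way for code that calls `np.random.seed(...)` through the module attribute, but that is untested here.

```python
import functools
import inspect
import random


def trace_callers(module, name, log):
    """Replace module.<name> with a wrapper that records who calls it."""
    original = getattr(module, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        # Index 0 is this wrapper's frame; index 1 is the frame that called it.
        caller = inspect.getouterframes(inspect.currentframe())[1]
        log.append((caller.filename, caller.lineno, caller.function, args))
        return original(*args, **kwargs)

    setattr(module, name, wrapper)
    return original  # returned so that the patch can be undone


def suspicious_library_code():
    # Stand-in for third-party code that reseeds the global generator.
    random.seed(12)


calls = []
original_seed = trace_callers(random, "seed", calls)
try:
    suspicious_library_code()
finally:
    random.seed = original_seed  # undo the patch

for filename, lineno, function, args in calls:
    print(f"seed{args} called from {function} ({filename}:{lineno})")
```

Code that imported the function directly (`from numpy.random import seed`) before the patch was applied would bypass the wrapper, so the patch would need to be installed as early as possible.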

jrapin commented 3 years ago

> Do you mean some libraries use numpy.random.seed and reseed the random number generator?

Possibly, but that should not have any impact.

> Why not share a single random generator in Nevergrad, since numpy.random.seed is a legacy function anyway?

All parametrizations have their own random_state so that we can control exactly what happens. This random_state can be used by functions to initialize with different seeds. Then again, some do not, and it could indeed be useful to have our own random_state (although I tend to hate global variables, but anyway np.random is already one...).

> How about injecting inspect.getouterframes into the numpy.random.seed function to locate the problem?

I am not familiar with it, I'm curious now. Just a thought, and it may not be directly linked, but this is kind of a "quantum" issue: when I tried to observe/print the random states, the error disappeared (mostly because the initialization is then done at a different time, though). Something unclear must happen between the moment they are now initialized (in the init) and the moment they were initialized before (at call).

pacowong commented 3 years ago

I have tracked the code leading to #965. Luckily, not many places use np.random.* functions.

The key question is: which classes should be allowed to create a random_state? For example:

Random State for Class: if a class needs to generate random numbers, such as in nevergrad/utils/Transform.py, should we assign it a random_state?
Random State for Function: similarly, the function _noisy_call in nevergrad/functions/functionlib.py looks problematic, as it calls np.random.normal for every evaluation.
Pass-Through: in the function ArtificialVariable.process in nevergrad/functions/functionlib.py, the random state is saved and then restored. My observation is that this is not allowed in general.

To solve these problems, we should set some rules on random_state management. I think that the random_state management for the Parameter class has been carefully written.
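
For reference, this is roughly what a save-and-restore of numpy's global state looks like in general. It is a generic sketch, not the actual code of ArtificialVariable.process, and the context manager name `temporary_seed` is made up for the example.

```python
import contextlib

import numpy as np


@contextlib.contextmanager
def temporary_seed(seed):
    """Seed numpy's global generator inside the block, then restore it."""
    saved = np.random.get_state()
    np.random.seed(seed)
    try:
        yield
    finally:
        np.random.set_state(saved)


np.random.seed(0)
expected = np.random.normal(size=3)  # what the global stream yields after seed 0

np.random.seed(0)
with temporary_seed(42):
    np.random.normal(size=10)  # draws here do not advance the outer stream
after = np.random.normal(size=3)

print(np.array_equal(expected, after))  # prints True
```

The pattern keeps the outer stream intact, but everything inside the block still depends on the global generator, so it remains exposed to any other code that touches that generator while the block runs.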

jrapin commented 3 years ago

Short answer: ArtificialVariable (and all of ArtificialFunction) predates the parametrization system by far. It's a huge hack for backward compatibility, but we should refactor it one day :s (which is hard given the whole range of options :D)

> The key question is which classes should be allowed to create a random_state

That's a hard one. I have no exact answer to this. I tend to prefer that functions use the parametrization's random state, so that experiments share a unique source of randomness. In many cases though, functions take a seed to make them "deterministic" of some sort, with their own random state. It's not required for getting reproducibility, but it does depend on how users want their testbeds to behave, so I'm not sure I should fight against it :s

In any case, I prefer that they do not tap into numpy's default random state, so that they remain somewhat isolated from the outside.
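
To illustrate the isolation argument, here is a small sketch with a private np.random.RandomState passed explicitly to a noisy function. `noisy_loss` and `run` are hypothetical stand-ins, not nevergrad functions, and the reseeding inside the loop only simulates an outside library touching the global state.

```python
import numpy as np


def noisy_loss(x, random_state, noise_level=0.1):
    """Hypothetical noisy objective: sphere function plus Gaussian noise."""
    noise = random_state.normal(0.0, noise_level)
    return float(np.sum(np.asarray(x) ** 2) + noise)


def run(seed, reseed_global):
    rng = np.random.RandomState(seed)  # private source of randomness
    values = []
    for _ in range(3):
        if reseed_global:
            np.random.seed(123)  # simulates outside code touching the global state
        values.append(noisy_loss([1.0, 2.0], rng))
    return values


# The private generator is unaffected by what happens to the global one.
print(run(seed=12, reseed_global=False) == run(seed=12, reseed_global=True))  # prints True
```

Had `noisy_loss` called np.random.normal directly, the reseeding would have made every evaluation return the same noise.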

facebookresearch / nevergrad

Investigate emerging reproducibility issue #966
