Closed groadabike closed 1 year ago
Needs investigation: tests should be seeded, so randomness should not be a factor. How big is the actual difference? The displayed scores have been rounded to -0.32999 and -0.33000, so there's no lower bound on how small this might be, e.g. -0.329994999999 vs -0.329995000000. It's odd that rerunning the test can fix things; is there a seed that should have been set somewhere but which was missed?
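The rounding boundary can be demonstrated with a quick sketch (the values below are illustrative, not taken from the failing run):

```python
# Two hypothetical scores that differ by only ~2e-7 but round to different
# 5-decimal values, because they straddle the -0.329995 half-way point.
a = round(-0.3299949, 5)  # just below the boundary
b = round(-0.3299951, 5)  # just past the boundary
print(a, b)  # -0.32999 -0.33
```

So an arbitrarily small numerical difference can flip the printed 5-decimal score.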
Yeah, the test is seeded and rounded to 5 decimals. In fact, if I run the test several times on my laptop I get the same correct result. However, for some reason I still can't find, the regression test failed with that error. For example, here you can see a test that failed because of that difference. https://github.com/claritychallenge/clarity/actions/runs/3330679131/jobs/5509476233
Perhaps we can use pytorch_lightning's seeding procedure seed_everything, which wraps torch.manual_seed.
I've been doing some work on the mbstoi.py and mbstoi_utils.py to make variable/function names PEP8 compliant (see branch ns-rse/98-nomenclature) and encountered this error re-running tests locally. Sometimes it passed, sometimes it didn't, seemingly randomly.
One of the challenges in investigating how big a difference is occurring is that the test here uses pytest-regtest, which writes output to a file and compares each run to that file. If it were a normal assert x == y that was failing due to precision, it would be possible to use the pytest.approx() function, i.e. assert x == pytest.approx(y), with a specified absolute or relative tolerance threshold.
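For reference, the pytest.approx() pattern with an absolute tolerance would look something like this (the values are hypothetical, chosen near the 5-decimal rounding boundary seen in the logs):

```python
import pytest

# Hypothetical expected/computed scores near the rounding boundary
expected = -0.33000
computed = -0.32999499999

# abs=1e-5 tolerates differences below the last rounded digit
assert computed == pytest.approx(expected, abs=1e-5)
```

A relative tolerance (rel=) is also available, but for scores near zero an absolute one is usually the safer choice.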
I'll have a look into this though and see if I can work out what's going on and how to resolve it.
There are a few other errors in the tests now as well, and the linting also failed because an old version of the mirrors-mypy virtual environment that pre-commit installed failed to build the package typed-ast. That took some head-scratching to work out, but I've updated the .pre-commit-config.yaml in the #88 PR I have out, and I'm now working through resolving each of the failing tests. In addition to test_torch_msbg_stoi, the other failures (see the full log) are...
tests/regression/test_data_HOA_tools_cec2.py::test_compute_band_rotation fails
This is strange and I was already investigating it; #88 introduces some parameterised unit tests for the same module (albeit under tests/data/test_HOA_tools_cec2.py). If I run just tests/regression/test_data_HOA_tools_cec2.py it passes all tests, but run as part of the whole suite it fails. I tried putting a fixed seed in the failing test, but to no avail.
test_full_CEC_pipeline.py fails
Another strange error, as it's clarity/evaluator/haspi/eb.py that fails with an IndexError...
E IndexError: index 0 is out of bounds for axis 0 with size 0
clarity/evaluator/haspi/eb.py:426: IndexError
These tests pass on the main branch locally, but I get two other failures reported from torch because for some reason it's trying to use two different devices, cpu and cuda:0, which I've not encountered before. :thinking:
This is kind of confusing. I think none of the code within clarity/evaluator should be related to torch.
Regarding the torch-related errors, it could be due to the torch version. What torch version are you using? Presumably the version in the requirements should work.
> I've been doing some work on the mbstoi.py and mbstoi_utils.py to make variable/function names PEP8 compliant (see branch ns-rse/98-nomenclature) and encountered this error re-running tests locally. Sometimes it passed, sometimes it didn't, seemingly randomly.
As far as I can remember, there should be no randomness in mbstoi, I will take a look.
Thanks @TuZehai :+1:
I have fixed the cuda issue in 'test_enhancers.py' and 'test_mc_conv_tasnet.py'.
However, I could not find the errors in tests/regression/test_predictors.py::test_torch_msbg_stoi. The test passes every time on my local machine....
Thanks for looking at this @TuZehai. I've taken a different approach to the failing regression test, currently out for review in #103.
I've been doing some investigations on this and had a useful chat with @bobturneruk.
As @groadabike notes, torch.manual_seed() may be useful here, but looking at the tests, @TuZehai included a manual seed when these regression tests were first written. There are, however, a number of different places in the code base where torch is being used, and I'm wondering if this is what is causing problems, because reproducibility is not guaranteed with PyTorch.
In a number of places the device being used is set on the fly based on whether a GPU is available...
❱ grep "cuda\(\)" -R -n
enhancer/dsp/filter.py:12: self.device = "cuda" if torch.cuda.is_available() else "cpu"
predictor/torch_msbg.py:174: self.device = "cuda" if torch.cuda.is_available() else "cpu"
predictor/torch_msbg.py:614: self.device = "cuda" if torch.cuda.is_available() else "cpu"
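One possible way to avoid both the on-the-fly choice and the cpu vs cuda:0 mismatch would be to make the device an explicit, overridable parameter. A sketch (resolve_device is a hypothetical helper, not existing code in the repository):

```python
# Hypothetical helper: make the device an explicit, overridable argument
# instead of choosing it inside each class at construction time.
def resolve_device(preferred=None):
    """Return the requested device, else pick cuda/cpu by availability."""
    if preferred is not None:
        return preferred
    try:
        import torch  # treated as an optional dependency in this sketch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"

# Tests could then pin everything to a single device explicitly:
device = resolve_device("cpu")
print(device)  # cpu
```

Classes like those in enhancer/dsp/filter.py and predictor/torch_msbg.py could then accept a device argument that defaults to this fallback behaviour.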
And torch is imported and used across more modules/classes than just these...
❱ grep "import torch" -R -l
dataset/cec1_dataset.py
engine/losses.py
enhancer/dnn/mc_conv_tasnet.py
enhancer/dsp/filter.py
predictor/torch_msbg.py
predictor/torch_stoi.py
To which end, I'm wondering if there is a need to introduce a global utility that sets the seed within each of these modules to help solve this problem (beyond the crude use of pytest.approx() as I've suggested in #103). An example of this can be found in pykale, where there is a utils/seed.py that is used prior to any torch instantiation.
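Such a utility, modelled loosely on pykale's utils/seed.py, might look like the sketch below (set_seed is a hypothetical name; the torch branch is guarded so the sketch also runs where torch is not installed):

```python
import os
import random

import numpy as np


def set_seed(seed: int = 42) -> None:
    """Seed the stdlib, NumPy and (if present) torch RNGs in one place."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch

        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # Determinism also needs the cuDNN autotuner switched off
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass
```

Each test (or module) would then call set_seed() once before any tensors are created, rather than relying on seeds scattered across individual tests.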
Is this something others feel would be useful?
Hi @ns-rse, I like the idea of having a function we can call to ensure that everything has been properly seeded. Another option is to use the pytorch_lightning's seed_everything function. We are already using pytorch_lightning in the engine/system.py.
Hi @TuZehai, I noticed that we are using pytorch_lightning in different parts of Clarity and in recipes, but it is not included in the requirements. Is that by design?
@jonbarker68, @TuZehai, @ns-rse It seems that even though you can seed numpy, you can't ensure exactly the same results with numpy.linalg, as different installations can use different accelerated linear algebra libraries. https://numpy.org/install/#numpy-packages--accelerated-linear-algebra-libraries
You can run the following to check which library you are using:
import numpy as np
np.show_config()
I wonder if this is the reason for the different results in the tests. Is there any way we can force printing this info in the GitHub runs to see if they always use the same library? Or force some version in the numpy installation?
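One way to surface this in a CI log would be a small Python step that captures np.show_config()'s output (this assumes the version of np.show_config that prints to stdout):

```python
import contextlib
import io

import numpy as np

# Capture the build configuration so a CI step can echo it into the log
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    np.show_config()
config = buf.getvalue()
print(config)
```

Running this as an extra step in the GitHub Actions workflow would record which BLAS/LAPACK backend each runner linked against, making backend differences visible across runs.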
The regression test fails when running tests/regression/test_predictors.py::test_torch_msbg_stoi, with a very small difference that may result from randomness. This results in PRs failing some tests. The problem can be solved by rerunning the test manually once or twice.