claritychallenge / clarity

Clarity Challenge toolkit - software for building Clarity Challenge systems
https://claritychallenge.github.io/clarity
MIT License

[BUG] test_torch_msbg_stoi fails due to difference of 0.001 #97

Closed groadabike closed 1 year ago

groadabike commented 1 year ago

The regression test fails when running tests/regression/test_predictors.py::test_torch_msbg_stoi, with a very small difference that may result from randomness:

=================================== FAILURES ===================================
_____________________________ test_torch_msbg_stoi _____________________________

regression test output differences for tests/regression/test_predictors.py::test_torch_msbg_stoi:

>   --- current
>   +++ tobe
>   @@ -1,2 +1,2 @@
>   -Torch MSBG STOILoss -0.46198, ESTOILoss -0.32999
>   +Torch MSBG STOILoss -0.46198, ESTOILoss -0.33000

This results in PRs failing some tests. The problem can be solved by rerunning the test manually once or twice.

jonbarker68 commented 1 year ago

Needs investigation: tests should be seeded, so randomness should not be a factor. How big is the actual difference? The scores displayed have been rounded to -0.32999 and -0.33000, so there's no lower bound on how small this might be, e.g. -0.329994999999 vs -0.329995000000. It's odd that rerunning the test can fix things - is there a seed that should have been set somewhere but which was missed?
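
A minimal illustration of that rounding effect, assuming the scores are formatted to five decimal places (the values below are made up to bracket the rounding boundary, not taken from the test):

# Two values separated by only ~2e-9 still produce the two different strings
# seen in the diff when formatted to 5 decimal places.
a = -0.329994999
b = -0.329995001
print(f"ESTOILoss {a:.5f}")  # ESTOILoss -0.32999
print(f"ESTOILoss {b:.5f}")  # ESTOILoss -0.33000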

groadabike commented 1 year ago

Yeah, the test is seeded and rounded to 5 decimals. In fact, if I run the test several times on my laptop I get the same correct result. However, for some reason I still can't find, the regression test failed with that error. For example, here you can see a test that failed because of that difference: https://github.com/claritychallenge/clarity/actions/runs/3330679131/jobs/5509476233

groadabike commented 1 year ago

Perhaps we can use the pytorch_lightning seeding procedure seed_everything, which also covers torch.manual_seed.
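
A minimal sketch of what that could look like at the top of the test (the seed value is purely illustrative):

# Hypothetical sketch: seed Python's random, numpy and torch in one call
# via pytorch_lightning, before any tensors are created.
from pytorch_lightning import seed_everything

seed_everything(42)  # illustrative seed; calls torch.manual_seed under the hood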

ns-rse commented 1 year ago

I've been doing some work on the mbstoi.py and mbstoi_utils.py to make variable/function names PEP8 compliant (see branch ns-rse/98-nomenclature) and encountered this error re-running tests locally. Sometimes it passed, sometimes it didn't, seemingly randomly.

One of the challenges in investigating how big a difference is occurring is that the test here uses pytest-regtest, which writes output to a file and compares each run against that file. If it were a normal assert x == y that was failing due to precision, it would be possible to use the pytest.approx() function, i.e. assert x == pytest.approx(y), with a specified absolute or relative tolerance threshold.
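
For reference, a minimal sketch of what that would look like with a plain assertion (the values and tolerance are illustrative, not taken from the test):

import pytest

# Hypothetical sketch: compare the computed loss against the expected value
# with an absolute tolerance instead of an exact match.
expected_estoi = -0.33000
computed_estoi = -0.329995
assert computed_estoi == pytest.approx(expected_estoi, abs=1e-5)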

I'll have a look into this though and see if I can work out what's going on and how to resolve it.

ns-rse commented 1 year ago

There are a few other errors in the tests now as well, and the linting also failed because an old version of the mirrors-mypy virtual environment that pre-commit installed failed to build the package typed-ast. That took some head-scratching to work out, but I've updated the .pre-commit-config.yaml in the #88 PR I have out, and I'm now working through resolving each of the failing tests. In addition to test_torch_msbg_stoi, the other failures (see the full log) are...

tests/regression/test_data_HOA_tools_cec2.py::test_compute_band_rotation fails

This is strange and I was already investigating it; #88 introduces some parameterised unit tests for the same module (albeit under tests/data/test_HOA_tools_cec2.py). If I run just tests/regression/test_data_HOA_tools_cec2.py it passes all tests, but run as part of the whole suite it fails. I tried putting a fixed seed in the failing test, but to no avail.

test_full_CEC_pipeline.py fails

Another strange error, as it's clarity/evaluator/haspi/eb.py that fails with an IndexError...

E       IndexError: index 0 is out of bounds for axis 0 with size 0

clarity/evaluator/haspi/eb.py:426: IndexError

These tests pass on the main branch locally, but I get two other failures reported from torch because for some reason it's trying to use two different devices, cpu and cuda:0, which I've not encountered before. :thinking:
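
One way to take the GPU out of the equation when reproducing this locally could be to hide it from torch before CUDA is initialised (a sketch, not current behaviour of the tests):

# Hypothetical sketch: force everything onto the CPU so results do not depend on
# whether a GPU happens to be visible on the machine running the tests.
import os

os.environ["CUDA_VISIBLE_DEVICES"] = ""  # must be set before CUDA is initialised

import torch

assert not torch.cuda.is_available()
device = torch.device("cpu")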

TuZehai commented 1 year ago

This is kind of confusing. I think none of the code within clarity/evaluator should be related to torch.

Regarding the torch-related errors, it could be due to the torch version. What torch version are you using? Presumably the version in the requirements should work.

TuZehai commented 1 year ago

> I've been doing some work on the mbstoi.py and mbstoi_utils.py to make variable/function names PEP8 compliant (see branch ns-rse/98-nomenclature) and encountered this error re-running tests locally. Sometimes it passed, sometimes it didn't, seemingly randomly.

As far as I can remember, there should be no randomness in mbstoi; I will take a look.

ns-rse commented 1 year ago

Thanks @TuZehai :+1:

TuZehai commented 1 year ago

I have fixed the cuda issue in 'test_enhancers.py' and 'test_mc_conv_tasnet.py'.

However, I could not reproduce the errors in tests/regression/test_predictors.py::test_torch_msbg_stoi. The test passes every time on my local machine...

ns-rse commented 1 year ago

Thanks for looking at this @TuZehai. I've taken a different approach to the failing regression test, currently out for review in #103.

ns-rse commented 1 year ago

I've been doing some investigations on this and had a useful chat with @bobturneruk.

As @groadabike notes, torch.manual_seed() may be useful here, but looking at the tests, @TuZehai included a manual seed when these regression tests were first written.

There are, however, a number of different places in the code base where torch is being used, and I'm wondering if this is what is causing problems, because reproducibility is not guaranteed with PyTorch.

In a number of places the device being used is set on the fly based on whether a GPU is available...

❱ grep "cuda\(\)" -R -n 
enhancer/dsp/filter.py:12:        self.device = "cuda" if torch.cuda.is_available() else "cpu"
predictor/torch_msbg.py:174:            self.device = "cuda" if torch.cuda.is_available() else "cpu"
predictor/torch_msbg.py:614:        self.device = "cuda" if torch.cuda.is_available() else "cpu"

And torch is imported and used across more modules/classes than just these...

 ❱ grep "import torch" -R -l
dataset/cec1_dataset.py
engine/losses.py
enhancer/dnn/mc_conv_tasnet.py
enhancer/dsp/filter.py
predictor/torch_msbg.py
predictor/torch_stoi.py

To which end, I'm wondering if there is a need to introduce a global utility that sets the seed within each of these modules to help solve this problem (beyond the crude use of pytest.approx() as I've suggested in #103).

An example of this can be found in pykale, where there is a utils/seed.py that is used prior to the instantiation of anything from torch.
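
A minimal sketch of what such a utility could look like here (the module path and function name are hypothetical):

# clarity/utils/seed.py -- hypothetical global seeding helper, loosely modelled
# on pykale's utils/seed.py.
import os
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    """Seed every source of randomness used across the code base."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

Each module (or test) that instantiates torch objects could then call set_seed() before doing so.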

Is this something others feel would be useful?

groadabike commented 1 year ago

Hi @ns-rse, I like the idea of having a function we can call to ensure that everything has been properly seeded. Another option is to use pytorch_lightning's seed_everything function. We are already using pytorch_lightning in engine/system.py.

groadabike commented 1 year ago

Hi @TuZehai, I noticed that we are using pytorch_lightning in different parts of Clarity and in the recipes, but it is not included in the requirements. Is that by design?

groadabike commented 1 year ago

@jonbarker68, @TuZehai, @ns-rse It seems that even though you can seed numpy, you can't ensure exactly the same results with numpy.linalg, as different builds can be using different linear algebra libraries: https://numpy.org/install/#numpy-packages--accelerated-linear-algebra-libraries

You can run the following to check which library you are using:

import numpy as np
np.show_config()

I wonder if this is the reason for the different results in the tests. Is there any way we can force this information to be printed in the GitHub Actions logs and see whether they always use the same library? Or force a specific version in the numpy installation?
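
One low-effort way to surface this in the Actions logs might be a pytest hook in a conftest.py (a sketch; nothing like this currently exists in the repo):

# conftest.py -- hypothetical sketch: print the BLAS/LAPACK backend in the pytest
# report header so CI logs show which linear algebra library numpy was built against.
import contextlib
import io

import numpy as np


def pytest_report_header(config):
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        np.show_config()  # prints the build/backend configuration to stdout
    return buffer.getvalue()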

ns-rse commented 1 year ago

#135 introduces tests conditional on architecture, which appeared to be the root cause of this problem.