desy-ml / cheetah

Fast and differentiable particle accelerator optics simulation for reinforcement learning and optimisation applications.
https://cheetah-accelerator.readthedocs.io
GNU General Public License v3.0

Sporadically failing tests #49

Closed · jank324 closed this issue 8 months ago

jank324 commented 1 year ago

🐛 Bug

Some of the tests fail sporadically. Usually just rerunning them fixes the problem. This should be looked into and fixed. Tests should always behave the same as long as the code is not changed.

cr-xu commented 1 year ago

I sometimes get

FAILED test/test_tracking.py::test_tracking_speed - assert (1684847640.92523 - 1684847640.811193) < 0.1

when running locally.

But that could simply be fixed by setting a more relaxed time constraint, along the lines of the sketch below.
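A minimal sketch (the function and call names are illustrative, not the actual contents of test/test_tracking.py):

```python
# Illustrative only: relax the wall-clock bound so shared CI runners and
# laptops under load don't trip the assertion. run_tracking() is a
# hypothetical stand-in for whatever the real test actually times.
import time

def test_tracking_speed():
    t0 = time.time()
    run_tracking()  # placeholder for the tracking call under test
    t1 = time.time()
    assert (t1 - t0) < 1.0  # relaxed from 0.1 s
```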

Do you experience other failure cases?

jank324 commented 1 year ago

That test might also be an issue (because we can't control the machine it's running on), but I believe the sporadically failing tests I saw involved allclose, i.e. they were caused by numerical errors. I would have to try to reproduce them to know exactly which tests were failing.
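One way to take the randomness out of those allclose checks would be to seed the RNG for every test, assuming the beams are sampled from torch's global generator, e.g. via an autouse fixture:

```python
# Sketch of a conftest.py fixture that seeds torch's global RNG before each
# test, so the sampled particle beams (and their means) are reproducible.
import pytest
import torch

@pytest.fixture(autouse=True)
def seed_torch_rng():
    torch.manual_seed(42)  # any fixed seed works; it just has to be constant
    yield
```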

jank324 commented 1 year ago

From a test run that failed on me today:

============================= test session starts ==============================
platform linux -- Python 3.10.11, pytest-7.3.1, pluggy-1.0.0
rootdir: /home/runner/work/cheetah/cheetah
plugins: cov-4.0.0
collected 141 items

test/test_accelerator.py ............................................... [ 33%]
................                                                         [ 44%]
test/test_elements_benchmark_ocelot.py .....                             [ 48%]
test/test_external_import.py ..                                          [ 49%]
test/test_final.py ..........                                            [ 56%]
test/test_less_pytest.py ..................                              [ 69%]
test/test_ocelot_import.py .............                                 [ 78%]
test/test_parameterbeam.py ..                                            [ 80%]
test/test_particles.py ....F......................                       [ 99%]
test/test_tracking.py .                                                  [100%]

=================================== FAILURES ===================================
_______________________ test_ParticleBeam_parameters_mu ________________________

    def test_ParticleBeam_parameters_mu():
>       assert torch.allclose(
            ParticleBeam_parameters.particles.mean(axis=0),
            ParticleBeam_parameters_mu,
            rtol=1e-04,
            atol=1e-08,
            equal_nan=False,
        )
E       assert False
E        +  where False = <built-in method allclose of type object at 0x7fe16927c540>(tensor([ 1.0612e-09,  1.7875e-10, -3.0217e-10,  9.1739e-10, -6.7128e-09,\n         2.0844e-09,  1.0000e+00]), tensor([ 4.9239e-10, -8.5083e-10, -1.3031e-10, -4.3553e-10,  3.5803e-09,\n        -9.5884e-10,  1.0000e+00]), rtol=0.0001, atol=1e-08, equal_nan=False)
E        +    where <built-in method allclose of type object at 0x7fe16927c540> = torch.allclose
E        +    and   tensor([ 1.0612e-09,  1.7875e-10, -3.0217e-10,  9.1739e-10, -6.7128e-09,\n         2.0844e-09,  1.0000e+00]) = <built-in method mean of Tensor object at 0x7fe0f3e63ec0>(axis=0)
E        +      where <built-in method mean of Tensor object at 0x7fe0f3e63ec0> = tensor([[-7.6279e-08, -6.1318e-08, -3.4305e-07,  ..., -1.0914e-07,\n          4.5022e-08,  1.0000e+00],\n        [-7.130...,  1.0000e+00],\n        [ 4.2782e-08,  1.0887e-07,  3.0899e-07,  ...,  8.1235e-08,\n          3.3033e-07,  1.0000e+00]]).mean
E        +        where tensor([[-7.6279e-08, -6.1318e-08, -3.4305e-07,  ..., -1.0914e-07,\n          4.5022e-08,  1.0000e+00],\n        [-7.130...,  1.0000e+00],\n        [ 4.2782e-08,  1.0887e-07,  3.0899e-07,  ...,  8.1235e-08,\n          3.3033e-07,  1.0000e+00]]) = ParticleBeam(n=100000, mu_x=0.000000, mu_xp=0.000000, mu_y=-0.000000, mu_yp=0.000000, sigma_x=0.000000, sigma_xp=0.000000, sigma_y=0.000000, sigma_yp=0.000000, sigma_s=0.000001, sigma_p=0.000001, energy=100000000.000).particles

test/test_particles.py:423: AssertionError
=========================== short test summary info ============================
FAILED test/test_particles.py::test_ParticleBeam_parameters_mu - assert False
 +  where False = <built-in method allclose of type object at 0x7fe16927c540>(tensor([ 1.0612e-09,  1.7875e-10, -3.0217e-10,  9.1739e-10, -6.7128e-09,\n         2.0844e-09,  1.0000e+00]), tensor([ 4.9239e-10, -8.5083e-10, -1.3031e-10, -4.3553e-10,  3.5803e-09,\n        -9.5884e-10,  1.0000e+00]), rtol=0.0001, atol=1e-08, equal_nan=False)
 +    where <built-in method allclose of type object at 0x7fe16927c540> = torch.allclose
 +    and   tensor([ 1.0612e-09,  1.7875e-10, -3.0217e-10,  9.1739e-10, -6.7128e-09,\n         2.0844e-09,  1.0000e+00]) = <built-in method mean of Tensor object at 0x7fe0f3e63ec0>(axis=0)
 +      where <built-in method mean of Tensor object at 0x7fe0f3e63ec0> = tensor([[-7.6279e-08, -6.1318e-08, -3.4305e-07,  ..., -1.0914e-07,\n          4.5022e-08,  1.0000e+00],\n        [-7.130...,  1.0000e+00],\n        [ 4.2782e-08,  1.0887e-07,  3.0899e-07,  ...,  8.1235e-08,\n          3.3033e-07,  1.0000e+00]]).mean
 +        where tensor([[-7.6279e-08, -6.1318e-08, -3.4305e-07,  ..., -1.0914e-07,\n          4.5022e-08,  1.0000e+00],\n        [-7.130...,  1.0000e+00],\n        [ 4.2782e-08,  1.0887e-07,  3.0899e-07,  ...,  8.1235e-08,\n          3.3033e-07,  1.0000e+00]]) = ParticleBeam(n=100000, mu_x=0.000000, mu_xp=0.000000, mu_y=-0.000000, mu_yp=0.000000, sigma_x=0.000000, sigma_xp=0.000000, sigma_y=0.000000, sigma_yp=0.000000, sigma_s=0.000001, sigma_p=0.000001, energy=100000000.000).particles
======================== 1 failed, 140 passed in 11.41s ========================
Error: Process completed with exit code 1.
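The size of that deviation is roughly what statistical noise predicts; a back-of-the-envelope estimate using the sigma_s = 1e-06 and n = 100,000 printed in the ParticleBeam repr above:

```python
# The assertion compares sampled means with atol=1e-08. The standard error of
# the mean for 100,000 particles with sigma_s = 1e-06 is of the same order.
sigma_s = 1e-6
n = 100_000
standard_error = sigma_s / n ** 0.5
print(f"standard error of the mean: {standard_error:.2e}")  # ~3.16e-09
# A few-sigma fluctuation therefore reaches atol=1e-08, consistent with the
# ~1e-09 mean values that trip torch.allclose above.
```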
cr-xu commented 8 months ago

I think this was partially related to #98, which is fixed now. Is this still an issue, or can we close it?

jank324 commented 8 months ago

I think this can be closed. I haven't experienced issues like this in a while, and worst case, we can reopen the issue.