@gomezzz Can we just release a version with dependencies pinned against a `<=` as well?
I will at least look into the torch problem now.
> @gomezzz Can we just release a version with dependencies pinned against a `<=` as well?
@ilan-gold We can, but I don't think it would be ideal: if somebody runs `conda install tensorflow` they will get the newest version, so a subsequent `conda install torchquad` would lead to either a forced downgrade or no compatible version being found. Worst case we can do it, but I don't think the APIs are likely to have changed much. Also, I think we only directly import from the frameworks in the tests (and maybe one or two other places?) to avoid problems like this.
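Just to make the pinning option concrete, here is a minimal sketch of what upper-bounded backend pins could look like on the pip side; the version caps, the `extras_require` layout, and the file itself are illustrative assumptions, not torchquad's actual packaging. The conda equivalent would be the same `<=` constraints in the environment .yml files.

```python
# Hypothetical setup.py fragment showing "<=" upper bounds on the optional
# backends. The caps below are placeholders, not verified compatible versions.
from setuptools import setup, find_packages

setup(
    name="torchquad",
    packages=find_packages(),
    extras_require={
        # Pinning like this avoids untested newer releases, but it also means
        # a later `conda install` / `pip install` of a newer backend forces a
        # downgrade or fails to resolve, which is the concern described above.
        "torch": ["torch<=1.13"],
        "tensorflow": ["tensorflow<=2.11"],
        "jax": ["jax<=0.4.8"],
    },
)
```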
> I will at least look into the torch problem now.
Thank you! :pray: If lack of a GPU is the problem, running in Google Colab may be an option (haven't tried it though); otherwise I might be able to take a look next week some time :)
That's a good point. I actually can't run the latest torch on the hardware I have access to because the CUDA version is too low (if I remember correctly).
I cannot reproduce this on Colab, although I get a different, somewhat stranger error that seems like it might not actually be an error, since the numerical error reported is still very good (presumably hardware/torch version specific, but not the error you're seeing in the logs): https://colab.research.google.com/drive/1lFpdtY5zV7VpW88aazedA3n4khedHDQP?usp=sharing :( very bad
Also, @gomezzz, can you point to where the tests are failing? I don't see anything in the Actions.
If this was your computer, it could be specific to something there.
@ilan-gold Thanks for the efforts! I will try to run your exact code on my machine next week to see if that can help pin it down (and to confirm I didn't mess up the setup etc. :D)
There's some Colab-specific stuff in there, but I'm not sure it's any different from what you would do. I just clone, check out the release branch, install deps (after deleting the named env from the .yml file, because you can only use `base` on Colab), and then run pytest.
Looking at this again, I used `environment_backends_all.yml`, not what you posted here. I'll try it again.
Ok, that didn't change the outcome. Sorry @gomezzz :(
@ilan-gold Running your notebook in Colab, I get
FAILED gauss_test.py::test_integrate_torch - assert (3 > 3 or 7.105427357601002e-15 < 2e-16)
FAILED gauss_test.py::test_integrate_tensorflow - assert (3 > 3 or 7.105427357601002e-15 < 2e-16)
Was this what you got?
On one of our GPU servers I get
FAILED gauss_test.py::test_integrate_torch - assert (3 > 3 or 7.105427357601002e-15 < 2e-16)
FAILED gauss_test.py::test_integrate_tensorflow - assert (3 > 3 or 7.105427357601002e-15 < 2e-16)
FAILED gradient_test.py::test_gradients_torch - AssertionError: assert 0.11964358930767993 < 0.1
so it seems the test bounds are a bit too harsh for gauss and the gradient test?
I also get 115 warnings, oof. We might wanna look at those at some point :D :see_no_evil:
(I don't get the errors I faced previously on either machine, so I think something probably went wrong setting up the env before.)
@gomezzz Yes, these are the sorts of errors I saw. Should we bump the tolerance? If it's passing here, is it a problem? I guess so, since tests are meant to be run locally.
Yeah, let's increase it; I'd always aim to have passing tests on GPUs too :)
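For illustration, the kind of two-condition check the logs show could be relaxed roughly like this; the helper name, arguments, and thresholds below are assumptions for the sketch, not torchquad's actual test utilities.

```python
# Hypothetical sketch of the accuracy assertion pattern seen in the failures:
# pass if enough correct digits are reached OR the absolute error is near
# machine precision. Names and defaults are illustrative only.
def assert_accuracy(correct_digits, abs_error, required_digits=3, abs_bound=2e-16):
    assert correct_digits > required_digits or abs_error < abs_bound

# The GPU failures are borderline (exactly 3 digits, error ~7.1e-15), so either
# knob could be loosened, e.g. ">=" instead of ">" or a larger abs_bound:
assert_accuracy(correct_digits=3, abs_error=7.105427357601002e-15, abs_bound=1e-14)
```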
Issue
Problem Description
Related to Release 0.4.0; can be fixed directly on the `release` branch. Currently the tests fail on GPUs: for torch because of a missing transfer from GPU memory to host, and for TF there seems to be a breaking API change (`ImportError: cannot import name 'np_config' from 'tensorflow.python.ops.numpy_ops'`, see https://stackoverflow.com/questions/75727569/cannot-import-name-np-config-from-tensorflow-python-ops-numpy-ops). A sketch of possible fixes follows below, after the logs.
Logs:
pytorch_gpu: pytest_gpu0.log
TF_gpu, torch_cpu (failed to get JAX working since this was on a Windows machine): pytest_all_gpu0.log
Setting up an env to check all frameworks on GPU also proved time-consuming (and failed for JAX).
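For reference, a minimal sketch of what the two fixes could look like, assuming the torch failure comes from converting a CUDA tensor to NumPy without moving it to the host first, and that the commonly suggested replacement for the removed private TF import is the public NumPy-behaviour API; the exact call sites inside torchquad are not shown here.

```python
import torch
import tensorflow as tf

# 1) Torch: move GPU results to host memory before converting to NumPy;
#    calling .numpy() directly on a CUDA tensor raises an error.
device = "cuda" if torch.cuda.is_available() else "cpu"
result = torch.tensor([1.0], device=device)
result_np = result.detach().cpu().numpy()  # explicit GPU -> host transfer

# 2) TensorFlow: instead of the removed private import
#    `from tensorflow.python.ops.numpy_ops import np_config`,
#    a commonly suggested replacement is the public API:
tf.experimental.numpy.experimental_enable_numpy_behavior()
```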
Expected Behavior
What Needs to be Done
How Can It Be Tested or Reproduced
Run `pytest`; the following env was used: