OpenBioSim / biosimspace

An interoperable Python framework for biomolecular simulation.
https://biosimspace.openbiosim.org
GNU General Public License v3.0

jaxlib incompatibility with a fresh bss install #296

Closed jmichel80 closed 1 month ago

jmichel80 commented 3 months ago

Describe the bug

It is currently not possible to use BSS.FreeEnergy.Relative.analyse() after a fresh install of BioSimSpace.

To Reproduce

Install somd2 from scratch following the instructions at https://github.com/OpenBioSim/somd2/blob/main/README.md

Attempting to run an MBAR analysis on a somd2 output folder then gives the following:

(somd2) director@neogodzilla:~/OBS/somd2/reproducibility-report/somd2/32~26_Free$ ipython
Python 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.24.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import BioSimSpace as BSS

INFO:numexpr.utils:Note: NumExpr detected 20 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.

In [2]: pmf1, overlap1 = BSS.FreeEnergy.Relative.analyse("output")
INFO:jax._src.xla_bridge:Unable to initialize backend 'cuda': Unable to use CUDA because of the following issues with CUDA components:
Outdated CUDA installation found.
Version JAX was built against: 11080
Minimum supported: 12010
Installed version: 11080
The local installation version must be no lower than 12010.
--------------------------------------------------
Outdated cuBLAS installation found.
Version JAX was built against: 111103
Minimum supported: 120100
Installed version: 111103
The local installation version must be no lower than 120100.
--------------------------------------------------
Outdated cuSPARSE installation found.
Version JAX was built against: 11705
Minimum supported: 12100
Installed version: 11705
The local installation version must be no lower than 12100.
INFO:jax._src.xla_bridge:Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: CUDA
INFO:jax._src.xla_bridge:Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
WARNING:jax._src.xla_bridge:CUDA backend failed to initialize: Unable to use CUDA because of the following issues with CUDA components:
Outdated CUDA installation found.
Version JAX was built against: 11080
Minimum supported: 12010
Installed version: 11080
The local installation version must be no lower than 12010.
--------------------------------------------------
Outdated cuBLAS installation found.
Version JAX was built against: 111103
Minimum supported: 120100
Installed version: 111103
The local installation version must be no lower than 120100.
--------------------------------------------------
Outdated cuSPARSE installation found.
Version JAX was built against: 11705
Minimum supported: 12100
Installed version: 11705
The local installation version must be no lower than 12100..(Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
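
The integers in the log above are packed version numbers. As a rough aid to reading them, here is a minimal sketch that decodes them into dotted versions. The packing formulas are assumptions based on the usual CUDA, cuBLAS and cuSPARSE header conventions, and the function names are hypothetical; none of this is part of JAX or BioSimSpace.

```python
# Sketch: decode the packed integer versions printed in the JAX log.
# The packing formulas are assumed from the CUDA, cuBLAS and cuSPARSE
# header conventions; the function names are illustrative only.

def decode_cuda(v: int) -> str:
    """Assumes CUDA packs versions as major * 1000 + minor * 10."""
    return f"{v // 1000}.{(v % 1000) // 10}"


def decode_cublas(v: int) -> str:
    """Assumes cuBLAS packs versions as major * 10000 + minor * 100 + patch."""
    return f"{v // 10000}.{(v % 10000) // 100}.{v % 100}"


def decode_cusparse(v: int) -> str:
    """Assumes cuSPARSE packs versions as major * 1000 + minor * 100 + patch."""
    return f"{v // 1000}.{(v % 1000) // 100}.{v % 100}"


if __name__ == "__main__":
    # (component, decoder, installed value, minimum supported value) from the log
    rows = [
        ("CUDA", decode_cuda, 11080, 12010),
        ("cuBLAS", decode_cublas, 111103, 120100),
        ("cuSPARSE", decode_cusparse, 11705, 12100),
    ]
    for name, decode, installed, minimum in rows:
        print(f"{name}: installed {decode(installed)}, minimum {decode(minimum)}")
```

Read this way, the log says that the installed CUDA 11.8 components are older than the 12.1 minimum that this jax build expects.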

Here are the versions of the dependencies that may have triggered this error:

(somd2) director@neogodzilla:~/OBS/somd2/reproducibility-report/somd2/32~26_Free$ mamba list | grep alchemlyb
alchemlyb                 2.3.0              pyhd8ed1ab_0    conda-forge
(somd2) director@neogodzilla:~/OBS/somd2/reproducibility-report/somd2/32~26_Free$ mamba list | grep openmm
openmm                    8.1.1           py310h43b6314_1    conda-forge
(somd2) director@neogodzilla:~/OBS/somd2/reproducibility-report/somd2/32~26_Free$ mamba list | grep pym
pymbar                    4.0.3                hff52083_1    conda-forge
pymbar-core               4.0.3           py310h1f7b6fc_1    conda-forge
pymsmt                    22.0                     pypi_0    pypi

and the installed versions of jax and jaxlib:

(somd2) director@neogodzilla:~/OBS/somd2/reproducibility-report/somd2/32~26_Free$ mamba list | grep jax
jax                       0.4.27             pyhd8ed1ab_0    conda-forge
jaxlib                    0.4.23          cuda118py310hd0f2884_202    conda-forge
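
For anyone who wants to gather the same information from inside Python rather than with mamba list, a minimal sketch using the standard library follows. The helper name installed_versions is hypothetical, and this approach only reports version numbers: it does not show the conda build string (such as cuda118py310hd0f2884_202), which is what reveals the CUDA 11.8 build here.

```python
# Sketch: report installed versions of the packages discussed in this issue.
# installed_versions is a hypothetical helper, not part of BioSimSpace.
from importlib import metadata


def installed_versions(names):
    """Return {name: version string, or None if the package is not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(
        ["jax", "jaxlib", "pymbar", "alchemlyb"]
    ).items():
        print(f"{name:<12} {version or 'not installed'}")
```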

Tested on Ubuntu 22.04 LTS with Python 3.12, 3.11 and 3.10.

jmichel80 commented 3 months ago

Pinning pymbar to 4.0.2 solves the issue.

lohedges commented 3 months ago

Good stuff. I'll check whether there's a report on their GitHub page when I'm back.

lohedges commented 3 months ago

Reminds me of this issue (https://github.com/OpenBioSim/biosimspace/issues/207), where the solution was also to use pymbar 4.0.2. (They messed up a build a while back, which causes an incorrect jaxlib to be pulled in, and this has never been fixed properly.)

fjclark commented 3 months ago

What worked for me, in case it's helpful in future (also working from a fresh BSS install, with python 3.12.3):

I think this is because jax 0.4.26 dropped support for CUDA 11 (https://github.com/google/jax/issues/18032#issuecomment-2035835962). To avoid upgrading CUDA, I fixed this error by downgrading jax (mamba install "jax<0.4.26"). I then got an error from the XLA compiler, XlaRuntimeError: INTERNAL: XLA requires ptxas version 11.8 or higher, which was fixed by installing cuda-nvcc with mamba install -c nvidia "cuda-nvcc=11.8". pymbar 4.0.3 then works for me. I noticed William logged a ptxas-related pymbar issue last year, which is still open: https://github.com/choderalab/pymbar/issues/498
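
The rule of thumb in the comment above (jax 0.4.26 and later needing CUDA 12) can be written as a small check. This is only a sketch of that stated assumption: the function names are hypothetical, the 0.4.26 cutoff is taken from the comment rather than verified independently, and the parser handles plain numeric versions only (it would fail on something like "0.4.27rc1").

```python
# Sketch: flag the jax / CUDA combination described in the comment above.
# Assumption (from the comment): jax >= 0.4.26 no longer supports CUDA 11.

def parse_version(text):
    """Turn '0.4.27' into (0, 4, 27); plain numeric versions only."""
    return tuple(int(part) for part in text.split(".")[:3])


def needs_jax_downgrade(jax_version, cuda_major):
    """True if this jax version is assumed too new for the local CUDA major."""
    return cuda_major < 12 and parse_version(jax_version) >= (0, 4, 26)


if __name__ == "__main__":
    # Values from the original report: jax 0.4.27 with a CUDA 11.8 install.
    print(needs_jax_downgrade("0.4.27", 11))
```

With the versions from the original report the check returns True, which matches the mamba install "jax<0.4.26" workaround.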

lohedges commented 3 months ago

It will be nice when alchemlyb makes jax optional (I think pymbar have split things out now). It really makes no sense for this to bork our install when the jax stuff isn't even needed.
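
For illustration, "optional" here could look something like the generic pattern below, where a package checks whether jax is present and otherwise falls back to NumPy. This is a sketch of a common Python idiom, not how alchemlyb or pymbar actually implement it; have_module and choose_backend are hypothetical names.

```python
# Sketch: a generic optional-dependency check. This is not alchemlyb's or
# pymbar's code; have_module and choose_backend are illustrative names.
from importlib import util


def have_module(name):
    """True if a top-level module can be found, without importing it."""
    return util.find_spec(name) is not None


def choose_backend():
    """Prefer jax when it is installed, otherwise fall back to numpy."""
    return "jax" if have_module("jax") else "numpy"


if __name__ == "__main__":
    print(f"Selected backend: {choose_backend()}")
```

Note that a check like this only covers a missing jax; it would not by itself catch an installed jax whose CUDA backend fails to initialise, as in this issue.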

lohedges commented 1 month ago

Closing as this doesn't appear to be causing issues at present. I believe the problem hasn't fundamentally been resolved, but hopefully we are now working with versions of alchemlyb and pymbar that don't trigger the environment resolution issues. At least we have this documented and can re-open if it does raise its head again in future.