Samuel-gwb closed this issue 15 hours ago
BTW: pip installation is smooth!
@Samuel-gwb
Probably, you need to install cuDNN. It is available with conda.
I think cuDNN was already installed under /alphafold3_venv during the pip step; its version is nvidia-cudnn-cu12==9.5.1.17. Why do we need to install it again?
Yes, cuDNN is already installed after the pip install. In the conda list output, the NVIDIA-related packages are:
nvidia-cublas-cu12 12.6.3.3 pypi_0 pypi
nvidia-cuda-cupti-cu12 12.6.80 pypi_0 pypi
nvidia-cuda-nvcc-cu12 12.6.77 pypi_0 pypi
nvidia-cuda-runtime-cu12 12.6.77 pypi_0 pypi
nvidia-cudnn-cu12 9.5.1.17 pypi_0 pypi
nvidia-cufft-cu12 11.3.0.4 pypi_0 pypi
nvidia-cusolver-cu12 11.7.1.2 pypi_0 pypi
nvidia-cusparse-cu12 12.5.4.2 pypi_0 pypi
nvidia-nccl-cu12 2.23.4 pypi_0 pypi
nvidia-nvjitlink-cu12 12.6.77 pypi_0 pypi
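For anyone who wants to confirm which NVIDIA wheels the active Python interpreter actually sees (as opposed to what conda list reports), here is a minimal sketch using only the standard library. The helper name nvidia_packages is made up for this example.

```python
from importlib import metadata


def nvidia_packages(prefix: str = "nvidia-") -> dict[str, str]:
    """Map installed distribution names starting with `prefix` to their versions.

    Hypothetical helper for illustration; it only inspects the environment
    of the interpreter that runs it.
    """
    found: dict[str, str] = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name and name.lower().startswith(prefix):
            found[name] = dist.version
    return found


if __name__ == "__main__":
    # Print in the same "name==version" form pip uses.
    for name, version in sorted(nvidia_packages().items()):
        print(f"{name}=={version}")
```

Run it with the python from the AF3 environment; if nvidia-cudnn-cu12 does not show up here, the interpreter is not the one pip installed into.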
I think this is a JAX (and really CUDA) installation issue rather than a problem with AlphaFold 3.
We tested with Docker and with virtual environments using wheels from PyPI, but have not tested with conda. I suggest following the JAX installation instructions and verifying that JAX works correctly: https://jax.readthedocs.io/en/latest/installation.html#conda-installation.
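One way to verify JAX on its own, independent of AlphaFold 3, is to list the devices and run a tiny convolution. This is only a sketch: the function name check_jax_gpu is invented here, and it assumes that on a GPU backend XLA routes convolutions through cuDNN, which is the usual behaviour but is not guaranteed for every version.

```python
def check_jax_gpu() -> str:
    """Return a one-line report on whether JAX can run a small convolution.

    Hypothetical helper; nothing here is specific to AlphaFold 3.
    """
    try:
        import jax
        import jax.numpy as jnp
    except ImportError as exc:
        return f"jax is not importable: {exc}"

    try:
        devices = jax.devices()
        # NHWC input and HWIO kernel, all ones, so the result is easy to eyeball.
        x = jnp.ones((1, 8, 8, 1), dtype=jnp.float32)
        k = jnp.ones((3, 3, 1, 1), dtype=jnp.float32)
        y = jax.lax.conv_general_dilated(
            x,
            k,
            window_strides=(1, 1),
            padding="SAME",
            dimension_numbers=("NHWC", "HWIO", "NHWC"),
        )
        y.block_until_ready()  # force execution so errors surface here
    except Exception as exc:  # report the failure instead of a long traceback
        return f"JAX check failed: {exc}"
    return f"devices={devices}; convolution OK, output shape {y.shape}"


if __name__ == "__main__":
    print(check_jax_gpu())
```

If this reports the same "DNN library initialization failed" message, the problem is reproducible without AlphaFold 3 and can be debugged as a plain JAX/CUDA setup issue.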
You may try to install cudatoolkit 12.6 and cudnn 9.5 on your machine, but this requires the latest nvidia driver 560.35
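To compare the installed driver against the 560.35 figure suggested above, a small sketch follows. It takes that threshold from the comment as given rather than from NVIDIA documentation, and the helper names are made up for this example. It relies on nvidia-smi being on PATH.

```python
import subprocess
from typing import Optional


def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '560.35.03' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_at_least(installed: str, required: str) -> bool:
    """True if `installed` is the same as or newer than `required`."""
    return parse_version(installed) >= parse_version(required)


def query_driver_version() -> Optional[str]:
    """Ask nvidia-smi for the driver version; None if it cannot be queried."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    lines = out.strip().splitlines()
    return lines[0].strip() if lines else None


if __name__ == "__main__":
    REQUIRED = "560.35"  # assumption: the figure quoted in this thread
    version = query_driver_version()
    if version is None:
        print("Could not query the driver (is nvidia-smi on PATH?)")
    else:
        print(f"driver {version}; at least {REQUIRED}: {driver_at_least(version, REQUIRED)}")
```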
@Samuel-gwb
I created an environment.yaml for the conda environment for my own use.
I don't have model parameters yet, so I don't know if AF3 will work properly, but please try it with conda env create -f environment.yaml if you like.
# environment.yaml
name: AF3
channels:
  - conda-forge
  - bioconda
  - nvidia
  - nodefaults
dependencies:
  - hmmer ==3.4
  - git >=2.47.0,<3
  - wget >=1.21.4,<2
  - pip >=24.3.1,<25
  - curl >=8.10.1,<9
  - zstd >=1.5.6,<2
  - cmake >=3.30.5,<4
  - cuda ==12.6
  - cuda-toolkit ==12.6
  - python ==3.11
  - rdkit ==2024.3.5
  - scikit-build-core >=0.10.7,<0.11
  - pybind11 >=2.13.6,<3
  - ninja >=1.12.1,<2
  - gcc >=13.3.0,<13.4
  - pip
  - pip:
      - -e .
      - absl-py
      - chex
      - dm-haiku==0.0.13
      - dm-tree
      - jax[cuda12]==0.4.34
      - jax-triton==0.2.0
      - jaxtyping
      - numpy
      - triton==3.1.0
      - tqdm
      - zstandard
      - pytest>=8.3.3,<9
After creating the AF3 environment, run the following commands.
conda activate AF3
build_data
python run_alphafold.py --json_path=test/fold_input.json --model_dir=params/ --output_dir=test/
You may try to install cudatoolkit 12.6 and cudnn 9.5 on your machine, but this requires the latest nvidia driver 560.35
I also think that the combination of the NVIDIA driver version and the versions of CUDA, cuDNN, and JAX used is the key.
Hey, believe me: if you install with conda, it is best to follow what I said. When the cudatoolkit on my machine was 11.7, the initial run stage still failed with "DNN library initialization failed", even after every package in requirements.txt had installed cleanly. This is my actual experience. Of course, you can stick with your choice.
@Maikuraky I meant the same thing as your post.
Closing this issue now as there haven't been any further comments. Feel free to comment or open a new issue if you are still encountering this problem.
I created a conda environment named AF3 with python=3.11, activated it, and executed:
Install the Python dependencies AlphaFold 3 needs.
pip3 install -r dev-requirements.txt
pip3 install --no-deps .
Build chemical components database (this binary was installed by pip).
build_data
as indicated in: https://github.com/google-deepmind/alphafold3/issues/13#issuecomment-2470778076
and then run:
python run_alphafold.py --json_path=test/fold_input.json --model_dir=params/ --output_dir=test/
But got errors:
I1114 08:17:41.222610 140555224577856 folding_input.py:1044] Detected test/fold_input.json is an AlphaFold 3 JSON since the top-level is not a list.
Running AlphaFold 3. Please note that standard AlphaFold 3 model parameters are only available under terms of use provided at https://github.com/google-deepmind/alphafold3/blob/main/WEIGHTS_TERMS_OF_USE.md. If you do not agree to these terms and are using AlphaFold 3 derived model parameters, cancel execution of AlphaFold 3 inference with CTRL-C, and do not use the model parameters.
I1114 08:17:41.424198 140555224577856 xla_bridge.py:895] Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
I1114 08:17:41.425943 140555224577856 xla_bridge.py:895] Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
Found local devices: [CudaDevice(id=0)]
Building model from scratch...
Processing 1 fold inputs.
Processing fold input 2PV7
Checking we can load the model parameters...
E1114 08:17:41.467294 1458739 cuda_dnn.cc:502] There was an error before creating cudnn handle (500): cudaErrorSymbolNotFound : named symbol not found
E1114 08:17:41.467658 1458739 cuda_dnn.cc:502] There was an error before creating cudnn handle (500): cudaErrorSymbolNotFound : named symbol not found
Traceback (most recent call last):
File "/home/gwb/RationalDesign/alphafold3/run_alphafold.py", line 678, in
.....
File "/home/gwb/miniconda3/envs/AF3/lib/python3.11/site-packages/jax/_src/dispatch.py", line 90, in apply_primitive
outs = fun(*args)
^^^^^^^^^^
jaxlib.xla_extension.XlaRuntimeError: FAILED_PRECONDITION: DNN library initialization failed. Look at the errors above for more details.
Any suggestions to help resolve it? Many thanks!
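One thing that may be worth ruling out is a second cuDNN copy on the library search path shadowing the pip-installed one. That cause is an assumption and is not confirmed anywhere in this thread. The sketch below just lists libcudnn files found in the directories of a search path; the helper name find_cudnn_libs is invented for the example.

```python
import glob
import os


def find_cudnn_libs(search_path: str) -> list:
    """List libcudnn shared objects found in the directories of `search_path`.

    `search_path` uses the platform separator (':' on Linux), the same
    format as LD_LIBRARY_PATH. Hypothetical diagnostic helper.
    """
    hits = []
    for directory in search_path.split(os.pathsep):
        if directory:  # skip empty entries such as a trailing ':'
            hits.extend(sorted(glob.glob(os.path.join(directory, "libcudnn*.so*"))))
    return hits


if __name__ == "__main__":
    for path in find_cudnn_libs(os.environ.get("LD_LIBRARY_PATH", "")):
        print(path)
```

If this prints a cuDNN from a system CUDA install (for example one matching an older toolkit), temporarily unsetting LD_LIBRARY_PATH before rerunning is a cheap experiment.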