google-deepmind / alphafold3

AlphaFold 3 inference pipeline.

environment created by conda for af3, but errors like "DNN library initialization failed" #34

Closed Samuel-gwb closed 15 hours ago

Samuel-gwb commented 6 days ago

I created a conda environment named AF3 with python=3.11, activated it, and executed:

# Install the Python dependencies AlphaFold 3 needs.
pip3 install -r dev-requirements.txt
pip3 install --no-deps .

# Build chemical components database (this binary was installed by pip).
build_data

as indicated in: https://github.com/google-deepmind/alphafold3/issues/13#issuecomment-2470778076

and then run:

python run_alphafold.py --json_path=test/fold_input.json --model_dir=params/ --output_dir=test/

But got errors:

I1114 08:17:41.222610 140555224577856 folding_input.py:1044] Detected test/fold_input.json is an AlphaFold 3 JSON since the top-level is not a list.
Running AlphaFold 3. Please note that standard AlphaFold 3 model parameters are only available under terms of use provided at https://github.com/google-deepmind/alphafold3/blob/main/WEIGHTS_TERMS_OF_USE.md. If you do not agree to these terms and are using AlphaFold 3 derived model parameters, cancel execution of AlphaFold 3 inference with CTRL-C, and do not use the model parameters.
I1114 08:17:41.424198 140555224577856 xla_bridge.py:895] Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
I1114 08:17:41.425943 140555224577856 xla_bridge.py:895] Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
Found local devices: [CudaDevice(id=0)]
Building model from scratch...
Processing 1 fold inputs.
Processing fold input 2PV7
Checking we can load the model parameters...
E1114 08:17:41.467294 1458739 cuda_dnn.cc:502] There was an error before creating cudnn handle (500): cudaErrorSymbolNotFound : named symbol not found
E1114 08:17:41.467658 1458739 cuda_dnn.cc:502] There was an error before creating cudnn handle (500): cudaErrorSymbolNotFound : named symbol not found
Traceback (most recent call last):
  File "/home/gwb/RationalDesign/alphafold3/run_alphafold.py", line 678, in .....
  File "/home/gwb/miniconda3/envs/AF3/lib/python3.11/site-packages/jax/_src/dispatch.py", line 90, in apply_primitive
    outs = fun(*args)
           ^^^^^^^^^^
jaxlib.xla_extension.XlaRuntimeError: FAILED_PRECONDITION: DNN library initialization failed. Look at the errors above for more details.

Any suggestions to help resolve it? Many thanks!

Samuel-gwb commented 6 days ago

BTW: pip installation is smooth!

eunos-1128 commented 5 days ago

@Samuel-gwb

Probably, you need to install cuDNN. It is available with conda.

shaoyanhan commented 5 days ago

@Samuel-gwb

Probably, you need to install cuDNN. It is available with conda.

I think cuDNN is already installed under /alphafold3_venv during the pip step, and we can see that its version is nvidia-cudnn-cu12==9.5.1.17. Why do we need to install it again?

Samuel-gwb commented 5 days ago

Yes, cuDNN is already installed after the pip install. According to conda list, the NVIDIA-related packages are:

nvidia-cublas-cu12        12.6.3.3    pypi_0  pypi
nvidia-cuda-cupti-cu12    12.6.80     pypi_0  pypi
nvidia-cuda-nvcc-cu12     12.6.77     pypi_0  pypi
nvidia-cuda-runtime-cu12  12.6.77     pypi_0  pypi
nvidia-cudnn-cu12         9.5.1.17    pypi_0  pypi
nvidia-cufft-cu12         11.3.0.4    pypi_0  pypi
nvidia-cusolver-cu12      11.7.1.2    pypi_0  pypi
nvidia-cusparse-cu12      12.5.4.2    pypi_0  pypi
nvidia-nccl-cu12          2.23.4      pypi_0  pypi
nvidia-nvjitlink-cu12     12.6.77     pypi_0  pypi
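
For anyone who wants to reproduce this listing without conda, here is a minimal sketch that reads the same information from the Python environment itself. It uses only the standard library; the function name nvidia_wheels is made up for illustration. It only sees wheels installed by pip into the active environment, so a system-wide CUDA toolkit or cuDNN will not show up.

```python
from importlib import metadata


def nvidia_wheels():
    """Map each installed 'nvidia-*' distribution name to its version.

    Hypothetical helper for illustration; not part of AlphaFold 3 or JAX.
    """
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        # Skip distributions with missing metadata and anything not from NVIDIA.
        if name and name.lower().startswith("nvidia-"):
            found[name.lower()] = dist.version
    return dict(sorted(found.items()))


if __name__ == "__main__":
    for name, version in nvidia_wheels().items():
        print(f"{name:32s} {version}")
```

Running it inside the activated environment prints one line per NVIDIA wheel, which can be compared against the versions pinned in dev-requirements.txt.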

jsspencer commented 5 days ago

I think this is a JAX (and really CUDA) installation issue rather than a problem with AlphaFold 3.

We tested using docker and virtual environments with wheels from PyPI and have not tested with conda. I suggest following the instructions for JAX and verifying that JAX works correctly: https://jax.readthedocs.io/en/latest/installation.html#conda-installation.
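
One way to check JAX on its own, independent of AlphaFold 3, is a small smoke test like the sketch below. It assumes that a tiny matrix multiplication on the default device is enough to surface backend initialization errors such as the one reported above; that is an assumption, not something confirmed in this thread. The function name check_jax_gpu is hypothetical.

```python
"""Minimal JAX smoke test (a sketch, not part of AlphaFold 3)."""


def check_jax_gpu():
    """Return a dict describing whether JAX can run a small computation."""
    report = {"jax_version": None, "devices": [], "ok": False, "error": None}
    try:
        import jax
        import jax.numpy as jnp

        report["jax_version"] = jax.__version__
        # Devices of the default backend, e.g. ['cuda:0'] on a working GPU setup.
        report["devices"] = [str(d) for d in jax.devices()]
        # Force a real computation: an 8x8 matrix of ones times itself
        # has 64 entries, each equal to 8, so the sum is 512.
        x = jnp.ones((8, 8), dtype=jnp.float32)
        total = float((x @ x).sum())
        report["ok"] = total == 512.0
    except Exception as exc:  # ImportError, XlaRuntimeError, ...
        report["error"] = f"{type(exc).__name__}: {exc}"
    return report


if __name__ == "__main__":
    for key, value in check_jax_gpu().items():
        print(f"{key}: {value}")
```

If the devices list shows only a CPU device, or the error field is set, the problem is in the JAX/CUDA installation and not in AlphaFold 3.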

Maikuraky commented 4 days ago

You may try to install cudatoolkit 12.6 and cudnn 9.5 on your machine, but this requires the latest nvidia driver 560.35
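
A rough sketch for checking the driver side of this suggestion follows. The 560.35 threshold is simply the figure quoted in this comment and has not been verified here; the helper names are hypothetical. The sketch relies on nvidia-smi being on the PATH and returns None when it is not.

```python
import subprocess

# Threshold taken from the comment above; treat it as an assumption.
MIN_DRIVER = "560.35"


def parse_version(text):
    """Turn '560.35.03' into (560, 35, 3) for numeric comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_at_least(installed, minimum):
    """True if the installed driver version is >= the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_driver_version():
    """Ask nvidia-smi for the driver version; None if it is unavailable."""
    cmd = ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    lines = out.stdout.strip().splitlines()
    # With several GPUs there is one line per GPU; the driver is shared.
    return lines[0].strip() if lines else None


if __name__ == "__main__":
    version = installed_driver_version()
    if version is None:
        print("nvidia-smi not found or failed; cannot read the driver version")
    else:
        print(f"driver {version}, meets {MIN_DRIVER}: "
              f"{driver_at_least(version, MIN_DRIVER)}")
```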

eunos-1128 commented 4 days ago

@Samuel-gwb

I created an environment.yaml for the conda environment for my own use.

I don't have model parameters yet, so I don't know if AF3 will work properly, but please try with conda env create -f environment.yaml if you like.

# environment.yaml
name: AF3
channels:
- conda-forge
- bioconda
- nvidia
- nodefaults
dependencies:
- hmmer ==3.4
- git >=2.47.0,<3
- wget >=1.21.4,<2
- pip >=24.3.1,<25
- curl >=8.10.1,<9
- zstd >=1.5.6,<2
- cmake >=3.30.5,<4
- cuda ==12.6
- cuda-toolkit ==12.6
- python ==3.11
- rdkit ==2024.3.5
- scikit-build-core >=0.10.7,<0.11
- pybind11 >=2.13.6,<3
- ninja >=1.12.1,<2
- gcc >=13.3.0,<13.4
- pip
- pip:
  - -e .
  - absl-py
  - chex
  - dm-haiku==0.0.13
  - dm-tree
  - jax[cuda12]==0.4.34
  - jax-triton==0.2.0
  - jaxtyping
  - numpy
  - triton==3.1.0
  - tqdm
  - zstandard
  - pytest>=8.3.3, <9

After creating the AF3 environment, run the following commands.

conda activate AF3
build_data
python run_alphafold.py --json_path=test/fold_input.json --model_dir=params/ --output_dir=test/

eunos-1128 commented 4 days ago

You may try to install cudatoolkit 12.6 and cudnn 9.5 on your machine, but this requires the latest nvidia driver 560.35

I also think that the combination of the NVIDIA driver version and the versions of CUDA, cuDNN, and JAX used is the key.

Maikuraky commented 4 days ago

You may try to install cudatoolkit 12.6 and cudnn 9.5 on your machine, but this requires the latest nvidia driver 560.35

I also think that the combination of the NVIDIA driver version and the versions of CUDA, cuDNN, and JAX used is the key.

Hey, believe me, if you use a conda installation, it is best to follow what I said. When the cudatoolkit on my machine was 11.7, the run still failed with "DNN library initialization failed" at the initial stage, even after all packages from requirements.txt had installed cleanly. This is my actual experience. Of course, you can stick to your choice.

eunos-1128 commented 4 days ago

@Maikuraky I meant the same thing as your post.

You may try to install cudatoolkit 12.6 and cudnn 9.5 on your machine, but this requires the latest nvidia driver 560.35

Augustin-Zidek commented 15 hours ago

Closing this issue now as there haven't been any further comments. Feel free to comment or open a new issue if you are still encountering this problem.