YoshitakaMo / localcolabfold

ColabFold on your local PC
MIT License

Question: DNN library initialization failed #236

Open laolanllx opened 1 month ago

laolanllx commented 1 month ago

I followed the instructions to install colabfold, and the installation completed successfully. When I tried to run a prediction, it stopped and showed:

2024-05-11 16:05:21,977 Could not predict test. Not Enough GPU memory? FAILED_PRECONDITION: DNN library initialization failed. Look at the errors above for more details.
2024-05-11 16:05:21,977 Done

Computational environment

gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        Off | 00000000:01:00.0 Off |                  Off |
| 30%   45C    P8              23W / 350W |    318MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        Off | 00000000:08:00.0 Off |                  Off |
| 30%   44C    P8              13W / 350W |     18MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1660      G   /usr/lib/xorg/Xorg                           35MiB |
|    0   N/A  N/A      4515      G   gnome-control-center                          6MiB |
|    0   N/A  N/A      5403      G   /usr/lib/xorg/Xorg                           71MiB |
|    0   N/A  N/A      5552      G   /usr/bin/gnome-shell                         63MiB |
|    1   N/A  N/A      1660      G   /usr/lib/xorg/Xorg                            4MiB |
|    1   N/A  N/A      5403      G   /usr/lib/xorg/Xorg                            4MiB |
+---------------------------------------------------------------------------------------+


Any solutions for this issue? Thank you so much for your reply!

YoshitakaMo commented 1 month ago

Have you tried the solution in https://github.com/YoshitakaMo/localcolabfold/issues/210?

laolanllx commented 1 month ago

> Have you tried the solution in #210?

I tested the GPU, and JAX detects it as expected:

Python 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import jax
>>> print(jax.local_devices()[0].platform)
gpu

I also checked the cuDNN, jax, and jaxlib versions, shown below:

(base) xtal@GPU1:/opt/colabfold-20240509/localcolabfold/colabfold-conda/bin$ python3.10 -m pip list | grep nvidia-cudnn
nvidia-cudnn-cu12            9.1.0.70
(base) xtal@GPU1:/opt/colabfold-20240509/localcolabfold/colabfold-conda/bin$ python3.10 -m pip list | grep jax
jax                          0.4.23
jax-cuda12-pjrt              0.4.23
jax-cuda12-plugin            0.4.23
jaxlib                       0.4.23+cuda12.cudnn89

It looks like my CUDA version is 11.8, but the cu12 packages were installed. Should I change these to the *cu11 ones?
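The package listing above contains a possible clue: the local version tag on jaxlib, `+cuda12.cudnn89`, indicates a build against CUDA 12 and cuDNN 8.9, while the installed nvidia-cudnn-cu12 wheel is version 9.1.0.70. Whether this mismatch is the actual cause here is an assumption, not something confirmed in the thread. A minimal sketch of the comparison follows; the helper names are hypothetical, and the parsing assumes the tag format shown in the listing, with the first digit after `cudnn` taken as the major version.

```python
import re


def parse_jaxlib_tag(jaxlib_version):
    """Return (cuda_major, cudnn_major) from a tag like '0.4.23+cuda12.cudnn89'.

    Returns None when the version has no such local tag (e.g. '0.4.28').
    Assumption: the first digit after 'cudnn' is the cuDNN major version.
    """
    match = re.search(r"\+cuda(\d+)\.cudnn(\d)\d*$", jaxlib_version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def cudnn_major_matches(jaxlib_version, cudnn_wheel_version):
    """Compare the cuDNN major version in the jaxlib tag with the installed wheel.

    Returns True or False, or None if the jaxlib version carries no tag.
    """
    tag = parse_jaxlib_tag(jaxlib_version)
    if tag is None:
        return None
    _, wanted_major = tag
    installed_major = int(cudnn_wheel_version.split(".")[0])
    return installed_major == wanted_major


# Versions copied from the pip listings in this thread.
print(cudnn_major_matches("0.4.23+cuda12.cudnn89", "9.1.0.70"))  # prints False
print(cudnn_major_matches("0.4.23+cuda12.cudnn89", "8.9.7.29"))  # prints True
```

Only the major version is compared, since that is the coarsest mismatch the version strings can reveal.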

YoshitakaMo commented 1 month ago

According to the last comment from fatpmeireles,

$ /opt/colabfold-20240509/localcolabfold/colabfold-conda/bin/python3.10 -m pip install --upgrade "jax[cuda12]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
$ /opt/colabfold-20240509/localcolabfold/colabfold-conda/bin/python3.10 -m pip install "colabfold[alphafold] @ git+https://github.com/sokrypton/ColabFold"

may solve the issue. If not, please let me know.

I also updated the installer script, install_colabbatch_linux.sh, just now. The new version may also help.

ZhaoKe-BIT commented 1 month ago

I ran into the same problem. As described on the JAX website, I solved it by unsetting LD_LIBRARY_PATH before running colabfold.
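A minimal sketch of that workaround from Python, assuming the goal is to launch the prediction in a child process whose environment has no LD_LIBRARY_PATH. The helper names are hypothetical; `colabfold_batch` is the ColabFold command-line entry point, and the input and output paths in the commented usage line are placeholders.

```python
import os
import subprocess


def env_without_ld_library_path(env=None):
    """Return a copy of the environment with LD_LIBRARY_PATH removed.

    The current process environment is used when `env` is None.
    """
    clean = dict(os.environ if env is None else env)
    clean.pop("LD_LIBRARY_PATH", None)
    return clean


def run_without_ld_library_path(command):
    """Run `command` (a list of arguments) with LD_LIBRARY_PATH unset."""
    return subprocess.run(command, env=env_without_ld_library_path(), check=False)


# Hypothetical usage; replace the placeholder paths with your own:
# run_without_ld_library_path(["colabfold_batch", "input.fasta", "output_dir"])
```

Only the child process is affected, so the system CUDA toolkit stays available to other programs in the same shell.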

laolanllx commented 1 month ago

> According to the last comment from fatpmeireles,
>
> $ /opt/colabfold-20240509/localcolabfold/colabfold-conda/bin/python3.10 -m pip install --upgrade "jax[cuda12]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
> $ /opt/colabfold-20240509/localcolabfold/colabfold-conda/bin/python3.10 -m pip install "colabfold[alphafold] @ git+https://github.com/sokrypton/ColabFold"
>
> may solve the issue. If not, please let me know.
>
> I also updated the installer script, install_colabbatch_linux.sh, just now. The new version may also help.

After I ran these two pip commands and ran colabfold, the server crashed. Here are the updated packages:

(base) xtal@GPU1:/opt/colabfold-20240509/localcolabfold/colabfold-conda/bin$  python3.10 -m pip list | grep nvidia-cudnn
nvidia-cudnn-cu12            8.9.7.29
(base) xtal@GPU1:/opt/colabfold-20240509/localcolabfold/colabfold-conda/bin$ python3.10 -m pip list | grep jax
jax                          0.4.28
jax-cuda12-pjrt              0.4.28
jax-cuda12-plugin            0.4.28
jaxlib                       0.4.28

YoshitakaMo commented 1 month ago

> After I ran these two pip commands and ran colabfold, the server crashed.

Please post the error log here, together with the output of echo $LD_LIBRARY_PATH.

laolanllx commented 1 month ago

> > After I ran these two pip commands and ran colabfold, the server crashed.
>
> Please post the error log here, together with the output of echo $LD_LIBRARY_PATH.

(base) xtal@GPU1:/opt/colabfold-20240509/localcolabfold/colabfold-conda/bin$ echo $LD_LIBRARY_PATH
/usr/local/cuda-11.8/lib64:/usr/local/cuda-11.8/lib64
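
The value above points at a CUDA 11.8 toolkit directory, whereas the pip listing shows CUDA 12 wheels. A small sketch, with a hypothetical helper name, that lists the toolkit versions named in an LD_LIBRARY_PATH string; it assumes directories follow the `cuda-X.Y` naming seen above and inspects only the path text, not the libraries themselves.

```python
import re


def cuda_versions_on_path(ld_library_path):
    """Return the distinct (major, minor) CUDA versions named in the path, in order."""
    found = []
    for entry in ld_library_path.split(":"):
        match = re.search(r"cuda-(\d+)\.(\d+)", entry)
        if match is None:
            continue
        version = (int(match.group(1)), int(match.group(2)))
        if version not in found:
            found.append(version)
    return found


# Value copied from the echo output above.
print(cuda_versions_on_path("/usr/local/cuda-11.8/lib64:/usr/local/cuda-11.8/lib64"))
# prints [(11, 8)]
```

A result whose major version differs from the one in the installed jax-cuda12 packages suggests trying the unset LD_LIBRARY_PATH workaround mentioned earlier in the thread.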

YoshitakaMo commented 1 month ago

> the server crashed.

I'd like to know what happened here so that I can analyze the cause. Did the same error occur?

Alternatively, have you tried updating CUDA and the NVIDIA driver to the latest versions (CUDA 12.4 / nvidia-driver 550.54.14) and then reinstalling localcolabfold?

YoshitakaMo commented 1 month ago

> It looks like my CUDA version is 11.8, but the cu12 packages were installed. Should I change these to the *cu11 ones?

Ah, this may be the right answer. If you are using CUDA 11.8, please use jax[cuda11] (before updating CUDA and the NVIDIA driver):

$ /opt/colabfold-20240509/localcolabfold/colabfold-conda/bin/python3.10 -m pip install --upgrade "jax[cuda11]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
$ /opt/colabfold-20240509/localcolabfold/colabfold-conda/bin/python3.10 -m pip install "colabfold[alphafold] @ git+https://github.com/sokrypton/ColabFold"

laolanllx commented 1 month ago

> > It looks like my CUDA version is 11.8, but the cu12 packages were installed. Should I change these to the *cu11 ones?
>
> Ah, this may be the right answer. If you are using CUDA 11.8, please use jax[cuda11] (before updating CUDA and the NVIDIA driver):
>
> $ /opt/colabfold-20240509/localcolabfold/colabfold-conda/bin/python3.10 -m pip install --upgrade "jax[cuda11]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
> $ /opt/colabfold-20240509/localcolabfold/colabfold-conda/bin/python3.10 -m pip install "colabfold[alphafold] @ git+https://github.com/sokrypton/ColabFold"

pip reports "jax 0.4.28 does not provide the extra 'cuda11'". Is there a command I can use to downgrade jax to 0.4.23?

(base) xtal@GPU1:/opt/colabfold-20240509/localcolabfold/colabfold-conda/bin$ python3.10 -m pip install --upgrade "jax[cuda11]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
Looking in links: https://storage.googleapis.com/jax-releases/jax_releases.html
Requirement already satisfied: jax[cuda11] in /opt/colabfold-20240509/localcolabfold/colabfold-conda/lib/python3.10/site-packages (0.4.28)
WARNING: jax 0.4.28 does not provide the extra 'cuda11'
Requirement already satisfied: ml-dtypes>=0.2.0 in /opt/colabfold-20240509/localcolabfold/colabfold-conda/lib/python3.10/site-packages (from jax[cuda11]) (0.3.2)
Requirement already satisfied: numpy>=1.22 in /opt/colabfold-20240509/localcolabfold/colabfold-conda/lib/python3.10/site-packages (from jax[cuda11]) (1.26.4)
Requirement already satisfied: opt-einsum in /opt/colabfold-20240509/localcolabfold/colabfold-conda/lib/python3.10/site-packages (from jax[cuda11]) (3.3.0)
Requirement already satisfied: scipy>=1.9 in /opt/colabfold-20240509/localcolabfold/colabfold-conda/lib/python3.10/site-packages (from jax[cuda11]) (1.13.0)

YoshitakaMo commented 1 month ago

> pip reports "jax 0.4.28 does not provide the extra 'cuda11'". Is there a command I can use to downgrade jax to 0.4.23?

That's bad news... According to that URL, the pip installation of JAX supports CUDA 11 only up to version 0.4.25, so "jax[cuda11]==0.4.25" should be possible.

I need to update README.md to require CUDA 12.1 or later (CUDA 12.4 is recommended).
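
The choice discussed above can be sketched as a small helper that builds, but does not run, the pip command for a given CUDA major version. The function name is hypothetical; the requirement strings and the `-f` URL are taken from the commands in this thread, and whether jax 0.4.25 works with the current ColabFold release is not confirmed here.

```python
import shlex

# Wheel index URL used by the pip commands earlier in this thread.
JAX_RELEASES_URL = "https://storage.googleapis.com/jax-releases/jax_releases.html"


def jax_install_command(python, cuda_major):
    """Build the pip argument list for installing JAX against a CUDA major version.

    CUDA 11 is pinned to 0.4.25, the last release reported in this thread
    to provide the 'cuda11' extra; CUDA 12 and later use the 'cuda12' extra.
    """
    if cuda_major == 11:
        requirement = "jax[cuda11]==0.4.25"
    elif cuda_major >= 12:
        requirement = "jax[cuda12]"
    else:
        raise ValueError(f"no JAX extra known for CUDA {cuda_major}")
    return [python, "-m", "pip", "install", "--upgrade", requirement,
            "-f", JAX_RELEASES_URL]


# Print a shell-quoted command line for the interpreter path used in this thread.
python_path = "/opt/colabfold-20240509/localcolabfold/colabfold-conda/bin/python3.10"
print(shlex.join(jax_install_command(python_path, 11)))
```

Printing the command rather than executing it leaves the decision to change the environment with the user, who can review the line before pasting it into a shell.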