google-deepmind / alphafold

Open source code for AlphaFold.
Apache License 2.0

InvalidSpec: The package "nvidia/linux-64::cuda-compiler==12.5.0=0" is not available for the specified platform #945

Open ocstx opened 4 months ago

ocstx commented 4 months ago

I just encountered the error in the title:

InvalidSpec: The package "nvidia/linux-64::cuda-compiler==12.5.0=0" is not available for the specified platform

I have installed AlphaFold on several computers, all running the same AlmaLinux installed in the same way, and never had a problem. Today the build ended with that line. In the middle of the screen output I can see this:

ERROR [ 5/12] RUN conda install -qy conda==24.1.2 pip python=3.11 && conda install -y -c nvidia cuda=12.2.2 && conda install -y -c conda-forge openmm=8.0.0 pdbfixer && conda clean --all --force-pkgs-dirs --yes 269.9s

This is what I executed:

git clone https://github.com/deepmind/alphafold.git
cd alphafold/
docker build -f docker/Dockerfile -t alphafold .

I believe this is the first time I have run the docker build before downloading the databases. Could that be related?

rosswalker commented 3 months ago

I see the same problem on multiple machines running Ubuntu 22.04 LTS. This worked fine a week ago, so something has changed: the build now looks for CUDA 12.5.0 even though the Dockerfile has:

ARG CUDA=12.2.2
FROM nvidia/cuda:${CUDA}-cudnn8-runtime-ubuntu20.04
ARG CUDA

Also tried with

ARG CUDA=12.2.2
FROM nvidia/cuda:${CUDA}-cudnn8-devel-ubuntu22.04
ARG CUDA

Same issue.

rosswalker commented 3 months ago

Looks like the problem is related to the openmm step of the install. Splitting the Dockerfile as:

ENV PATH="/opt/conda/bin:$PATH"
ENV LD_LIBRARY_PATH="/opt/conda/lib:$LD_LIBRARY_PATH"
RUN conda install -qy conda==24.1.2 pip python=3.11
RUN conda install -y -c nvidia cuda=${CUDA_VERSION}
RUN conda install -y -c conda-forge openmm=8.0.0 
RUN conda install -y -c conda-forge pdbfixer
RUN conda clean --all --force-pkgs-dirs --yes

Dies at the openmm step.

Explicitly telling it to install openmm=8.0.0 with cudatoolkit 12.2.2 (${CUDA_VERSION}), per http://docs.openmm.org/latest/userguide/application/01_getting_started.html, doesn't work either: the solver seems to ignore the specific cudatoolkit version request.

ENV LD_LIBRARY_PATH="/opt/conda/lib:$LD_LIBRARY_PATH"
RUN conda install -qy conda==24.1.2 pip python=3.11
RUN conda install -y -c nvidia cuda=${CUDA_VERSION}
RUN conda install -y -c conda-forge openmm=8.0.0 cudatoolkit=12.2.0
RUN conda install -y -c conda-forge pdbfixer
RUN conda clean --all --force-pkgs-dirs --yes

CACHED [ 6/16] RUN conda install -y -c nvidia cuda=12.2.2                                                                                                    0.0s
ERROR [ 7/16] RUN conda install -y -c conda-forge openmm=8.0.0 cudatoolkit=12.2.0                                                                            8.5s
------                                                                                                                                                                
[ 7/16] RUN conda install -y -c conda-forge openmm=8.0.0 cudatoolkit=12.2.0:                                                                                       
0.306 /bin/bash: /opt/conda/lib/libtinfo.so.6: no version information available (required by /bin/bash)                                                               
0.907 Channels:                                                                                                                                                       
0.907  - conda-forge                                                                                                                                                  
0.907  - defaults                                                                                                                                                     
0.907  - nvidia
0.907 Platform: linux-64
0.907 Collecting package metadata (repodata.json): ...working... done
8.266 Solving environment: ...working... failed
8.322 
8.322 InvalidSpec: The package "nvidia/linux-64::cuda-compiler==12.5.0=0" is not available for the specified platform
8.322 
------
Dockerfile:59
--------------------
  57 |     RUN conda install -qy conda==24.1.2 pip python=3.11
  58 |     RUN conda install -y -c nvidia cuda=${CUDA_VERSION}
  59 | >>> RUN conda install -y -c conda-forge openmm=8.0.0 cudatoolkit=12.2.0
  60 |     RUN conda install -y -c conda-forge pdbfixer
  61 |     RUN conda clean --all --force-pkgs-dirs --yes
--------------------
ERROR: failed to solve: process "/bin/bash -o pipefail -c conda install -y -c conda-forge openmm=8.0.0 cudatoolkit=12.2.0" did not complete successfully: exit code: 1

Seems like something is broken in anaconda itself? Or conda-forge?
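When comparing failed builds across machines, it can help to pull the rejected package spec out of the build log and set it next to the version the Dockerfile asked for. The following is only a sketch: `invalid_specs` is a hypothetical helper name, and the pattern assumes the exact "InvalidSpec: The package ..." wording shown in the log above.

```shell
# Hypothetical helper (not part of AlphaFold or conda): read a build log on
# stdin and print "name version" for every InvalidSpec line it contains.
invalid_specs() {
  # The spec has the shape channel/platform::name==version=build
  sed -n -E 's/.*InvalidSpec: The package "([^"]*)::([^"=]+)==([^"=]+)=([^"]*)".*/\2 \3/p'
}

# Usage, assuming the build output was saved first (requires Docker, not run here):
#   docker build -f docker/Dockerfile -t alphafold . 2>&1 | tee build.log
#   invalid_specs < build.log
```

For the log above this reports cuda-compiler at 12.5.0, which makes the mismatch with the requested cuda=12.2.2 easy to spot.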

rosswalker commented 3 months ago

Manually pinning the version of every related CUDA package works around the NVIDIA / conda versioning bug and allows AlphaFold 2 to build properly. Until someone fixes the underlying issue with the conda solver, edit your Dockerfile and change

ENV PATH="/opt/conda/bin:$PATH"
ENV LD_LIBRARY_PATH="/opt/conda/lib:$LD_LIBRARY_PATH"
RUN conda install -qy conda==24.1.2 pip python=3.11 \
    && conda install -y -c nvidia cuda=${CUDA_VERSION} \
    && conda install -y -c conda-forge openmm=8.0.0 pdbfixer \
    && conda clean --all --force-pkgs-dirs --yes

to

# Install conda packages.
ENV PATH="/opt/conda/bin:$PATH"
ENV LD_LIBRARY_PATH="/opt/conda/lib:$LD_LIBRARY_PATH"
RUN conda install -qy conda==24.5.0 pip python=3.11 \
 && conda install -y -c nvidia cuda=12.2.2 cuda-tools=12.2.2 cuda-toolkit=12.2.2 cuda-version=12.2 cuda-command-line-tools=12.2.2 cuda-compiler=12.2.2 cuda-runtime=12.2.2
RUN conda install -y -c conda-forge openmm=8.0.0 pdbfixer \
    && conda clean --all --force-pkgs-dirs --yes

ocstx commented 3 months ago

WORKED! Thank you very much. One question arising from my ignorance: at the beginning of the original file the variable $CUDA is defined, but the original line that installs cuda uses $CUDA_VERSION, which is not defined anywhere in the file. It still works: docker executes "conda install -y -c nvidia cuda=12.2.2". Where does this variable come from? When was it defined? Why use $CUDA_VERSION and not $CUDA?

sude8594 commented 3 months ago

(cuda_env) ubuntu@ip-172-31-71-33:/$ conda install -c nvidia cuda-compiler=12.5.0
Channels:

InvalidSpec: The package "nvidia/linux-64::cuda-compiler==12.5.0=0" is not available for the specified platform

I'm encountering the same issue. I followed the suggestion from Ross Walker's comment and managed to successfully create a Docker image. However, when I attempt to run the prediction, it doesn't produce any output. I've logged the command, and here's the content of the log file:

I0615 05:26:32.780370 140023758477120 run_docker.py:122] Mounting /data/input -> /mnt/fasta_path_0
I0615 05:26:32.780579 140023758477120 run_docker.py:122] Mounting /data/af_download_data/uniref90 -> /mnt/uniref90_database_path
I0615 05:26:32.780725 140023758477120 run_docker.py:122] Mounting /data/af_download_data/mgnify -> /mnt/mgnify_database_path
I0615 05:26:32.780841 140023758477120 run_docker.py:122] Mounting /data/af_download_data -> /mnt/data_dir
I0615 05:26:32.780938 140023758477120 run_docker.py:122] Mounting /data/af_download_data/pdb_mmcif/mmcif_files -> /mnt/template_mmcif_dir
I0615 05:26:32.781038 140023758477120 run_docker.py:122] Mounting /data/af_download_data/pdb_mmcif -> /mnt/obsolete_pdbs_path
I0615 05:26:32.781960 140023758477120 run_docker.py:122] Mounting /data/af_download_data/pdb70 -> /mnt/pdb70_database_path
I0615 05:26:32.782088 140023758477120 run_docker.py:122] Mounting /data/af_download_data/uniref30 -> /mnt/uniref30_database_path
I0615 05:26:32.782209 140023758477120 run_docker.py:122] Mounting /data/af_download_data/bfd -> /mnt/bfd_database_path
I0615 05:26:33.179164 140023758477120 run_docker.py:264] /bin/bash: /opt/conda/lib/libtinfo.so.6: no version information available (required by /bin/bash)
I0615 05:26:36.904992 140023758477120 run_docker.py:264] I0615 05:26:36.904258 140598632284800 templates.py:858] Using precomputed obsolete pdbs /mnt/obsolete_pdbs_path/obsolete.dat.
I0615 05:26:36.907782 140023758477120 run_docker.py:264] E0615 05:26:36.907423 140598632284800 jackhmmer.py:75] Could not find Jackhmmer database /mnt/uniref90_database_path/uniref90.fasta
I0615 05:26:36.907939 140023758477120 run_docker.py:264] Traceback (most recent call last):
I0615 05:26:36.908035 140023758477120 run_docker.py:264] File "/app/alphafold/run_alphafold.py", line 570, in <module>
I0615 05:26:36.908140 140023758477120 run_docker.py:264] app.run(main)
I0615 05:26:36.908221 140023758477120 run_docker.py:264] File "/opt/conda/lib/python3.11/site-packages/absl/app.py", line 312, in run
I0615 05:26:36.908318 140023758477120 run_docker.py:264] _run_main(main, args)
I0615 05:26:36.908392 140023758477120 run_docker.py:264] File "/opt/conda/lib/python3.11/site-packages/absl/app.py", line 258, in _run_main
I0615 05:26:36.908481 140023758477120 run_docker.py:264] sys.exit(main(argv))
I0615 05:26:36.908572 140023758477120 run_docker.py:264] ^^^^^^^^^^
I0615 05:26:36.908645 140023758477120 run_docker.py:264] File "/app/alphafold/run_alphafold.py", line 486, in main
I0615 05:26:36.908759 140023758477120 run_docker.py:264] monomer_data_pipeline = pipeline.DataPipeline(
I0615 05:26:36.908886 140023758477120 run_docker.py:264] ^^^^^^^^^^^^^^^^^^^^^^
I0615 05:26:36.909015 140023758477120 run_docker.py:264] File "/app/alphafold/alphafold/data/pipeline.py", line 130, in __init__
I0615 05:26:36.909125 140023758477120 run_docker.py:264] self.jackhmmer_uniref90_runner = jackhmmer.Jackhmmer(
I0615 05:26:36.909199 140023758477120 run_docker.py:264] ^^^^^^^^^^^^^^^^^^^^
I0615 05:26:36.909268 140023758477120 run_docker.py:264] File "/app/alphafold/alphafold/data/tools/jackhmmer.py", line 76, in __init__
I0615 05:26:36.909338 140023758477120 run_docker.py:264] raise ValueError(f'Could not find Jackhmmer database {database_path}')
I0615 05:26:36.909404 140023758477120 run_docker.py:264] ValueError: Could not find Jackhmmer database /mnt/uniref90_database_path/uniref90.fasta

Can anyone guide me on how to fix this issue?

rosswalker commented 3 months ago

I0615 05:26:36.909404 140023758477120 run_docker.py:264] ValueError: Could not find Jackhmmer database /mnt/uniref90_database_path/uniref90.fasta

Can anyone guide me on how to fix this issue?

This looks like you are missing the necessary data files. Did you run the scripts/download_all_data.sh script and did it complete successfully?

sude8594 commented 3 months ago

Yes, all the databases are downloaded; I can see the files in the data directory. But when I run the prediction command, I still get the error that the Jackhmmer database could not be found.
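According to the log above, the host directory /data/af_download_data/uniref90 is mounted at /mnt/uniref90_database_path and AlphaFold then looks for uniref90.fasta inside it, so one quick check is whether that exact file exists and is non-empty on the host. A minimal sketch follows; `missing_db_files` is a hypothetical helper, and only the uniref90/uniref90.fasta path is taken from the log. Any other relative paths you pass are your own assumptions about the expected layout.

```shell
# Hypothetical helper: print every relative path (arguments 2..n) that is
# missing or empty under the data directory given as argument 1.
missing_db_files() {
  db_root="$1"
  shift
  for rel in "$@"; do
    # -s is true only for an existing file with size greater than zero
    [ -s "$db_root/$rel" ] || printf '%s\n' "$rel"
  done
}

# Usage with the paths from the log above (prints nothing if the file is there):
#   missing_db_files /data/af_download_data uniref90/uniref90.fasta
```

If the helper prints the path, the file is absent, empty (for example after an interrupted download), or sitting one directory level away from where the mount expects it.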

ocstx commented 3 months ago

Hi again. I installed AlphaFold today exactly as I previously described and hit the same error. Using the modified Dockerfile provided by @rosswalker worked (again). Is this issue not going to be fixed in the main branch? Or is there something on my side that I fail to see?

0xpsi commented 2 months ago

I've been trying to solve this for the last 4 hours, and rosswalker's suggestion worked.

I can't say I know exactly what the issue is, but I noticed that openmm has a CUDA dependency of version <12. Not sure if that is related. I did try reducing the CUDA arg to 11.6.1, and that did not work.

0xpsi commented 2 months ago

WORKED! Thank you very much. One question arising from my ignorance: at the beginning of the original file the variable $CUDA is defined, but the original line that installs cuda uses $CUDA_VERSION, which is not defined anywhere in the file. It still works: docker executes "conda install -y -c nvidia cuda=12.2.2". Where does this variable come from? When was it defined? Why use $CUDA_VERSION and not $CUDA?

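${CUDA_VERSION} references an environment variable that already exists in the image when that RUN line executes. Since the AlphaFold Dockerfile never sets it, it was most likely defined by the nvidia/cuda base image named in the FROM line.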
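One way to check which environment variables an image carries is to list its configured environment and filter for the name in question. This is a sketch under assumptions: `env_value` is a hypothetical helper, and the expectation that the nvidia/cuda image itself defines CUDA_VERSION is something to verify with the command in the comment, not a confirmed fact from this thread.

```shell
# Hypothetical helper: read NAME=value lines on stdin and print the value
# of the variable named in argument 1 (nothing if it is not present).
env_value() {
  sed -n "s/^$1=//p"
}

# Usage (requires Docker and the image pulled locally, not run here):
#   docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' \
#     nvidia/cuda:12.2.2-cudnn8-runtime-ubuntu20.04 | env_value CUDA_VERSION
```

If the command prints a version, the variable comes from the base image; $CUDA, by contrast, is a build argument that only exists while the Dockerfile is being processed.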

dohyeonscottkim commented 1 month ago

So manually specifying the version of all the various related packages works around the NVIDIA / conda versioning bug and actually allows one to build alphafold2 properly. So until someone actually fixes the issue with the conda solver just edit your dockerfile and change

ENV PATH="/opt/conda/bin:$PATH"
ENV LD_LIBRARY_PATH="/opt/conda/lib:$LD_LIBRARY_PATH"
RUN conda install -qy conda==24.1.2 pip python=3.11 \
    && conda install -y -c nvidia cuda=${CUDA_VERSION} \
    && conda install -y -c conda-forge openmm=8.0.0 pdbfixer \
    && conda clean --all --force-pkgs-dirs --yes

to

# Install conda packages.
ENV PATH="/opt/conda/bin:$PATH"
ENV LD_LIBRARY_PATH="/opt/conda/lib:$LD_LIBRARY_PATH"
RUN conda install -qy conda==24.5.0 pip python=3.11 \
 && conda install -y -c nvidia cuda=12.2.2 cuda-tools=12.2.2 cuda-toolkit=12.2.2 cuda-version=12.2 cuda-command-line-tools=12.2.2 cuda-compiler=12.2.2 cuda-runtime=12.2.2
RUN conda install -y -c conda-forge openmm=8.0.0 pdbfixer \
    && conda clean --all --force-pkgs-dirs --yes

This worked like magic! Thank you!
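The pinned package list in the workaround quoted above follows a simple pattern: every package takes the full CUDA version, except cuda-version, which takes only major.minor. A small sketch that expands one version string into that list is shown below. `cuda_pins` is a hypothetical helper, and whether the same set of pins resolves for CUDA versions other than 12.2.2 is an untested assumption.

```shell
# Hypothetical helper: expand a full CUDA version (argument 1, e.g. 12.2.2)
# into the explicit conda pins used in the workaround.
cuda_pins() {
  full="$1"
  major_minor="${full%.*}"   # strip the last ".N" component: 12.2.2 -> 12.2
  printf 'cuda=%s cuda-tools=%s cuda-toolkit=%s cuda-version=%s cuda-command-line-tools=%s cuda-compiler=%s cuda-runtime=%s\n' \
    "$full" "$full" "$full" "$major_minor" "$full" "$full" "$full"
}

# Usage, printing the argument list for "conda install -y -c nvidia ...":
#   cuda_pins 12.2.2
```

Generating the list from one value keeps the pins consistent if the base image version ever changes, at the cost of one more thing to maintain next to the Dockerfile.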

jung-geun commented 1 month ago

You can resolve the issue by updating the code as follows:

RUN conda install -qy conda==24.5.0 pip python=3.11 \
  && conda install -y -c nvidia/label/cuda-${CUDA} cuda \
  && conda install -y -c conda-forge openmm=8.0.0 pdbfixer \
  && conda clean --all --force-pkgs-dirs --yes

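It seems that there is an issue with version tracking in the conda repository. The problem was resolved by explicitly specifying the version through the version-specific channel label (nvidia/label/cuda-${CUDA}) instead of the default nvidia channel.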