aws-samples / alphafold-protein-structure-prediction-with-frontend-app


increase the root volume of the compute nodes in ParallelCluster #34

Open tskg-yoshiaki-nakamura opened 1 month ago

tskg-yoshiaki-nakamura commented 1 month ago

When running the AlphaFold Dockerfile, the job fails because the root volume on the compute node runs out of space. To avoid this, specify `ComputeSettings` with a larger root volume in the compute queue section of `config_template.yml`, as shown below. I have not confirmed that 50 GiB is the minimum size required, but it allowed the Dockerfile to run successfully in my environment.

```yaml
    - Name: queue-gpu
      # add the following 4 lines
      ComputeSettings:
        LocalStorage:
          RootVolume:
            Size: 50  # root volume size in GiB
```
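If you maintain several configs, the same change can be applied programmatically. The following is a minimal sketch, assuming the template parses as plain YAML and has already been loaded into a Python dict (for example with PyYAML's `yaml.safe_load`), and that queues live under `Scheduling` → `SlurmQueues` as in the ParallelCluster 3 config layout. The helper name `set_root_volume_size` is hypothetical and not part of this repository.

```python
from typing import Any


def set_root_volume_size(config: dict[str, Any], queue_name: str, size_gib: int) -> bool:
    """Hypothetical helper: set ComputeSettings.LocalStorage.RootVolume.Size
    (in GiB) for the Slurm queue named `queue_name`.

    Assumes the ParallelCluster 3 layout: Scheduling -> SlurmQueues -> [queue, ...].
    Returns True if the queue was found and updated, False otherwise.
    """
    queues = config.get("Scheduling", {}).get("SlurmQueues") or []
    for queue in queues:
        if queue.get("Name") == queue_name:
            # Create any missing intermediate mappings; existing keys are kept.
            root_volume = (
                queue.setdefault("ComputeSettings", {})
                .setdefault("LocalStorage", {})
                .setdefault("RootVolume", {})
            )
            root_volume["Size"] = size_gib
            return True
    return False


if __name__ == "__main__":
    # Toy config containing only the queue from the snippet above.
    config = {
        "Scheduling": {
            "Scheduler": "slurm",
            "SlurmQueues": [{"Name": "queue-gpu"}],
        }
    }
    updated = set_root_volume_size(config, "queue-gpu", 50)
    print(updated, config["Scheduling"]["SlurmQueues"][0]["ComputeSettings"])
```

Using `setdefault` at each level means an existing `LocalStorage` block (for example one that already defines other storage settings) is extended rather than overwritten. The modified dict would then be written back with a YAML dumper of your choice.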
tskg-yoshiaki-nakamura commented 1 month ago

When building the Dockerfile, the following step fails:

```text
#9 [ 5/12] RUN conda install -qy conda==24.1.2 pip python=3.11     && conda install -y -c nvidia cuda=12.2.2     && conda install -y -c conda-forge openmm=8.0.0 pdbfixer     && conda clean --all --force-pkgs-dirs --yes

(some log output omitted)

#9 251.8 InvalidArchiveError('Error with archive /opt/conda/pkgs/cudatoolkit-11.8.0-h4ba93d1_13.conda.  You probably need to delete and re-download or re-create this file.  Message was:\n\nfailed with error: [Errno 28] No space left on device')
```

To investigate, I split the combined `RUN` instruction into separate steps and added `df -h` checks after the CUDA install:

```dockerfile
# RUN conda install -qy conda==24.1.2 pip python=3.11 \
#     && conda install -y -c nvidia cuda=${CUDA_VERSION} \
#     && conda install -y -c conda-forge openmm=8.0.0 pdbfixer \
#     && conda clean --all --force-pkgs-dirs --yes
RUN conda install -qy conda==24.1.2 pip python=3.11
RUN conda install -y -c nvidia cuda=${CUDA_VERSION}
RUN df -h
RUN df -h /opt/conda/pkgs
RUN conda install -y -c conda-forge openmm=8.0.0
RUN conda install -y -c conda-forge pdbfixer
RUN conda clean --all --force-pkgs-dirs --yes
```
This produced the following output:

```text
#11 [ 7/19] RUN df -h
#11 0.238 /bin/bash: /opt/conda/lib/libtinfo.so.6: no version information available (required by /bin/bash)
#11 0.240 Filesystem      Size  Used Avail Use% Mounted on
#11 0.240 overlay          39G   37G  1.6G  96% /
#11 0.240 tmpfs            64M     0   64M   0% /dev
#11 0.240 shm              64M     0   64M   0% /dev/shm
#11 0.240 /dev/root        39G   37G  1.6G  96% /etc/resolv.conf
#11 0.240 tmpfs            16G     0   16G   0% /sys/fs/cgroup
#11 0.240 tmpfs            16G     0   16G   0% /proc/acpi
#11 0.240 tmpfs            16G     0   16G   0% /sys/firmware
#11 0.240 tmpfs            16G     0   16G   0% /proc/scsi
#11 DONE 0.3s

#12 [ 8/19] RUN df -h /opt/conda/pkgs
#12 0.211 /bin/bash: /opt/conda/lib/libtinfo.so.6: no version information available (required by /bin/bash)
#12 0.213 Filesystem      Size  Used Avail Use% Mounted on
#12 0.213 overlay          39G   37G  1.6G  96% /
#12 DONE 0.3s
```

After the CUDA install, the 39G root filesystem that backs the build (including `/opt/conda/pkgs`) is already 96% full, leaving only 1.6G for the remaining conda installs.
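To check free space without reading the log by eye, the `df -h` lines can be parsed directly. The following is a minimal sketch, assuming GNU `df -h` output (six columns, human-readable sizes in powers of 1024) and mount points without spaces. It looks at the last six fields of each line, so the `#12 0.213` step/timestamp prefix in the build log above is tolerated. The function names `parse_size` and `available_bytes` are hypothetical.

```python
import re

# Multipliers for GNU `df -h` size suffixes (powers of 1024).
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}


def parse_size(text: str) -> float:
    """Convert a `df -h` size such as '1.6G', '64M' or '0' to bytes."""
    match = re.fullmatch(r"([\d.]+)([KMGTP]?)", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return float(number) * _UNITS[unit]


def available_bytes(df_output: str, mount_point: str) -> float:
    """Return the Avail column (in bytes) for `mount_point` in `df -h` output.

    Only the last six whitespace-separated fields of each line are used
    (Filesystem Size Used Avail Use% Mounted-on), so any log prefix is ignored.
    """
    for line in df_output.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # e.g. "#12 DONE 0.3s"
        _fs, _size, _used, avail, _pct, mounted_on = fields[-6:]
        if mounted_on == mount_point:
            return parse_size(avail)
    raise KeyError(f"mount point {mount_point!r} not found")


if __name__ == "__main__":
    # Two lines taken verbatim from the `RUN df -h /opt/conda/pkgs` step above.
    sample = (
        "#12 0.213 Filesystem      Size  Used Avail Use% Mounted on\n"
        "#12 0.213 overlay          39G   37G  1.6G  96% /\n"
    )
    avail_gib = available_bytes(sample, "/") / 1024**3
    print(f"{avail_gib:.1f} GiB free on /")
```

Because `df -h` rounds its figures, the result is only approximate; for an exact value, `df` without `-h` (1K blocks) is a better input.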