devcontainers / features

A collection of Dev Container Features managed by Dev Container spec maintainers. See https://github.com/devcontainers/feature-starter to publish your own.
https://containers.dev/features
MIT License

Cuda 11.7 fails to install #484

Open MTRNord opened 1 year ago

MTRNord commented 1 year ago

Hi, I am running this devcontainer.json:

{
    "customizations": {
        "vscode": {
            "extensions": [
                "ms-python.python",
                "ms-toolsai.jupyter",
                "ms-python.isort"
            ]
        }
    },
    "features": {
        "ghcr.io/devcontainers/features/nvidia-cuda:1": {
            "installCudnn": true,
            "cudaVersion": "11.7"
        }
    }
}

With this configuration the build fails and the container starts in recovery mode.

The error log is:

2023-03-10 14:17:55.692Z:  > [dev_containers_target_stage 4/4] RUN cd /tmp/build-features/nvidia-cuda_1 && chmod +x ./devcontainer-features-install.sh && ./devcontainer-features-install.sh:
#16 68.98 Setting up libcurand-11-7 (10.2.10.91-1) ...
#16 69.08 Setting up libcufile-11-7 (1.3.1.18-1) ...
#16 69.18 Setting alternatives
#16 69.19 update-alternatives: using /usr/local/cuda-11.7/gds/cufile.json to provide /etc/cufile.json (cufile.json) in auto mode
#16 69.28 Setting up cuda-libraries-11-7 (11.7.1-1) ...
#16 69.38 Processing triggers for libc-bin (2.31-0ubuntu9.9) ...

#16 76.39 E: Version '8.6.0.163-1+cuda11.7' for 'libcudnn8' was not found
#16 76.39 E: No packages found
#16 76.40 The requested version of cuDNN is not available: cuDNN 8.6.0.163 for CUDA 11.7
#16 76.40 ERROR: Feature "NVIDIA CUDA" (ghcr.io/devcontainers/features/nvidia-cuda) failed to install! Look at the documentation at https://github.com/devcontainers/features/tree/main/src/nvidia-cuda for help troubleshooting this error.

The feature seems to default to a cuDNN version that is not published for CUDA 11.7. I would expect it to pick one that is compatible with the requested CUDA version.

ucovcoder commented 1 year ago

I looked through the packages available via apt and found libcudnn8_8.5.0.96-1+cuda11.7_amd64.deb, which means cuDNN 8.5.0.96 is available for CUDA 11.7. You can pin it explicitly:

"ghcr.io/devcontainers/features/nvidia-cuda:1": {
  "installCudnn": true,
  "cudaVersion": "11.7",
  "cudnnVersion": "8.5.0.96"
}

That said, I totally agree that the feature should be expected to choose a compatible version on its own. There also seem to be much more recent cuDNN versions for CUDA 11.7: https://developer.nvidia.com/rdp/cudnn-archive
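
To make "choose wisely" concrete, here is a minimal Python sketch of one way a compatible version could be picked from the apt listing. It is not the feature's actual install logic. It assumes the listing comes from `apt-cache madison libcudnn8` and that version strings follow the `<cudnn>-<rev>+cuda<X.Y>` pattern seen in the log above. The function names are hypothetical. Of the sample data, only `8.5.0.96-1+cuda11.7` comes from this thread; the other entries and the repository URL are illustrative placeholders.

```python
import re

# Sample text in the shape printed by `apt-cache madison libcudnn8`.
# Only the 8.5.0.96-1+cuda11.7 entry comes from this thread; the other
# entries and the repository URL are illustrative placeholders.
MADISON_SAMPLE = """\
 libcudnn8 | 8.6.0.163-1+cuda11.8 | https://example.invalid/repo Packages
 libcudnn8 | 8.5.0.96-1+cuda11.7 | https://example.invalid/repo Packages
 libcudnn8 | 8.4.1.50-1+cuda11.6 | https://example.invalid/repo Packages
"""

# Assumed version layout: <cudnn version>-<revision>+cuda<major.minor>
VERSION_RE = re.compile(r"(\d+(?:\.\d+)*)-\d+\+cuda(\d+\.\d+)")


def versions_from_madison(text):
    """Extract the version column from `apt-cache madison` style output."""
    versions = []
    for line in text.splitlines():
        columns = [c.strip() for c in line.split("|")]
        if len(columns) >= 3:
            versions.append(columns[1])
    return versions


def pick_cudnn_version(available, cuda_version):
    """Return the newest cuDNN version built for cuda_version, or None."""
    candidates = []
    for version in available:
        match = VERSION_RE.fullmatch(version)
        if match and match.group(2) == cuda_version:
            candidates.append(match.group(1))
    if not candidates:
        return None
    # Compare numerically, component by component, not as strings.
    return max(candidates, key=lambda v: tuple(int(p) for p in v.split(".")))


if __name__ == "__main__":
    available = versions_from_madison(MADISON_SAMPLE)
    print(pick_cudnn_version(available, "11.7"))
```

The value returned for "11.7" has the same form as the `cudnnVersion` option used in the workaround above. When nothing matches the requested CUDA version, the sketch returns `None` and leaves it to the caller to report the problem rather than falling back to an incompatible default.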