NVIDIA / nvidia-container-toolkit

Build and run containers leveraging NVIDIA GPUs
Apache License 2.0
2.37k stars 255 forks

The corresponding relationship between NVIDIA drivers and Container Toolkit versions #195

Open MuShouler opened 10 months ago

MuShouler commented 10 months ago

I have a machine with an RTX A4000 and NVIDIA driver 535.129.03. The host operating system is CentOS 7.9, with Docker 20.10.4 and container-toolkit 1.11.0 installed. When I start a container with --gpus=all, nvidia-smi produces output inside the container, but OpenCL does not work properly: clinfo reports an error for the "Preferred work group size multiple" item. After I upgraded container-toolkit to 1.13.5 and started the container the same way, OpenCL worked normally in the container and clinfo showed no errors.

I searched the container-toolkit release notes, and the only entry related to the NVIDIA driver version is: "Added support for detecting and injecting multiple GSP firmware files as required by the 525.x versions of the NVIDIA GPU drivers." However, I confirmed that GSP is not enabled for the GPU on my host.

Does the NVIDIA Container Toolkit have specific requirements for the NVIDIA driver version, or is NVIDIA Container Toolkit 1.11.0 no longer suitable for drivers 525.x and later?

elezar commented 9 months ago

If I recall correctly, the NVVM Compiler Library is required with newer driver versions. This was added in v1.12.0 (see https://github.com/NVIDIA/nvidia-container-toolkit/releases/tag/v1.12.0).

For these kinds of driver changes, a newer NVIDIA Container Toolkit version is recommended.
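As a rough illustration of the advice above, the following sketch checks whether an installed toolkit version is at least v1.12.0. It assumes, based only on the recollection in this comment, that v1.12.0 is the relevant threshold. The function names `parse_version` and `toolkit_is_new_enough` are hypothetical helpers, and the version string is expected to come from wherever you normally read it (for example, your package manager).

```python
import re

# Assumption: v1.12.0 is the first release that injects the NVVM compiler
# library, as recalled in the comment above. Adjust if that proves wrong.
MIN_TOOLKIT_VERSION = (1, 12, 0)


def parse_version(text):
    """Extract the first MAJOR.MINOR.PATCH triple from a version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def toolkit_is_new_enough(version_text, minimum=MIN_TOOLKIT_VERSION):
    """Return True if the toolkit version is at or above the minimum."""
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    # The two versions from this issue, plus the assumed threshold itself.
    for installed in ("1.11.0", "v1.12.0", "1.13.5"):
        print(installed, toolkit_is_new_enough(installed))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "1.9.0" as newer than "1.12.0".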