When the driver and library versions on the host are lower than the library version included in the image, the ldconfig run by libnvidia-container during container creation relinks the library symlinks inside the container to the newer libraries from the image. The corresponding libraries then become unusable with the host driver; for example, executing nvidia-smi fails with the error: Failed to initialize NVML: Driver/library version mismatch.
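The mechanism can be simulated without a GPU. In this sketch the version numbers are illustrative, and the ln call manually mimics ldconfig's behavior of pointing the SONAME symlink at the highest version present:

```shell
# Simulate the container's library directory with two library versions.
demo=$(mktemp -d) && cd "$demo"
touch libnvidia-ml.so.525.105.17   # injected from the host driver
touch libnvidia-ml.so.535.54.03    # newer copy shipped in the image

# ldconfig links the SONAME symlink to the highest version it finds;
# the host's 525.105.17 driver cannot serve the 535 library:
ln -sf libnvidia-ml.so.535.54.03 libnvidia-ml.so.1
readlink libnvidia-ml.so.1   # -> libnvidia-ml.so.535.54.03
```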
reproduce
Use a host whose driver version is lower than 525.105.17.
$ docker pull nsblink/ubuntu:test_nvc
$ docker run --rm -e NVIDIA_VISIBLE_DEVICES=1 -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -it --entrypoint /bin/bash nsblink/ubuntu:test_nvc
$ nvidia-smi
$ cd /lib/x86_64-linux-gnu; ls -lah | grep libnvidia-ml
patch
Here I provide a patch, !225, that solves this problem by recreating the symlinks for the driver-version-dependent libraries after ldconfig has run.
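The idea behind the fix can be sketched in shell (the paths and version numbers here are hypothetical; the actual patch performs the equivalent inside libnvidia-container after its ldconfig call):

```shell
# Simulate a container dir where ldconfig left the symlink pointing
# at the image's newer library.
demo=$(mktemp -d) && cd "$demo"
touch libnvidia-ml.so.525.105.17   # matches the host driver
touch libnvidia-ml.so.535.54.03    # shipped in the image
ln -sf libnvidia-ml.so.535.54.03 libnvidia-ml.so.1

# Fix: after ldconfig, recreate the symlink so it targets the library
# matching the host driver version (queried from the host by the patch).
host_driver_version=525.105.17
ln -sf "libnvidia-ml.so.${host_driver_version}" libnvidia-ml.so.1
readlink libnvidia-ml.so.1   # -> libnvidia-ml.so.525.105.17
```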
Migrated from https://gitlab.com/nvidia/container-toolkit/libnvidia-container/-/issues/3