NVIDIA / libnvidia-container

NVIDIA container runtime library
Apache License 2.0

Error linking when the library version on the host is lower than that in the image #266

Open elezar opened 4 months ago

elezar commented 4 months ago

When the driver and library versions on the host are lower than the library version included in the image, the ldconfig run that libnvidia-container performs during container creation points the library symlinks in the container at the newer library shipped in the image instead of the one matching the host driver. The affected libraries then become unusable. For example, running nvidia-smi fails with: Failed to initialize NVML: Driver/library version mismatch.

reproduce

Use a host whose driver version is lower than 525.105.17.

$ docker pull nsblink/ubuntu:test_nvc
$ docker run --rm -e NVIDIA_VISIBLE_DEVICES=1 -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -it --entrypoint /bin/bash nsblink/ubuntu:test_nvc
$ nvidia-smi
$ cd /lib/x86_64-linux-gnu; ls -lah | grep libnvidia-ml
root@da0fd684b11a:/lib/x86_64-linux-gnu# ls -lah | grep libnvidia-ml
lrwxrwxrwx  1 root root    26 Jul 29 07:59 libnvidia-ml.so.1 -> libnvidia-ml.so.525.105.17
-rw-r--r--  1 root root  1.8M May 12  2022 libnvidia-ml.so.470.129.06
-rw-r--r--  1 root root  1.8M Jul 25 08:32 libnvidia-ml.so.525.105.17
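The listing above can also be checked mechanically. The following is a minimal sketch, not part of libnvidia-container: `nvml_link_matches` is a hypothetical helper name, and the host driver version has to be supplied by the caller (on many hosts it can be read from /proc/driver/nvidia/version, but that is an assumption about the environment).

```shell
# nvml_link_matches DIR VERSION   (hypothetical helper, for illustration only)
# Returns 0 when DIR/libnvidia-ml.so.1 is a symlink to libnvidia-ml.so.VERSION,
# 1 when it points at a different file, and 2 when the symlink cannot be read.
nvml_link_matches() {
    # readlink prints the stored link target without resolving it further.
    target=$(readlink "$1/libnvidia-ml.so.1") || return 2
    [ "$target" = "libnvidia-ml.so.$2" ]
}
```

Inside the container from the reproduction, and assuming the host driver is 470.129.06 as the listing suggests, `nvml_link_matches /lib/x86_64-linux-gnu 470.129.06` would return non-zero because the symlink targets the 525.105.17 library.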

patch

I have provided a patch, !225, that solves this problem by recreating the symlinks for driver-version-specific libraries after ldconfig has run.
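To illustrate the idea only (this is not the code in !225), here is a rough shell sketch of recreating the symlinks. The helper name `relink_driver_libs` is hypothetical, and it assumes that the versioned libraries sit in the same directory as their symlinks, are named `<name>.so.<driver version>`, and that the symlinks use relative targets.

```shell
# relink_driver_libs DIR VERSION   (hypothetical helper, for illustration only)
# For every symlink in DIR that points at a versioned library such as
# libnvidia-ml.so.525.105.17, repoint it at the sibling built for VERSION
# (e.g. libnvidia-ml.so.470.129.06) when that file exists in DIR.
relink_driver_libs() {
    dir=$1
    version=$2
    for link in "$dir"/*.so.*; do
        # Only symlinks are candidates; regular library files are left alone.
        [ -L "$link" ] || continue
        target=$(readlink "$link")
        # Skip targets that contain a path component; this sketch only
        # handles links to files in the same directory.
        case $target in
            */*) continue ;;
        esac
        # Strip the version suffix: libfoo.so.525.105.17 -> libfoo.so
        base=${target%.so.*}.so
        wanted=$base.$version
        # Nothing to do if the link is already correct or if no library
        # for the requested version exists.
        [ "$target" != "$wanted" ] || continue
        [ -f "$dir/$wanted" ] || continue
        ln -sf "$wanted" "$link"
    done
}
```

With the listing above, `relink_driver_libs /lib/x86_64-linux-gnu 470.129.06` would point libnvidia-ml.so.1 back at libnvidia-ml.so.470.129.06. Symlinks whose targets have no sibling for the requested version are left unchanged.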

Migrated from https://gitlab.com/nvidia/container-toolkit/libnvidia-container/-/issues/3