NVIDIA / libnvidia-container

NVIDIA container runtime library
Apache License 2.0

[LowPriority] References to the old GPU driver are kept (as zero-byte files) after upgrading or downgrading the drivers on the host. #50

Open cyberwillis opened 5 years ago

cyberwillis commented 5 years ago

It's not an error, but it is something I see from time to time. I am using LXD/LXC containers, and when I first launched the container my host had driver version nvidia-418.56. Some time later I had to downgrade the host NVIDIA drivers to an earlier version, nvidia-410.104, and after building some other software inside the container, running ldconfig printed the following messages:

/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libvdpau_nvidia.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libGLX_indirect.so.0 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libcuda.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvcuvid.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.1 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.418.56 is empty, not checked.

So I checked whether any files had been left behind and found the following (references to the older driver, with zero bytes):

lrwxrwxrwx  1 root   root           24 Apr 14 21:41 libEGL_nvidia.so.0 -> libEGL_nvidia.so.410.104
-rw-r--r--  1 nobody nogroup   1031584 Feb  6 04:55 libEGL_nvidia.so.410.104
-rw-r--r--  1 root   root            0 Apr  1 09:17 libEGL_nvidia.so.418.56
lrwxrwxrwx  1 root   root           30 Apr 14 21:41 libGLESv1_CM_nvidia.so.1 -> libGLESv1_CM_nvidia.so.410.104
-rw-r--r--  1 nobody nogroup     60200 Feb  6 04:54 libGLESv1_CM_nvidia.so.410.104
-rw-r--r--  1 root   root            0 Apr  1 09:17 libGLESv1_CM_nvidia.so.418.56
lrwxrwxrwx  1 root   root           27 Apr 14 21:41 libGLESv2_nvidia.so.2 -> libGLESv2_nvidia.so.410.104
-rw-r--r--  1 nobody nogroup    111400 Feb  6 04:54 libGLESv2_nvidia.so.410.104
-rw-r--r--  1 root   root            0 Apr  1 09:17 libGLESv2_nvidia.so.418.56
lrwxrwxrwx  1 root   root           23 Apr  1 09:17 libGLX_indirect.so.0 -> libGLX_nvidia.so.418.56
lrwxrwxrwx  1 root   root           24 Apr 14 21:41 libGLX_nvidia.so.0 -> libGLX_nvidia.so.410.104
-rw-r--r--  1 nobody nogroup   1274704 Feb  6 04:56 libGLX_nvidia.so.410.104
-rw-r--r--  1 root   root            0 Apr  1 09:17 libGLX_nvidia.so.418.56
lrwxrwxrwx  1 root   root           24 Apr 14 21:41 libnvidia-cfg.so.1 -> libnvidia-cfg.so.410.104
-rw-r--r--  1 nobody nogroup    179592 Feb  6 04:54 libnvidia-cfg.so.410.104
-rw-r--r--  1 root   root            0 Apr  1 09:17 libnvidia-cfg.so.418.56
-rw-r--r--  1 nobody nogroup  47842480 Feb  6 05:14 libnvidia-compiler.so.410.104
-rw-r--r--  1 root   root            0 Apr  1 09:17 libnvidia-compiler.so.418.56
-rw-r--r--  1 nobody nogroup  25283584 Feb  6 05:12 libnvidia-eglcore.so.410.104
-rw-r--r--  1 root   root            0 Apr  1 09:17 libnvidia-eglcore.so.418.56
lrwxrwxrwx  1 root   root           27 Apr 14 21:41 libnvidia-encode.so.1 -> libnvidia-encode.so.410.104
-rw-r--r--  1 nobody nogroup    168184 Feb  6 04:54 libnvidia-encode.so.410.104
-rw-r--r--  1 root   root            0 Apr  1 09:17 libnvidia-encode.so.418.56
-rw-r--r--  1 nobody nogroup    292840 Feb  6 04:55 libnvidia-fatbinaryloader.so.410.104
-rw-r--r--  1 root   root            0 Apr  1 09:17 libnvidia-fatbinaryloader.so.418.56
lrwxrwxrwx  1 root   root           24 Apr 14 21:41 libnvidia-fbc.so.1 -> libnvidia-fbc.so.410.104
-rw-r--r--  1 nobody nogroup    123112 Feb  6 04:54 libnvidia-fbc.so.410.104
-rw-r--r--  1 root   root            0 Apr  1 09:17 libnvidia-fbc.so.418.56
-rw-r--r--  1 nobody nogroup  27088008 Feb  6 05:12 libnvidia-glcore.so.410.104
-rw-r--r--  1 root   root            0 Apr  1 09:17 libnvidia-glcore.so.418.56
-rw-r--r--  1 nobody nogroup    578872 Feb  6 04:55 libnvidia-glsi.so.410.104
-rw-r--r--  1 root   root            0 Apr  1 09:17 libnvidia-glsi.so.418.56
lrwxrwxrwx  1 root   root           24 Apr 14 21:41 libnvidia-ifr.so.1 -> libnvidia-ifr.so.410.104
-rw-r--r--  1 nobody nogroup    206888 Feb  6 04:54 libnvidia-ifr.so.410.104
-rw-r--r--  1 root   root            0 Apr  1 09:17 libnvidia-ifr.so.418.56
lrwxrwxrwx  1 root   root           23 Apr 14 21:41 libnvidia-ml.so.1 -> libnvidia-ml.so.410.104
-rw-r--r--  1 nobody nogroup   1528376 Feb  6 04:58 libnvidia-ml.so.410.104
-rw-r--r--  1 root   root            0 Apr  1 09:17 libnvidia-ml.so.418.56
lrwxrwxrwx  1 root   root           27 Apr 14 21:41 libnvidia-opencl.so.1 -> libnvidia-opencl.so.410.104
-rw-r--r--  1 nobody nogroup  28467576 Feb  6 05:12 libnvidia-opencl.so.410.104
-rw-r--r--  1 root   root            0 Apr  1 09:17 libnvidia-opencl.so.418.56
lrwxrwxrwx  1 root   root           31 Apr  1 09:17 libnvidia-opticalflow.so.1 -> libnvidia-opticalflow.so.418.56
-rw-r--r--  1 root   root            0 Apr  1 09:17 libnvidia-opticalflow.so.418.56
lrwxrwxrwx  1 root   root           35 Apr 14 21:41 libnvidia-ptxjitcompiler.so.1 -> libnvidia-ptxjitcompiler.so.410.104
-rw-r--r--  1 nobody nogroup  12129448 Feb  6 05:01 libnvidia-ptxjitcompiler.so.410.104
-rw-r--r--  1 root   root            0 Apr  1 09:17 libnvidia-ptxjitcompiler.so.418.56
-rw-r--r--  1 nobody nogroup     14480 Feb  6 04:54 libnvidia-tls.so.410.104
-rw-r--r--  1 root   root            0 Apr  1 09:17 libnvidia-tls.so.418.56
lrwxrwxrwx  1 root   root           26 Apr 14 21:41 libvdpau_nvidia.so.1 -> libvdpau_nvidia.so.410.104
-rw-r--r--  1 nobody nogroup    991552 Feb  6 04:55 libvdpau_nvidia.so.410.104
-rw-r--r--  1 root   root            0 Apr  1 09:17 libvdpau_nvidia.so.418.56

Is there a way to make those old references go away automatically, without having to delete each one by hand?
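One possible way to automate the cleanup asked about here, as a minimal sketch rather than anything provided by libnvidia-container: delete zero-byte shared-library files and any symlinks left dangling, then refresh the linker cache. The helper name `clean_ghost_libs` and the `lib*.so*` name pattern are assumptions made for illustration; the sketch assumes a find that supports `-maxdepth`, `-empty` and `-delete` (GNU find does).

```shell
#!/bin/sh
# Sketch: remove zero-byte shared-library leftovers from one directory.
# clean_ghost_libs is a hypothetical helper name, not an NVIDIA tool.
clean_ghost_libs() {
    dir="${1:-/usr/lib/x86_64-linux-gnu}"
    # 1. An empty regular file can never be a valid shared library.
    find "$dir" -maxdepth 1 -type f -name 'lib*.so*' -empty -print -delete
    # 2. Symlinks whose target no longer exists
    #    (e.g. libGLX_indirect.so.0 -> libGLX_nvidia.so.418.56).
    find "$dir" -maxdepth 1 -type l -name 'lib*.so*' ! -exec test -e {} \; \
        -print -exec rm -f {} \;
}

# Example (run as root inside the container):
# clean_ghost_libs /usr/lib/x86_64-linux-gnu && ldconfig
```

The libraries of the driver version currently mounted into the container show their real size, so only the stale, empty entries match.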

bobbilichandu commented 3 years ago

Any update on how to resolve this issue? I am currently using the deepstream-nvidia container (nvcr.io/nvidia/deepstream:5.0.1-20.09-triton).

/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libcuda.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvcuvid.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvoptix.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.440.33.01 is empty, not checked.

I got the error above.

When I run nvidia-smi, it reports NVIDIA-SMI 455.23.05, Driver Version: 455.23.05. I am trying to install tensorrt-7.2.2.3 along with cuda-11.1. Any help would be great. Thanks in advance.

klueska commented 3 years ago

This can happen if the container image itself was built with the NVIDIA container runtime in use.

Do you see this issue when running the DeepStream container directly, or only when you try to extend it and build a new image yourself?

I’ll need to check with the DeepStream team to make sure they build all of their containers with runc set as the runtime and not nvidia-container-runtime.

bobbilichandu commented 3 years ago

I am using the command below to run the container: "docker run -it --net=host --gpus=all -v path:path nvcr.io/nvidia/deepstream:5.0.1-20.09-triton". I am also using another container, nvcr.io/nvidia/tensorrt:20.12-py3, and I am facing the same issue with that container. How can I solve this?

bobbilichandu commented 3 years ago

@klueska I am facing the same issue when extending the image as well:

Setting up libgssapi-krb5-2:amd64 (1.16-2ubuntu0.2) ...
Setting up libpq5:amd64 (10.15-0ubuntu0.18.04.1) ...
Setting up binutils (2.30-21ubuntu1~18.04.4) ...
Setting up libpython3.6:amd64 (3.6.9-1~18.04ubuntu1.3) ...
Setting up python3.6 (3.6.9-1~18.04ubuntu1.3) ...
Setting up libcurl3-gnutls:amd64 (7.58.0-2ubuntu3.12) ...
Setting up libssh-gcrypt-4:amd64 (0.8.0~20170825.94fa1e38-1ubuntu0.7) ...
Setting up libcurl4:amd64 (7.58.0-2ubuntu3.12) ...
Setting up libcurl4-gnutls-dev:amd64 (7.58.0-2ubuntu3.12) ...
Setting up gdb (8.1.1-0ubuntu1) ...
Setting up curl (7.58.0-2ubuntu3.12) ...
Processing triggers for libc-bin (2.27-3ubuntu1.4) ...
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libcuda.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvcuvid.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvoptix.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.440.33.01 is empty, not checked.
Processing triggers for mime-support (3.60ubuntu1) ...
Processing triggers for libgdk-pixbuf2.0-0:amd64 (2.36.11-2) ...
Processing triggers for ca-certificates (20201027ubuntu0.18.04.1) ...
Updating certificates in /etc/ssl/certs...
0 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.

My Dockerfile:

FROM nvcr.io/nvidia/deepstream:5.0.1-20.09-triton

RUN apt-get update

RUN DEBIAN_FRONTEND="noninteractive" apt-get -y install tzdata
RUN ln -fs /usr/share/zoneinfo/Asia/Kolkata /etc/timezone
RUN dpkg-reconfigure -f noninteractive tzdata

RUN apt-get install -y python3-pip
RUN apt-get install -y libgl1-mesa-glx
RUN apt-get install -y libglib2.0-0
RUN apt-get install -y git vim
RUN apt-get install -y libsm6 libxext6
RUN apt-get install -y libxrender-dev

RUN pip3 install --upgrade pip

RUN pip install scikit-build torch==1.4.0 torchvision==0.5.0 six pycocotools terminaltables opencv-python==4.2.0.32

How should I proceed? Any help is appreciated. TIY

cyberwillis commented 3 years ago

Hello everybody, I don't use Docker, but in LXC I do just one thing to solve this. When I update the drivers on the host, I run a script that starts the container with the NVIDIA runtime disabled, deletes everything related to the previous NVIDIA driver, and then enables the NVIDIA runtime again. That way, the moment the container starts, the new drivers (I mean the new references to the drivers) are inserted into the container.
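A rough sketch of what such a host-side script could look like for an LXD container follows. It is not the author's actual script: the helper name `refresh_nvidia_libs` is made up, it assumes the container is configured through LXD's `nvidia.runtime` option, and it assumes the stale files are the empty `lib*.so*` entries under /usr/lib/x86_64-linux-gnu.

```shell
#!/bin/sh
# Sketch of the disable / clean / re-enable cycle for one LXD container.
# Assumptions: LXD's nvidia.runtime option is in use and the stale files
# are empty lib*.so* entries in /usr/lib/x86_64-linux-gnu.
LXC="${LXC:-lxc}"   # override (e.g. LXC=echo) for a dry run

refresh_nvidia_libs() {
    name="$1"
    "$LXC" stop "$name" || true                      # may already be stopped
    "$LXC" config set "$name" nvidia.runtime false   # start without injection
    "$LXC" start "$name"
    "$LXC" exec "$name" -- find /usr/lib/x86_64-linux-gnu -maxdepth 1 \
        -type f -name 'lib*.so*' -empty -delete
    "$LXC" stop "$name"
    "$LXC" config set "$name" nvidia.runtime true    # re-enable injection
    "$LXC" start "$name"
}

# Example: refresh_nvidia_libs my-container
```

Running it once with `LXC=echo` prints the commands without touching any container, which is a cheap way to review the sequence first.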

bobbilichandu commented 3 years ago

That is not the case for me. I have driver 455 installed on my server, but when I run the Docker container I still face this issue. When I run nvidia-smi, I can see that the driver is 455:

| NVIDIA-SMI 455.23.05    Driver Version: 455.23.05    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:12:00.0 Off |                    0 |
| N/A   66C    P0    31W /  70W |    254MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            On   | 00000000:13:00.0 Off |                    0 |
| N/A   68C    P0    30W /  70W |   1448MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla T4            On   | 00000000:37:00.0 Off |                    0 |
| N/A   75C    P0    33W /  70W |   2337MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla T4            On   | 00000000:AF:00.0 Off |                    0 |
| N/A   60C    P0    30W /  70W |   7005MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |

So I tried ignoring the messages and went ahead with installing other libraries (tensorrt 7.1 and cuda 11.1), but I am running into many issues. See the issue I posted on the NVIDIA developer forum. I was not able to completely install cuda-11.1 from the deb file; it says cuda-11.1 is not yet configured due to broken packages. The root of all these errors is the issue above.

klueska commented 3 years ago

As I mentioned before, if you are trying to extend the container image (either directly with docker build or by running it, extending it, and saving it), you need to make sure you run it with normal runc - not the nvidia-container-runtime. This may require you to change your daemon.json to make runc the default runtime during builds if you have it set to nvidia otherwise.

If you don’t do this, then "ghost" versions of the NVIDIA libraries (with 0 bytes) that were injected during the build will hang around inside the container image after the build and cause problems later on.

This happens because of the way libnvidia-container mounts these library files behind the back of Docker, so Docker doesn’t know they are bind-mounted volumes and doesn’t clean them up properly when shutting the container down after building the image.

It’s a limitation of the design of libnvidia-container and not something easily fixed without a rearchitecture of the Nvidia container stack (which we are in the process of doing now).
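The daemon.json check mentioned above can be sketched as follows. The helper name `default_runtime` is hypothetical, and the plain-text match assumes the `"default-runtime"` key and its value sit on one line of the file; a real JSON parser would be more robust.

```shell
#!/bin/sh
# Sketch: report the default runtime configured in a Docker daemon.json.
# default_runtime is a hypothetical helper, not part of Docker or NVIDIA tools.
default_runtime() {
    file="${1:-/etc/docker/daemon.json}"
    # Pull the value of "default-runtime" out of the file, if present.
    value=$(sed -n 's/.*"default-runtime"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
        "$file" 2>/dev/null | head -n 1)
    # Docker falls back to runc when the key is absent.
    echo "${value:-runc}"
}

# Example: warn before a build if images would be built under another runtime.
# [ "$(default_runtime)" = "runc" ] || \
#     echo "warning: default runtime is $(default_runtime); build with runc" >&2
```

On a running daemon, the output of `docker info` also lists the default runtime, which reflects the configuration actually in effect.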

klueska commented 3 years ago

On LXC, I’m not sure exactly what is necessary to "disable" the NVIDIA stack from being used during image builds, but the principle is the same.

Austinzhenghua commented 3 years ago

After running ldconfig /usr/local/cuda-10.1/lib64, I got the following errors:

/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libcuda.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.418.56 is empty, not checked.

Do you know what is wrong? Thanks!

JinchaoLove commented 2 years ago

Same issue:

ldconfig
# /sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.510.73.05 is empty, not checked.
# /sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.510.73.05 is empty, not checked.
# /sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.510.73.05 is empty, not checked.
# /sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libcuda.so.510.73.05 is empty, not checked.
# /sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.510.73.05 is empty, not checked.
cat /proc/driver/nvidia/version   # host driver version
# NVRM version: NVIDIA UNIX x86_64 Kernel Module  510.85.02  Tue Jul 12 16:51:23 UTC 2022
# GCC version:  gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
ls /usr/lib/x86_64-linux-gnu/libcuda.so.*  # old and new versions in docker path
# /usr/lib/x86_64-linux-gnu/libcuda.so.1  /usr/lib/x86_64-linux-gnu/libcuda.so.510.73.05  /usr/lib/x86_64-linux-gnu/libcuda.so.440.118.02  /usr/lib/x86_64-linux-gnu/libcuda.so.510.85.02

Actually, the symbolic link correctly points to the host library, e.g.:

ldconfig -v | grep libcuda.so
# libcuda.so.1 -> libcuda.so.510.85.02

So, is it okay to solve this in the ENTRYPOINT? e.g.

ldconfig 2>errlog && cat errlog | awk '{print $3}' | xargs rm && rm -f errlog  # rm error/empty versions
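A slightly more defensive variant of the one-liner above, offered as a sketch only: it keeps just the lines that match the "is empty, not checked" message, so other ldconfig warnings are never turned into file names, and it does nothing when there are no matches. The helper name `empty_libs_from_log` is hypothetical, and whether deleting these files in an ENTRYPOINT is appropriate has not been confirmed by the maintainers in this thread.

```shell
#!/bin/sh
# Sketch: extract only the paths that ldconfig reports as empty.
# empty_libs_from_log is a hypothetical helper name.
empty_libs_from_log() {
    # Input lines look like:
    #   /sbin/ldconfig.real: File /path/libfoo.so.1 is empty, not checked.
    sed -n 's/^.*: File \(.*\) is empty, not checked\.$/\1/p'
}

# Possible ENTRYPOINT usage (assumes it runs as root inside the container):
# ldconfig 2>&1 >/dev/null | empty_libs_from_log | while IFS= read -r f; do
#     [ -f "$f" ] && [ ! -s "$f" ] && rm -f -- "$f"   # delete only if still empty
# done
# ldconfig
```

Piping stderr directly avoids the temporary errlog file, and the size check guards against removing a library that has been mounted with real content in the meantime.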