Open cyberwillis opened 5 years ago
Any update on how to resolve this issue? I am currently using the DeepStream container (nvcr.io/nvidia/deepstream:5.0.1-20.09-triton).
```
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libcuda.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvcuvid.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvoptix.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.440.33.01 is empty, not checked.
```
I got this error. Running nvidia-smi reports NVIDIA-SMI 455.23.05, Driver Version: 455.23.05. I am trying to install TensorRT 7.2.2.3 along with CUDA 11.1. Any help would be great. Thanks in advance.
This can happen if the container image itself is built with the NVIDIA container runtime in use.
Do you see this issue when running the deepops container directly, or only when you try to extend it and build a new image yourself?
I’ll need to check with the deepops team to make sure they build all of their containers with runc set as the runtime and not nvidia-container-runtime.
I am using the command below to start the container:

```
docker run -it --net=host --gpus=all -v path:path nvcr.io/nvidia/deepstream:5.0.1-20.09-triton
```

I am also using another container, nvcr.io/nvidia/tensorrt:20.12-py3, and I am facing the same issues with it. How can I solve the above issue?
@klueska
I am facing the same issue when extending the image as well:
```
Setting up libgssapi-krb5-2:amd64 (1.16-2ubuntu0.2) ...
Setting up libpq5:amd64 (10.15-0ubuntu0.18.04.1) ...
Setting up binutils (2.30-21ubuntu1~18.04.4) ...
Setting up libpython3.6:amd64 (3.6.9-1~18.04ubuntu1.3) ...
Setting up python3.6 (3.6.9-1~18.04ubuntu1.3) ...
Setting up libcurl3-gnutls:amd64 (7.58.0-2ubuntu3.12) ...
Setting up libssh-gcrypt-4:amd64 (0.8.0~20170825.94fa1e38-1ubuntu0.7) ...
Setting up libcurl4:amd64 (7.58.0-2ubuntu3.12) ...
Setting up libcurl4-gnutls-dev:amd64 (7.58.0-2ubuntu3.12) ...
Setting up gdb (8.1.1-0ubuntu1) ...
Setting up curl (7.58.0-2ubuntu3.12) ...
Processing triggers for libc-bin (2.27-3ubuntu1.4) ...
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libcuda.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvcuvid.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvoptix.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.440.33.01 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.440.33.01 is empty, not checked.
Processing triggers for mime-support (3.60ubuntu1) ...
Processing triggers for libgdk-pixbuf2.0-0:amd64 (2.36.11-2) ...
Processing triggers for ca-certificates (20201027ubuntu0.18.04.1) ...
Updating certificates in /etc/ssl/certs...
0 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.
```
My Dockerfile:
```dockerfile
FROM nvcr.io/nvidia/deepstream:5.0.1-20.09-triton
RUN apt-get update
RUN DEBIAN_FRONTEND="noninteractive" apt-get -y install tzdata
RUN ln -fs /usr/share/zoneinfo/Asia/Kolkata /etc/timezone
RUN dpkg-reconfigure -f noninteractive tzdata
RUN apt-get install -y python3-pip
RUN apt-get install -y libgl1-mesa-glx
RUN apt-get install -y libglib2.0-0
RUN apt-get install -y git vim
RUN apt-get install -y libsm6 libxext6
RUN apt-get install -y libxrender-dev
RUN pip3 install --upgrade pip
```
How should I proceed? Any help is appreciated. TIA
Hello everybody. I don't use Docker, but in LXC I do just one thing to solve this: when I update the drivers on the host, I run a script that starts the container without the NVIDIA runtime enabled, delete everything related to the previous NVIDIA driver, then enable the NVIDIA runtime again. This way, the moment the container starts, the new driver references are inserted into the container.
That is not the case for me. I have driver 455 installed on my server, but I still face this issue when running the docker container. nvidia-smi shows that the driver is 455:

```
| NVIDIA-SMI 455.23.05    Driver Version: 455.23.05    CUDA Version: 11.1    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:12:00.0 Off |                    0 |
| N/A   66C    P0    31W /  70W |    254MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            On   | 00000000:13:00.0 Off |                    0 |
| N/A   68C    P0    30W /  70W |   1448MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla T4            On   | 00000000:37:00.0 Off |                    0 |
| N/A   75C    P0    33W /  70W |   2337MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla T4            On   | 00000000:AF:00.0 Off |                    0 |
| N/A   60C    P0    30W /  70W |   7005MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
```

So I tried ignoring these errors and went ahead to install the other libraries (TensorRT 7.1 and CUDA 11.1), but I am facing many issues. See this issue I posted on the NVIDIA developer forum. I was not able to completely install CUDA 11.1 from the .deb file; it says cuda-11.1 is not yet configured due to broken packages. The root of all these errors is the issue above.
As I mentioned before, if you are trying to extend the container image (either directly with docker build, or by running it, extending it, and saving it), you need to make sure you run it with normal runc, not the nvidia-container-runtime. This may require you to change your daemon.json to make runc the default runtime during builds, if you otherwise have it set to nvidia.
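For reference, a minimal sketch of what /etc/docker/daemon.json might look like with the nvidia runtime still registered but no longer the default (the exact file contents on your machine may differ; restart the docker daemon after editing):

```json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
```

With no `"default-runtime": "nvidia"` entry, `docker build` goes through runc, while `docker run --gpus all ...` (or `--runtime=nvidia`) still gets GPU access at run time.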
If you don't do this, then "ghost" versions of the NVIDIA libraries (0-byte files) from what was injected during the build will linger inside the container image after the build and cause problems later on.
This happens because of the way libnvidia-container mounts these library files behind docker's back: docker doesn't know they are bind-mounted volumes, so it doesn't clean them up properly when shutting the container down after building the image.
It’s a limitation of the design of libnvidia-container and not something easily fixed without a rearchitecture of the Nvidia container stack (which we are in the process of doing now).
On LXC, I'm not sure exactly what is necessary to "disable" the NVIDIA stack from being used during image builds, but the principle is the same.
After running `ldconfig /usr/local/cuda-10.1/lib64` I got the following errors:
```
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libcuda.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.418.56 is empty, not checked.
```
Do you know what is wrong? Thanks!
Same issue:
```shell
ldconfig
# /sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.510.73.05 is empty, not checked.
# /sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.510.73.05 is empty, not checked.
# /sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.510.73.05 is empty, not checked.
# /sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libcuda.so.510.73.05 is empty, not checked.
# /sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.510.73.05 is empty, not checked.

cat /proc/driver/nvidia/version  # host driver version
# NVRM version: NVIDIA UNIX x86_64 Kernel Module 510.85.02 Tue Jul 12 16:51:23 UTC 2022
# GCC version: gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)

ls /usr/lib/x86_64-linux-gnu/libcuda.so.*  # old and new versions in docker path
# /usr/lib/x86_64-linux-gnu/libcuda.so.1
# /usr/lib/x86_64-linux-gnu/libcuda.so.440.118.02
# /usr/lib/x86_64-linux-gnu/libcuda.so.510.73.05
# /usr/lib/x86_64-linux-gnu/libcuda.so.510.85.02
```
Actually, the symbolic link correctly points to the host library, e.g.,
```shell
ldconfig -v | grep libcuda.so
# libcuda.so.1 -> libcuda.so.510.85.02
```
So, is it okay to work around this in the ENTRYPOINT? e.g.

```shell
ldconfig 2>errlog && cat errlog | awk '{print $3}' | xargs rm && rm -f errlog  # rm error/empty versions
```
It's not an error, but it is something I see from time to time. I am using LXD/LXC containers. The last time I first launched the container, my host had driver version nvidia-418.56. Some time later I had to downgrade the host NVIDIA drivers to an earlier version, nvidia-410.104, and after building some other software inside the container, running ldconfig dumped the following message:
So I checked whether some files had been left behind, and I found the following (references to older drivers with zero bytes).
Could there be a way to make those old references go away automatically, without needing to remove each one by hand?
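One way to sketch this, assuming the leftovers are the zero-byte driver files under /usr/lib/x86_64-linux-gnu (the path and name patterns here are assumptions based on the logs above; review the printed list before changing `-print` to `-delete`):

```shell
# List zero-byte "ghost" NVIDIA driver libraries left behind by a previous
# driver version. Once the list looks right, replace -print with -delete
# and re-run ldconfig to refresh the linker cache.
find /usr/lib/x86_64-linux-gnu -maxdepth 1 -size 0 \
    \( -name 'libcuda.so.*' -o -name 'libnv*.so.*' \) -print
```

This only matches empty files (`-size 0`), so the live libraries injected by the current runtime are untouched.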