NVIDIA / libnvidia-container

NVIDIA container runtime library
Apache License 2.0

after "ldconfig /usr/local/cuda/lib64" I got the error information #140

Open Austinzhenghua opened 3 years ago

Austinzhenghua commented 3 years ago

/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libcuda.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.418.56 is empty, not checked.

Does anyone know what is wrong? Thanks!

SmartMapple commented 2 years ago

I got the same issue.

elezar commented 2 years ago

The issue is that the container image was built using the nvidia-container-runtime instead of runc. This causes the NVIDIA Container CLI to mount these files into the container from the host, and these are then left as zero-byte files.

Could you confirm which image this is?

SmartMapple commented 2 years ago

I use nvidia-container-toolkit instead of nvidia-container-runtime, but I think it may actually be caused by the problem you mentioned. I can confirm which image I use. How can I solve this problem? Thanks for your help.

elezar commented 2 years ago

If you can rebuild the image, you should be able to rebuild without the NVIDIA container runtime (also installed as part of the nvidia-container-toolkit package) and have this issue resolved.

If rebuilding the image is not possible, remove the /usr/lib/x86_64-linux-gnu/*.so.418.56 files from the image and repush / retag it.
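
As a minimal sketch of the second option, the following Python helpers list the zero-byte files that match a versioned library pattern and optionally delete them. The names `find_empty_libs` and `remove_empty_libs`, the `dry_run` flag, and the default pattern are illustrative assumptions, not part of any NVIDIA tooling. The sketch also assumes it runs inside a container started with runc, so that nothing is mounted over the files while they are being removed.

```python
from pathlib import Path


def find_empty_libs(lib_dir, pattern="*.so.418.56"):
    """Return zero-byte regular files in lib_dir whose names match pattern."""
    empty = []
    for path in sorted(Path(lib_dir).glob(pattern)):
        # Skip symlinks and directories: only placeholder files are of interest.
        if path.is_symlink() or not path.is_file():
            continue
        if path.stat().st_size == 0:
            empty.append(path)
    return empty


def remove_empty_libs(lib_dir, pattern="*.so.418.56", dry_run=True):
    """Return the empty files found; delete them only when dry_run is False."""
    targets = find_empty_libs(lib_dir, pattern)
    for path in targets:
        if not dry_run:
            path.unlink()
    return targets
```

Calling `remove_empty_libs("/usr/lib/x86_64-linux-gnu")` with the default `dry_run=True` only reports what would be deleted, which makes it easy to compare the result against the ldconfig warnings before removing anything. Non-empty files are never touched, so a real driver library with the same version suffix is left in place.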

SmartMapple commented 2 years ago

Thanks. Let me try.

Davidrjx commented 1 year ago

@elezar Why? I see that the host running the container has libnvidia-ml.so, but it refers to libnvidia-ml.so.<nv driver version>, as follows:

lrwxrwxrwx 1 root root      17 Feb 18 15:53 /usr/lib/x86_64-linux-gnu/libnvidia-ml.so -> libnvidia-ml.so.1
lrwxrwxrwx 1 root root      25 Feb 18 15:53 /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 -> libnvidia-ml.so.525.78.01
-rwxr-xr-x 1 root root 1798712 Feb 18 15:53 /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.525.78.01

whereas the container cannot find a matching libnvidia-ml.so and reports an error like:

NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.

and the mount points in the container show:

/dev/sda1 on /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.525.78.01 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,stripe=64)
/dev/sda1 on /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.525.78.01 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,stripe=64)
/dev/sda1 on /usr/lib/x86_64-linux-gnu/libcuda.so.525.78.01 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,stripe=64)
/dev/sda1 on /usr/lib/x86_64-linux-gnu/libcudadebugger.so.525.78.01 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,stripe=64)
/dev/sda1 on /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.525.78.01 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,stripe=64)
/dev/sda1 on /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.525.78.01 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,stripe=64)
/dev/sda1 on /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.525.78.01 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,stripe=64)
/dev/sda1 on /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.525.78.01 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,stripe=64)
/dev/sda1 on /usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so.525.78.01 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,stripe=64)
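
One way to compare the host and the container is to list every entry for a given library name and show where each symlink resolves. The Python sketch below does that; `describe_library_links` is a hypothetical helper name, and the default base name is taken from the listing above. It only reports what is present in a directory and does not diagnose the cause by itself. Libraries are looked up by name, so a versioned file without its shorter-named symlinks may not be found.

```python
import os
from pathlib import Path


def describe_library_links(lib_dir, base="libnvidia-ml.so"):
    """Map each entry in lib_dir whose name starts with base to a description."""
    report = {}
    for path in sorted(Path(lib_dir).glob(base + "*")):
        if path.is_symlink():
            target = os.readlink(path)
            # exists() follows symlinks, so False here means a dangling link.
            status = "ok" if path.exists() else "dangling"
            report[path.name] = f"symlink -> {target} ({status})"
        else:
            report[path.name] = f"file, {path.stat().st_size} bytes"
    return report
```

Running `describe_library_links("/usr/lib/x86_64-linux-gnu")` on the host and again inside the container gives two small dictionaries that can be compared directly. An entry that appears on the host but not in the container, or a file reported with 0 bytes, would be the first thing to look at.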