Open Austinzhenghua opened 3 years ago
I got the same issue.
The issue is that the container image was built using the nvidia-container-runtime instead of runc. This causes the NVIDIA Container CLI to mount these files into the container from the host, and they are then left behind in the image as zero-byte files.
Could you confirm which image this is?
I use nvidia-container-toolkit instead of nvidia-container-runtime, but I think it may actually be caused by the problem you mentioned. I can confirm which image I use. How can I solve this problem? Thanks for your help.
If you can rebuild the image, you should be able to rebuild without the NVIDIA container runtime (also installed as part of the nvidia-container-toolkit package) and have this issue resolved.
If rebuilding the image is not possible, remove the /usr/lib/x86_64-linux-gnu/*.so.418.56 files from the image and repush / retag it.
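Before removing anything, it may help to list which files are affected. The following is a minimal sketch rather than an official tool: the function name `find_empty_driver_libs` and the regular expression for versioned library names are assumptions made for illustration. It only reports zero-byte files named like `*.so.<version>` and leaves deletion to you.

```python
import re
from pathlib import Path

# Assumed naming pattern for versioned driver libraries, e.g. libcuda.so.418.56.
_VERSIONED_SO = re.compile(r"\.so\.\d+(\.\d+)+$")


def find_empty_driver_libs(lib_dir="/usr/lib/x86_64-linux-gnu"):
    """Return sorted paths of zero-byte regular files named like *.so.<version>."""
    root = Path(lib_dir)
    if not root.is_dir():
        return []
    empty = []
    for path in root.iterdir():
        # Skip symlinks and directories: only real, empty files are stale placeholders.
        if path.is_symlink() or not path.is_file():
            continue
        if _VERSIONED_SO.search(path.name) and path.stat().st_size == 0:
            empty.append(path)
    return sorted(empty)


if __name__ == "__main__":
    for stale in find_empty_driver_libs():
        print(stale)
```

Running this in a container started from the image with plain runc (so that nothing is mounted over the files) should print the same paths that ldconfig complains about.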
Thanks. Let me try.
@elezar why? I see that the host running the container has libnvidia-ml.so, which points to libnvidia-ml.so.<nv driver version>, as follows:
lrwxrwxrwx 1 root root 17 Feb 18 15:53 /usr/lib/x86_64-linux-gnu/libnvidia-ml.so -> libnvidia-ml.so.1
lrwxrwxrwx 1 root root 25 Feb 18 15:53 /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 -> libnvidia-ml.so.525.78.01
-rwxr-xr-x 1 root root 1798712 Feb 18 15:53 /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.525.78.01
while the container cannot find a matching libnvidia-ml.so and fails with an error like:
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.
and the mount points in the container show:
/dev/sda1 on /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.525.78.01 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,stripe=64)
/dev/sda1 on /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.525.78.01 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,stripe=64)
/dev/sda1 on /usr/lib/x86_64-linux-gnu/libcuda.so.525.78.01 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,stripe=64)
/dev/sda1 on /usr/lib/x86_64-linux-gnu/libcudadebugger.so.525.78.01 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,stripe=64)
/dev/sda1 on /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.525.78.01 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,stripe=64)
/dev/sda1 on /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.525.78.01 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,stripe=64)
/dev/sda1 on /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.525.78.01 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,stripe=64)
/dev/sda1 on /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.525.78.01 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,stripe=64)
/dev/sda1 on /usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so.525.78.01 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,stripe=64)
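To see which driver version the runtime actually mounted, the mount listing above can be parsed. This is only a sketch under assumptions: the helper name `mounted_driver_libs` is made up, and the pattern is written for lines shaped like the ones shown here.

```python
import re

# Matches lines such as:
#   /dev/sda1 on /usr/lib/x86_64-linux-gnu/libcuda.so.525.78.01 type ext4 (ro,...)
_MOUNT_LIB = re.compile(r" on (\S+?\.so\.(\d+(?:\.\d+)+)) type ")


def mounted_driver_libs(mount_output):
    """Map each mounted versioned library path to its version string."""
    found = {}
    for line in mount_output.splitlines():
        match = _MOUNT_LIB.search(line)
        if match:
            found[match.group(1)] = match.group(2)
    return found
```

If the versions returned here differ from the version suffix of other `*.so.<version>` files in the same directory, those other files were most likely baked into the image at build time.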
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libcuda.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.418.56 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.418.56 is empty, not checked.
Does anyone know what is wrong? Thanks!
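The ldconfig warnings already name every empty file, so the list of paths to remove can be pulled straight from the log. A minimal sketch, assuming the message format shown above; the function name `empty_files_from_ldconfig` is illustrative only.

```python
import re

# ldconfig reports each zero-byte library as "File <path> is empty, not checked."
_EMPTY_FILE = re.compile(r"File (\S+) is empty, not checked\.")


def empty_files_from_ldconfig(log_text):
    """Return the paths ldconfig flagged as empty, in order of appearance."""
    return _EMPTY_FILE.findall(log_text)
```

This works whether the warnings arrive one per line or run together on a single line, since it searches for each message rather than splitting on newlines.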