Closed — klmmr closed this issue 11 months ago
I am hitting the same error when running
docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all --rm nvidia/cuda nvidia-smi
It fails with the following:
nvidia-container-cli: mount error: write error: /sys/fs/cgroup/devices/docker/7f736858f0eb8fec8cce9b2a7dffc7646a58d65730482c25639cafa746350732/devices.allow: operation not permitted\\n\"": unknown.
It is in an unprivileged LXC container. Any thoughts?
nvidia-smi runs just fine, and `nvidia-container-cli -k -d /dev/tty info` produces the following output:
gpu 12|12:19 [~] nvidia-container-cli -k -d /dev/tty info
-- WARNING, the following logs are for debugging purposes only --
I1212 17:21:26.532332 1025 nvc.c:281] initializing library context (version=1.0.5, build=13b836390888f7b7c7dca115d16d7e28ab15a836)
I1212 17:21:26.532366 1025 nvc.c:255] using root /
I1212 17:21:26.532369 1025 nvc.c:256] using ldcache /etc/ld.so.cache
I1212 17:21:26.532372 1025 nvc.c:257] using unprivileged user 65534:65534
W1212 17:21:26.532392 1025 nvc.c:166] skipping kernel modules load due to user namespace
I1212 17:21:26.532504 1026 driver.c:133] starting driver service
I1212 17:21:26.995525 1025 nvc_info.c:437] requesting driver information with ''
I1212 17:21:26.995634 1025 nvc_info.c:151] selecting /usr/lib/x86_64-linux-gnu/vdpau/libvdpau_nvidia.so.440.36
I1212 17:21:26.995671 1025 nvc_info.c:151] selecting /usr/lib/x86_64-linux-gnu/libnvoptix.so.440.36
I1212 17:21:26.995694 1025 nvc_info.c:151] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.440.36
I1212 17:21:26.995711 1025 nvc_info.c:151] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.440.36
I1212 17:21:26.995729 1025 nvc_info.c:151] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.440.36
I1212 17:21:26.995746 1025 nvc_info.c:151] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.440.36
I1212 17:21:26.995761 1025 nvc_info.c:151] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.440.36
I1212 17:21:26.995778 1025 nvc_info.c:151] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.440.36
I1212 17:21:26.995794 1025 nvc_info.c:151] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.440.36
I1212 17:21:26.995810 1025 nvc_info.c:151] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.440.36
I1212 17:21:26.995826 1025 nvc_info.c:151] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.440.36
I1212 17:21:26.995843 1025 nvc_info.c:151] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.440.36
I1212 17:21:26.995860 1025 nvc_info.c:151] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.440.36
I1212 17:21:26.995874 1025 nvc_info.c:151] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.440.36
I1212 17:21:26.995890 1025 nvc_info.c:151] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.440.36
I1212 17:21:26.995905 1025 nvc_info.c:151] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.440.36
I1212 17:21:26.995921 1025 nvc_info.c:151] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.440.36
I1212 17:21:26.995937 1025 nvc_info.c:151] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.440.36
I1212 17:21:26.995953 1025 nvc_info.c:151] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.440.36
I1212 17:21:26.995994 1025 nvc_info.c:151] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.440.36
I1212 17:21:26.996021 1025 nvc_info.c:151] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.440.36
I1212 17:21:26.996037 1025 nvc_info.c:151] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.440.36
I1212 17:21:26.996053 1025 nvc_info.c:151] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.440.36
I1212 17:21:26.996069 1025 nvc_info.c:151] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.440.36
W1212 17:21:26.996077 1025 nvc_info.c:306] missing compat32 library libnvidia-ml.so
W1212 17:21:26.996079 1025 nvc_info.c:306] missing compat32 library libnvidia-cfg.so
W1212 17:21:26.996085 1025 nvc_info.c:306] missing compat32 library libcuda.so
W1212 17:21:26.996088 1025 nvc_info.c:306] missing compat32 library libnvidia-opencl.so
W1212 17:21:26.996091 1025 nvc_info.c:306] missing compat32 library libnvidia-ptxjitcompiler.so
W1212 17:21:26.996095 1025 nvc_info.c:306] missing compat32 library libnvidia-fatbinaryloader.so
W1212 17:21:26.996099 1025 nvc_info.c:306] missing compat32 library libnvidia-compiler.so
W1212 17:21:26.996102 1025 nvc_info.c:306] missing compat32 library libvdpau_nvidia.so
W1212 17:21:26.996104 1025 nvc_info.c:306] missing compat32 library libnvidia-encode.so
W1212 17:21:26.996107 1025 nvc_info.c:306] missing compat32 library libnvidia-opticalflow.so
W1212 17:21:26.996111 1025 nvc_info.c:306] missing compat32 library libnvcuvid.so
W1212 17:21:26.996114 1025 nvc_info.c:306] missing compat32 library libnvidia-eglcore.so
W1212 17:21:26.996117 1025 nvc_info.c:306] missing compat32 library libnvidia-glcore.so
W1212 17:21:26.996120 1025 nvc_info.c:306] missing compat32 library libnvidia-tls.so
W1212 17:21:26.996122 1025 nvc_info.c:306] missing compat32 library libnvidia-glsi.so
W1212 17:21:26.996125 1025 nvc_info.c:306] missing compat32 library libnvidia-fbc.so
W1212 17:21:26.996127 1025 nvc_info.c:306] missing compat32 library libnvidia-ifr.so
W1212 17:21:26.996130 1025 nvc_info.c:306] missing compat32 library libnvidia-rtcore.so
W1212 17:21:26.996133 1025 nvc_info.c:306] missing compat32 library libnvoptix.so
W1212 17:21:26.996135 1025 nvc_info.c:306] missing compat32 library libGLX_nvidia.so
W1212 17:21:26.996137 1025 nvc_info.c:306] missing compat32 library libEGL_nvidia.so
W1212 17:21:26.996140 1025 nvc_info.c:306] missing compat32 library libGLESv2_nvidia.so
W1212 17:21:26.996142 1025 nvc_info.c:306] missing compat32 library libGLESv1_CM_nvidia.so
W1212 17:21:26.996144 1025 nvc_info.c:306] missing compat32 library libnvidia-glvkspirv.so
I1212 17:21:26.996243 1025 nvc_info.c:232] selecting /usr/bin/nvidia-smi
I1212 17:21:26.996252 1025 nvc_info.c:232] selecting /usr/bin/nvidia-debugdump
I1212 17:21:26.996260 1025 nvc_info.c:232] selecting /usr/bin/nvidia-persistenced
I1212 17:21:26.996269 1025 nvc_info.c:232] selecting /usr/bin/nvidia-cuda-mps-control
I1212 17:21:26.996278 1025 nvc_info.c:232] selecting /usr/bin/nvidia-cuda-mps-server
I1212 17:21:26.996290 1025 nvc_info.c:369] listing device /dev/nvidiactl
I1212 17:21:26.996293 1025 nvc_info.c:369] listing device /dev/nvidia-uvm
I1212 17:21:26.996298 1025 nvc_info.c:369] listing device /dev/nvidia-uvm-tools
I1212 17:21:26.996302 1025 nvc_info.c:369] listing device /dev/nvidia-modeset
W1212 17:21:26.996317 1025 nvc_info.c:277] missing ipc /var/run/nvidia-persistenced/socket
W1212 17:21:26.996324 1025 nvc_info.c:277] missing ipc /tmp/nvidia-mps
I1212 17:21:26.996327 1025 nvc_info.c:493] requesting device information with ''
I1212 17:21:27.001816 1025 nvc_info.c:523] listing device /dev/nvidia0 (GPU-8d02206c-0145-a6d6-a681-fd178d12a183 at 00000000:09:00.0)
NVRM version: 440.36
CUDA version: 10.2
Device Index: 0
Device Minor: 0
Model: GeForce GTX 1660 SUPER
Brand: GeForce
GPU UUID: GPU-8d02206c-0145-a6d6-a681-fd178d12a183
Bus Location: 00000000:09:00.0
Architecture: 7.5
I1212 17:21:27.001835 1025 nvc.c:318] shutting down library context
I1212 17:21:27.001974 1026 driver.c:192] terminating driver service
I1212 17:21:27.146646 1025 driver.c:233] driver service terminated successfully
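A workaround often suggested for this `devices.allow: operation not permitted` failure in unprivileged containers is to stop `nvidia-container-cli` from editing the devices cgroup itself and pass the device nodes to Docker explicitly. A sketch, assuming the config file ships with a commented `#no-cgroups = false` default (check the file before editing, as the exact line differs between package versions):

```shell
# Inside the unprivileged container: tell nvidia-container-cli not to write
# to /sys/fs/cgroup/devices/..., which an unprivileged container cannot do.
sed -i 's/^#no-cgroups = false/no-cgroups = true/' /etc/nvidia-container-runtime/config.toml

# The runtime no longer sets up the device cgroup entries, so pass the
# device nodes explicitly on the docker command line:
docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all --rm \
  --device /dev/nvidia0 --device /dev/nvidiactl \
  --device /dev/nvidia-uvm --device /dev/nvidia-uvm-tools \
  nvidia/cuda nvidia-smi
```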
I am also experiencing this issue.
Any update on this?
Any update on this?
Given the changes in the architecture and the move to CDI, our LXC support would have to be revisited. If there is still a need, please open a new issue against https://github.com/NVIDIA/nvidia-container-toolkit with the requirements.
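For anyone migrating, the CDI-based flow in the current NVIDIA Container Toolkit looks roughly like the following — a sketch only; flags and the spec location may differ between toolkit versions:

```shell
# Generate a CDI specification describing the GPUs on this machine.
nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# List the device names the spec defines (e.g. nvidia.com/gpu=0).
nvidia-ctk cdi list

# Request a GPU via CDI from a CDI-aware runtime such as podman.
podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi
```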
Hi everyone, I am trying to use LXD with nested Docker for applications running on GPUs (e.g. deep learning with TensorFlow).

In my setup, an unprivileged LXD container (the usual Ubuntu 18.04 image from LXD) runs on an Ubuntu Server 18.04 host. In this container, `nvidia-driver-430` (I can successfully execute `nvidia-smi`) and Docker (which runs the `hello-world` container correctly) are installed. The GPU is passed from the host to the LXD container by mapping all of its device nodes into the LXD container (`/dev/nvidia-uvm`, `/dev/nvidia-uvm-tools`, `/dev/nvidia0`, `/dev/nvidiactl`).

To pass the GPU from the LXD container on to a Docker container, `nvidia-container-runtime` was installed within the LXD container and registered in `/etc/docker/daemon.json`.

The following error occurs when trying to execute a Docker container with GPU access:
There seems to be some kind of permission problem when writing to `/sys/fs/cgroup/devices/docker/`. The same setup with Docker nested in a privileged LXD container (LXD option `security.privileged true`) works fine: no error occurs and the output of `nvidia-smi` (inside the Docker container) is shown correctly.

However, privileged containers are not an option for my use case. Has anyone faced the same (or a similar) problem? Do you have an idea how to debug this further? I tried `strace` but didn't get any clue, and the same applies to the logs (see below). Please let me know if you have any questions or need more information.
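For reference, the device mapping described above can be expressed as LXD configuration — a sketch, with `c1` as a placeholder container name:

```shell
# Run on the host. The 'gpu' device type handles the /dev/nvidia* nodes
# (and the host-side device cgroup entries) in one step:
lxc config device add c1 mygpu gpu

# Alternatively, map the individual device nodes explicitly:
lxc config device add c1 nvidia0 unix-char path=/dev/nvidia0
lxc config device add c1 nvidiactl unix-char path=/dev/nvidiactl
lxc config device add c1 nvidia-uvm unix-char path=/dev/nvidia-uvm
lxc config device add c1 nvidia-uvm-tools unix-char path=/dev/nvidia-uvm-tools
```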
Background information:

I am using a GeForce GTX 1080 Ti (with `nvidia-driver-430`) on Ubuntu Server 18.04.3, with the following versions of LXD and Docker.

LXD version (on host):

Docker version (in LXD container):

Content of `/var/log/nvidia-container-runtime.log` for the error shown above:

Content of `/var/log/nvidia-container-toolkit.log` for the error shown above: