Closed: super-cooper closed this issue 3 years ago
Hi,
I'm experiencing the same issue. For now I've worked around it:
In /etc/nvidia-container-runtime/config.toml I've set no-cgroups = true, and now the container starts, but the NVIDIA devices are not added to the container. Once the devices are added, the container works again.
Here are the relevant lines from my docker-compose.yml:
devices:
- /dev/nvidia0:/dev/nvidia0
- /dev/nvidiactl:/dev/nvidiactl
- /dev/nvidia-modeset:/dev/nvidia-modeset
- /dev/nvidia-uvm:/dev/nvidia-uvm
- /dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools
This is equivalent to docker run --device /dev/whatever ..., but I'm not sure of the exact syntax.
Hope this helps.
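For reference, a sketch of the equivalent docker run invocation (mirroring the --device pattern used later in this thread; the image and command are just examples, and --gpus all still assumes the NVIDIA runtime is installed):
# Expose the NVIDIA device nodes explicitly when no-cgroups = true is set
docker run --rm --gpus all \
  --device /dev/nvidia0:/dev/nvidia0 \
  --device /dev/nvidiactl:/dev/nvidiactl \
  --device /dev/nvidia-modeset:/dev/nvidia-modeset \
  --device /dev/nvidia-uvm:/dev/nvidia-uvm \
  --device /dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools \
  nvidia/cuda:11.0-base nvidia-smi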
This seems to be related to the systemd upgrade to 247.2-2, which was uploaded to sid three weeks ago and has now made its way to testing. This commit highlights the change of cgroup hierarchy: https://salsa.debian.org/systemd-team/systemd/-/commit/170fb124a32884bd9975ee4ea9e1ffbbc2ee26b4
Indeed, the default setup no longer exposes /sys/fs/cgroup/devices, which libnvidia-container uses according to https://github.com/NVIDIA/libnvidia-container/blob/ac02636a318fe7dcc71eaeb3cc55d0c8541c1072/src/nvc_container.c#L379-L382
Using the documented systemd.unified_cgroup_hierarchy=false kernel command line parameter brings back the /sys/fs/cgroup/devices entry, and libnvidia-container is happier.
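As a quick sanity check (my addition, not from the original comment), you can verify which cgroup layout the host is using and whether the devices controller is exposed:
# Prints "cgroup2fs" on a unified (cgroup v2) host, "tmpfs" on a v1/hybrid host
stat -fc %T /sys/fs/cgroup
# Present on a v1/hybrid host, absent on a unified host
ls -d /sys/fs/cgroup/devices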
@lissyx Thank you for pointing out the crux of the issue.
We are in the process of rearchitecting the nvidia container stack in such a way that issues like this should not exist in the future (because we will rely on runc (or whatever the configured container runtime is) to do all cgroup setup instead of doing it ourselves).
That said, this rearchitecting effort will take at least another 9 months to complete. I'm curious what the impact is (and how difficult it would be) to add cgroupsv2 support to libnvidia-container in the meantime to prevent issues like this until the rearchitecting is complete.
Wanted to chime in to say that I'm also experiencing this on Fedora 33.
Could the title be updated to indicate that it is systemd cgroup layout related?
I was under the impression this issue was related to adding cgroup v2 support.
The systemd cgroup layout issue was resolved in: https://gitlab.com/nvidia/container-toolkit/libnvidia-container/-/merge_requests/49
And released today as part of libnvidia-container v1.3.2: https://github.com/NVIDIA/libnvidia-container/releases/tag/v1.3.2
If these resolve this issue, please comment and close. Thanks.
Issue resolved by the latest release. Thank you everyone <3
Did you set the following parameter: systemd.unified_cgroup_hierarchy=false? Or did you just upgrade all the packages?
For me it was solved by upgrading the package.
Thank you, @super-cooper, for the reply.
I am having exactly the same issue on Debian Testing even after an upgrade.
Running docker run --rm --gpus all nvidia/cuda:11.0-base-ubuntu20.04 nvidia-smi fails with:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.
Output of nvidia-container-cli -k -d /dev/tty info:
I0130 05:23:50.494974 4486 nvc.c:282] initializing library context (version=1.3.2, build=fa9c778f687e9ac7be52b0299fa3b6ac2d9fbf93)
I0130 05:23:50.495160 4486 nvc.c:256] using root /
I0130 05:23:50.495178 4486 nvc.c:257] using ldcache /etc/ld.so.cache
I0130 05:23:50.495194 4486 nvc.c:258] using unprivileged user 1000:1000
I0130 05:23:50.495256 4486 nvc.c:299] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0130 05:23:50.495644 4486 nvc.c:301] dxcore initialization failed, continuing assuming a non-WSL environment
W0130 05:23:50.499341 4487 nvc.c:187] failed to set inheritable capabilities
W0130 05:23:50.499369 4487 nvc.c:188] skipping kernel modules load due to failure
I0130 05:23:50.499601 4488 driver.c:101] starting driver service
I0130 05:23:50.504376 4486 nvc_info.c:680] requesting driver information with ''
I0130 05:23:50.506132 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.460.32.03
I0130 05:23:50.506191 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.460.32.03
I0130 05:23:50.506283 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.460.32.03
I0130 05:23:50.506375 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.460.32.03
I0130 05:23:50.506418 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.460.32.03
I0130 05:23:50.506467 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.460.32.03
I0130 05:23:50.506512 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.460.32.03
I0130 05:23:50.506557 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.460.32.03
I0130 05:23:50.506669 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.460.32.03
I0130 05:23:50.506714 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.460.32.03
I0130 05:23:50.507077 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.460.32.03
I0130 05:23:50.507376 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.460.32.03
I0130 05:23:50.507476 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libGLESv2_nvidia.so.460.32.03
I0130 05:23:50.507569 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libGLESv1_CM_nvidia.so.460.32.03
I0130 05:23:50.507669 4486 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libEGL_nvidia.so.460.32.03
W0130 05:23:50.507732 4486 nvc_info.c:350] missing library libnvidia-opencl.so
W0130 05:23:50.507741 4486 nvc_info.c:350] missing library libnvidia-fatbinaryloader.so
W0130 05:23:50.507748 4486 nvc_info.c:350] missing library libnvidia-allocator.so
W0130 05:23:50.507754 4486 nvc_info.c:350] missing library libnvidia-compiler.so
W0130 05:23:50.507760 4486 nvc_info.c:350] missing library libnvidia-ngx.so
W0130 05:23:50.507766 4486 nvc_info.c:350] missing library libvdpau_nvidia.so
W0130 05:23:50.507772 4486 nvc_info.c:350] missing library libnvidia-encode.so
W0130 05:23:50.507781 4486 nvc_info.c:350] missing library libnvidia-opticalflow.so
W0130 05:23:50.507788 4486 nvc_info.c:350] missing library libnvcuvid.so
W0130 05:23:50.507796 4486 nvc_info.c:350] missing library libnvidia-fbc.so
W0130 05:23:50.507806 4486 nvc_info.c:350] missing library libnvidia-ifr.so
W0130 05:23:50.507815 4486 nvc_info.c:350] missing library libnvoptix.so
W0130 05:23:50.507823 4486 nvc_info.c:354] missing compat32 library libnvidia-ml.so
W0130 05:23:50.507832 4486 nvc_info.c:354] missing compat32 library libnvidia-cfg.so
W0130 05:23:50.507848 4486 nvc_info.c:354] missing compat32 library libcuda.so
W0130 05:23:50.507859 4486 nvc_info.c:354] missing compat32 library libnvidia-opencl.so
W0130 05:23:50.507869 4486 nvc_info.c:354] missing compat32 library libnvidia-ptxjitcompiler.so
W0130 05:23:50.507880 4486 nvc_info.c:354] missing compat32 library libnvidia-fatbinaryloader.so
W0130 05:23:50.507889 4486 nvc_info.c:354] missing compat32 library libnvidia-allocator.so
W0130 05:23:50.507897 4486 nvc_info.c:354] missing compat32 library libnvidia-compiler.so
W0130 05:23:50.507906 4486 nvc_info.c:354] missing compat32 library libnvidia-ngx.so
W0130 05:23:50.507915 4486 nvc_info.c:354] missing compat32 library libvdpau_nvidia.so
W0130 05:23:50.507925 4486 nvc_info.c:354] missing compat32 library libnvidia-encode.so
W0130 05:23:50.507933 4486 nvc_info.c:354] missing compat32 library libnvidia-opticalflow.so
W0130 05:23:50.507942 4486 nvc_info.c:354] missing compat32 library libnvcuvid.so
W0130 05:23:50.507950 4486 nvc_info.c:354] missing compat32 library libnvidia-eglcore.so
W0130 05:23:50.507960 4486 nvc_info.c:354] missing compat32 library libnvidia-glcore.so
W0130 05:23:50.507970 4486 nvc_info.c:354] missing compat32 library libnvidia-tls.so
W0130 05:23:50.507979 4486 nvc_info.c:354] missing compat32 library libnvidia-glsi.so
W0130 05:23:50.507988 4486 nvc_info.c:354] missing compat32 library libnvidia-fbc.so
W0130 05:23:50.507998 4486 nvc_info.c:354] missing compat32 library libnvidia-ifr.so
W0130 05:23:50.508007 4486 nvc_info.c:354] missing compat32 library libnvidia-rtcore.so
W0130 05:23:50.508015 4486 nvc_info.c:354] missing compat32 library libnvoptix.so
W0130 05:23:50.508025 4486 nvc_info.c:354] missing compat32 library libGLX_nvidia.so
W0130 05:23:50.508031 4486 nvc_info.c:354] missing compat32 library libEGL_nvidia.so
W0130 05:23:50.508040 4486 nvc_info.c:354] missing compat32 library libGLESv2_nvidia.so
W0130 05:23:50.508050 4486 nvc_info.c:354] missing compat32 library libGLESv1_CM_nvidia.so
W0130 05:23:50.508060 4486 nvc_info.c:354] missing compat32 library libnvidia-glvkspirv.so
W0130 05:23:50.508068 4486 nvc_info.c:354] missing compat32 library libnvidia-cbl.so
I0130 05:23:50.508515 4486 nvc_info.c:276] selecting /usr/lib/nvidia/current/nvidia-smi
I0130 05:23:50.508580 4486 nvc_info.c:276] selecting /usr/lib/nvidia/current/nvidia-debugdump
I0130 05:23:50.508612 4486 nvc_info.c:276] selecting /usr/bin/nvidia-persistenced
W0130 05:23:50.509049 4486 nvc_info.c:376] missing binary nvidia-cuda-mps-control
W0130 05:23:50.509060 4486 nvc_info.c:376] missing binary nvidia-cuda-mps-server
I0130 05:23:50.509100 4486 nvc_info.c:438] listing device /dev/nvidiactl
I0130 05:23:50.509109 4486 nvc_info.c:438] listing device /dev/nvidia-uvm
I0130 05:23:50.509118 4486 nvc_info.c:438] listing device /dev/nvidia-uvm-tools
I0130 05:23:50.509127 4486 nvc_info.c:438] listing device /dev/nvidia-modeset
I0130 05:23:50.509168 4486 nvc_info.c:317] listing ipc /run/nvidia-persistenced/socket
W0130 05:23:50.509192 4486 nvc_info.c:321] missing ipc /tmp/nvidia-mps
I0130 05:23:50.509200 4486 nvc_info.c:745] requesting device information with ''
I0130 05:23:50.516712 4486 nvc_info.c:628] listing device /dev/nvidia0 (GPU-6064a007-a943-7f11-1ad7-12ac87046652 at 00000000:01:00.0)
NVRM version: 460.32.03
CUDA version: 11.2
Device Index: 0
Device Minor: 0
Model: GeForce GTX 960M
Brand: GeForce
GPU UUID: GPU-6064a007-a943-7f11-1ad7-12ac87046652
Bus Location: 00000000:01:00.0
Architecture: 5.0
I0130 05:23:50.516775 4486 nvc.c:337] shutting down library context
I0130 05:23:50.517704 4488 driver.c:156] terminating driver service
I0130 05:23:50.518087 4486 driver.c:196] driver service terminated successfully
- [x] Kernel version from `uname -a`
Linux stas 5.10.0-2-amd64 #1 SMP Debian 5.10.9-1 (2021-01-20) x86_64 GNU/Linux
- [x] Any relevant kernel output lines from `dmesg`
[  487.597570] docker0: port 1(vethb7a49e6) entered blocking state
[  487.597573] docker0: port 1(vethb7a49e6) entered disabled state
[  487.597786] device vethb7a49e6 entered promiscuous mode
[  487.773120] docker0: port 1(vethb7a49e6) entered disabled state
[  487.776548] device vethb7a49e6 left promiscuous mode
[  487.776556] docker0: port 1(vethb7a49e6) entered disabled state
- [x] Driver information from `nvidia-smi -a`
Timestamp : Sat Jan 30 08:26:51 2021 Driver Version : 460.32.03 CUDA Version : 11.2
Attached GPUs : 1 GPU 00000000:01:00.0 Product Name : GeForce GTX 960M Product Brand : GeForce Display Mode : Disabled Display Active : Disabled Persistence Mode : Enabled MIG Mode Current : N/A Pending : N/A Accounting Mode : Disabled Accounting Mode Buffer Size : 4000 Driver Model Current : N/A Pending : N/A Serial Number : N/A GPU UUID : GPU-6064a007-a943-7f11-1ad7-12ac87046652 Minor Number : 0 VBIOS Version : 82.07.82.00.10 MultiGPU Board : No Board ID : 0x100 GPU Part Number : N/A Inforom Version Image Version : N/A OEM Object : N/A ECC Object : N/A Power Management Object : N/A GPU Operation Mode Current : N/A Pending : N/A GPU Virtualization Mode Virtualization Mode : None Host VGPU Mode : N/A IBMNPU Relaxed Ordering Mode : N/A PCI Bus : 0x01 Device : 0x00 Domain : 0x0000 Device Id : 0x139B10DE Bus Id : 00000000:01:00.0 Sub System Id : 0x380217AA GPU Link Info PCIe Generation Max : 3 Current : 1 Link Width Max : 16x Current : 16x Bridge Chip Type : N/A Firmware : N/A Replays Since Reset : 0 Replay Number Rollovers : 0 Tx Throughput : 0 KB/s Rx Throughput : 0 KB/s Fan Speed : N/A Performance State : P8 Clocks Throttle Reasons Idle : Active Applications Clocks Setting : Not Active SW Power Cap : Not Active HW Slowdown : Not Active HW Thermal Slowdown : N/A HW Power Brake Slowdown : N/A Sync Boost : Not Active SW Thermal Slowdown : Not Active Display Clock Setting : Not Active FB Memory Usage Total : 4046 MiB Used : 4 MiB Free : 4042 MiB BAR1 Memory Usage Total : 256 MiB Used : 1 MiB Free : 255 MiB Compute Mode : Default Utilization Gpu : 0 % Memory : 0 % Encoder : 0 % Decoder : 0 % Encoder Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 FBC Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 Ecc Mode Current : N/A Pending : N/A ECC Errors Volatile Single Bit Device Memory : N/A Register File : N/A L1 Cache : N/A L2 Cache : N/A Texture Memory : N/A Texture Shared : N/A CBU : N/A Total : N/A Double Bit Device Memory : N/A Register File : N/A L1 Cache : N/A L2 Cache : N/A Texture Memory : N/A Texture Shared : N/A CBU : N/A Total : N/A Aggregate Single Bit Device Memory : N/A Register File : N/A L1 Cache : N/A L2 Cache : N/A Texture Memory : N/A Texture Shared : N/A CBU : N/A Total : N/A Double Bit Device Memory : N/A Register File : N/A L1 Cache : N/A L2 Cache : N/A Texture Memory : N/A Texture Shared : N/A CBU : N/A Total : N/A Retired Pages Single Bit ECC : N/A Double Bit ECC : N/A Pending Page Blacklist : N/A Remapped Rows : N/A Temperature GPU Current Temp : 33 C GPU Shutdown Temp : 101 C GPU Slowdown Temp : 96 C GPU Max Operating Temp : 92 C GPU Target Temperature : N/A Memory Current Temp : N/A Memory Max Operating Temp : N/A Power Readings Power Management : N/A Power Draw : N/A Power Limit : N/A Default Power Limit : N/A Enforced Power Limit : N/A Min Power Limit : N/A Max Power Limit : N/A Clocks Graphics : 135 MHz SM : 135 MHz Memory : 405 MHz Video : 405 MHz Applications Clocks Graphics : 1097 MHz Memory : 2505 MHz Default Applications Clocks Graphics : 1097 MHz Memory : 2505 MHz Max Clocks Graphics : 1202 MHz SM : 1202 MHz Memory : 2505 MHz Video : 1081 MHz Max Customer Boost Clocks Graphics : N/A Clock Policy Auto Boost : N/A Auto Boost Default : N/A Processes GPU instance ID : N/A Compute instance ID : N/A Process ID : 1351 Type : G Name : /usr/lib/xorg/Xorg Used GPU Memory : 2 MiB
- [x] Docker version from `docker version`
Client: Docker Engine - Community Version: 20.10.2 API version: 1.41 Go version: go1.13.15 Git commit: 2291f61 Built: Mon Dec 28 16:17:34 2020 OS/Arch: linux/amd64 Context: default Experimental: true
Server: Docker Engine - Community Engine: Version: 20.10.2 API version: 1.41 (minimum version 1.12) Go version: go1.13.15 Git commit: 8891c58 Built: Mon Dec 28 16:15:28 2020 OS/Arch: linux/amd64 Experimental: false containerd: Version: 1.4.3 GitCommit: 269548fa27e0089a8b8278fc4fc781d7f65a939b runc: Version: 1.0.0-rc92 GitCommit: ff819c7e9184c13b7c2607fe6c30ae19403a7aff docker-init: Version: 0.19.0 GitCommit: de40ad0
- [x] NVIDIA packages version from `dpkg -l '*nvidia*'` _or_ `rpm -qa '*nvidia*'`
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-======================================-==============================-============-=================================================================
un bumblebee-nvidia
- [x] NVIDIA container library version from `nvidia-container-cli -V`
version: 1.3.2 build date: 2021-01-25T11:07+00:00 build revision: fa9c778f687e9ac7be52b0299fa3b6ac2d9fbf93 build compiler: x86_64-linux-gnu-gcc-8 8.3.0 build platform: x86_64 build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
- [x] NVIDIA container library logs (see [troubleshooting](https://github.com/NVIDIA/nvidia-docker/wiki/Troubleshooting))
`/var/log/nvidia-container-toolkit.log` is not generated.
- [x] Docker command, image and tag used
docker run --rm --gpus all nvidia/cuda:11.0-base-ubuntu20.04 nvidia-smi
@klueska Could you please check the issue?
@regzon thanks for indicating that this is still an issue. Could you please check what your systemd cgroup configuration is? (See for example this other issue which shows similar behaviour: https://github.com/docker/cli/issues/2104#issuecomment-535560873)
@regzon your issue is likely related to the fact that libnvidia-container does not support cgroups v2.
You will need to follow the suggestion in the comments above (https://github.com/NVIDIA/nvidia-docker/issues/1447#issuecomment-760059332) to force systemd to use v1 cgroups.
In any case -- we do not officially support Debian Testing nor cgroups v2 (yet).
@elezar @klueska thank you for your help. When forcing systemd not to use the unified hierarchy, everything works fine. I thought that the latest libnvidia-container upgrade would resolve the issue (as it did for @super-cooper). But if the upgrade is not intended to fix the issue with cgroups, then everything is fine.
@klueska I'm having the same "issue", i.e. missing support for cgroups v2 (which I would very much like for other reasons). Is there already an issue for this to track?
We are not planning on building support for cgroups v2 into the existing nvidia-docker stack.
Please see my comment above for more info: https://github.com/NVIDIA/nvidia-docker/issues/1447#issuecomment-760189260
Let me rephrase it then: I want to use nvidia-docker on a system where cgroup v2 is enabled (systemd.unified_cgroup_hierarchy=true).
Right now this is not working and this bug is closed. So is there an issue that I can track to know when I can use nvidia-docker on hosts with cgroup v2 enabled?
We have it tracked in our internal JIRA, with a link to this issue as the location to report once the work is complete: https://github.com/NVIDIA/libnvidia-container/issues/111
facebook oomd requires cgroup v2, i.e. systemd.unified_cgroup_hierarchy=1. So users either freeze their boxes pretty often and render them unusable, or they cannot use nvidia-containers. Both are bad options. We will probably drop the nvidia-docker nonsense.
For Debian users: you can disable the unified cgroup hierarchy by editing /etc/default/grub and adding systemd.unified_cgroup_hierarchy=0 to the end of the GRUB_CMDLINE_LINUX_DEFAULT options. Example:
GRUB_CMDLINE_LINUX_DEFAULT="quiet systemd.unified_cgroup_hierarchy=0"
Then run update-grub and reboot for the changes to take effect.
It's worth noting that I also had to modify /etc/nvidia-container-runtime/config.toml to remove the '@' symbol and point it at the correct location of ldconfig for my system (Debian Unstable), e.g.: ldconfig = "/usr/sbin/ldconfig"
This worked for me; I hope it saves someone else some time.
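A condensed sketch of those two Debian tweaks as shell commands (the sed expressions are my own, not from the comments above; double-check both files before running):
# 1. Switch the host back to the v1 (hybrid) cgroup layout
sudo sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)"/GRUB_CMDLINE_LINUX_DEFAULT="\1 systemd.unified_cgroup_hierarchy=0"/' /etc/default/grub
sudo update-grub
# 2. Point the runtime at the distro's ldconfig (path assumed for Debian)
sudo sed -i 's|^ldconfig = .*|ldconfig = "/usr/sbin/ldconfig"|' /etc/nvidia-container-runtime/config.toml
sudo reboot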
Fix on Arch:
Edit /etc/nvidia-container-runtime/config.toml and change #no-cgroups=false to no-cgroups=true. After a restart of the docker.service everything worked as usual.
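A sketch of those two steps as commands (the sed expression is mine; verify the config file before editing):
# Uncomment and flip no-cgroups in the runtime config, then restart Docker
sudo sed -i 's/^#\?no-cgroups *= *false/no-cgroups = true/' /etc/nvidia-container-runtime/config.toml
sudo systemctl restart docker.service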
@Zethson I also use Arch and yesterday I followed your suggestion. It seemed to work (I was able to start the containers), but when running nvidia-smi I had no access to my GPU from inside Docker.
Reading the other answers in this issue, I solved it by adding systemd.unified_cgroup_hierarchy=0 to the boot kernel parameters and commenting out the no-cgroups entry in /etc/nvidia-container-runtime/config.toml again.
Arch now has cgroup v2 enabled by default, so it would be useful to plan for supporting it.
Awesome, the Arch fix above (changing #no-cgroups=false to no-cgroups=true in /etc/nvidia-container-runtime/config.toml and restarting docker.service) works well.
Fix on NixOS (where cgroup v2 is also now default): add
systemd.enableUnifiedCgroupHierarchy = false;
and restart.
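For context (this is my assumption about the usual NixOS workflow, not stated in the comment), that option would go into /etc/nixos/configuration.nix and be applied with a rebuild:
# Apply the configuration change and reboot so the old cgroup layout is used
sudo nixos-rebuild switch
sudo reboot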
This worked for me on Manjaro Linux (Arch Linux as base) without deactivating cgroup v2:
Create the folder docker.service.d under /etc/systemd/system, then create the file override.conf in that folder with the following content:
[Service]
ExecStartPre=-/usr/bin/nvidia-modprobe -c 0 -u
After that you have to add the following content to your docker-compose.yml (thank you @DanielCeregatti):
devices:
- /dev/nvidia0:/dev/nvidia0
- /dev/nvidiactl:/dev/nvidiactl
- /dev/nvidia-modeset:/dev/nvidia-modeset
- /dev/nvidia-uvm:/dev/nvidia-uvm
- /dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools
Background: the nvidia-uvm and nvidia-uvm-tools device nodes did not exist under /dev for me. After running nvidia-modprobe -c 0 -u they appeared, but they disappeared again after a reboot. This workaround creates them before Docker starts. Unfortunately I don't know why they do not exist by default; maybe somebody can fill that in. I am currently using Linux 5.12, so maybe it has to do with this kernel version.
Edit: this workaround only works if the container using NVIDIA is restarted afterwards. I do not know why, but if it is not, the container starts but cannot access the created device nodes.
Update 25.06.2021: Found out why I had to restart jellyfin. Docker started before my disks were online. If somebody has this problem too, here is the fix: https://github.com/openmediavault/openmediavault/issues/458#issuecomment-628076472
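A sketch of creating that drop-in entirely from the shell (the heredoc form is mine; paths follow the comment above):
# Create the systemd drop-in so the NVIDIA device nodes are created before dockerd starts
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/override.conf >/dev/null <<'EOF'
[Service]
ExecStartPre=-/usr/bin/nvidia-modprobe -c 0 -u
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker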
Hi,
I'm running Manjaro and facing the same issue: when I run the container using docker run (e.g. docker run -it --gpus all --privileged -v /dev:/dev --rm tensorflow/tensorflow:latest-gpu python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))") it works, but I did not manage to make it work with docker-compose up.
Could you please post a complete, working docker-compose.yml file? Thank you very much!
Never mind, I have just managed to make it work with docker-compose. I'll post here a minimal working example:
services:
test:
image: tensorflow/tensorflow:latest-gpu
command: python -c "import tensorflow as tf;print(tf.config.list_physical_devices('GPU'))"
devices:
- /dev/nvidia0:/dev/nvidia0
- /dev/nvidiactl:/dev/nvidiactl
- /dev/nvidia-modeset:/dev/nvidia-modeset
- /dev/nvidia-uvm:/dev/nvidia-uvm
- /dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools
deploy:
resources:
reservations:
devices:
- capabilities: [gpu]
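For completeness, assuming the snippet above is saved as docker-compose.yml in the current directory, it can be run with:
# Starts the one-off GPU check service defined above
docker-compose up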
Minimal working example on Arch with nvidia-container-toolkit (from AUR) installed:
docker run --rm --gpus all \
--device /dev/nvidia0 --device /dev/nvidia-uvm --device /dev/nvidia-uvm-tools --device /dev/nvidiactl \
nvidia/cuda:11.0-base nvidia-smi
Without the --devices I get this unhelpful message: Failed to initialize NVML: Unknown Error.
Edit: also make sure you have no-cgroups = true in /etc/nvidia-container-runtime/config.toml (thanks @mpizenberg).
@japm48 may I ask what changes you made exactly to get that command to work? Did you also make the systemd.unified_cgroup_hierarchy=false kernel parameter change and the no-cgroups = false nvidia config change?
Without doing those, I'm on Arch with kernel 5.14.14, with version 1.5.1-1 of the aur/nvidia-container-toolkit and when running the command:
docker run --rm --gpus all \
--device /dev/nvidia0 --device /dev/nvidia-uvm --device /dev/nvidia-uvm-tools --device /dev/nvidiactl \
nvidia/cuda:11.4.2-base-ubuntu20.04 nvidia-smi
I get
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.
Same if I change the no-cgroups config to false. I haven't tried changing my kernel parameters, though; I'd like to avoid that!
EDIT: it now works with some changes
Ok, I actually got it working on my system with the following setup:
- the systemd.unified_cgroup_hierarchy kernel parameter
- no-cgroups = true in /etc/nvidia-container-runtime/config.toml
- --device params added to the docker run command as follows:
docker run --rm --gpus all \
--device /dev/nvidia0 --device /dev/nvidia-modeset --device /dev/nvidia-uvm --device /dev/nvidia-uvm-tools --device /dev/nvidiactl \
nvidia/cuda:11.4.2-base-ubuntu20.04 nvidia-smi
After setting no-cgroups to true I get this error:
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.
OS: debian 11
See https://github.com/NVIDIA/nvidia-docker/issues/1163#issuecomment-824775675
ldconfig = "/sbin/ldconfig.real"
This worked for me on Debian 11.
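A sketch of applying that change from the shell (the sed expression and the Docker restart are my own suggestion; check which ldconfig path actually exists on your system first):
# Point the NVIDIA container runtime at a concrete ldconfig binary
sudo sed -i 's|^ldconfig = .*|ldconfig = "/sbin/ldconfig.real"|' /etc/nvidia-container-runtime/config.toml
sudo systemctl restart docker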
@mpizenberg
I'm really sorry I didn't see your message.
I had no-cgroups = true in /etc/nvidia-container-runtime/config.toml, but I didn't modify the file. This is likely because, as I did a fresh install, I got the patched config file; I guess you had the previous (unpatched) version installed, so it wasn't overwritten on update.
@klueska It's been 11 months, any updates on this rearchitecting :)
The rearchitecture work has been slower than we hoped, but (somewhat because of this) we have now built support for cgroupv2 in libnvidia-container, and it is currently under review. We hope to have an RC out before Christmas.
Here is the MR chain:
https://gitlab.com/nvidia/container-toolkit/libnvidia-container/-/merge_requests/113
https://gitlab.com/nvidia/container-toolkit/libnvidia-container/-/merge_requests/114
https://gitlab.com/nvidia/container-toolkit/libnvidia-container/-/merge_requests/115
https://gitlab.com/nvidia/container-toolkit/libnvidia-container/-/merge_requests/116
https://gitlab.com/nvidia/container-toolkit/libnvidia-container/-/merge_requests/117
We now have an RC of libnvidia-container out that adds support for cgroupv2.
If you would like to try it out, add the experimental repo to your package sources and install the latest packages.
On apt-based distributions:
sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/libnvidia-container.list
sudo apt-get update
sudo apt-get install -y libnvidia-container-tools libnvidia-container1
On yum-based distributions:
sudo yum-config-manager --enable libnvidia-container-experimental
sudo yum install -y libnvidia-container-tools libnvidia-container1
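After installing, a quick way to confirm the RC is in place (the verification steps are my suggestion, not part of the announcement):
# Should report a 1.8.0-rc version of the library
nvidia-container-cli -V
# Restart Docker and re-run a GPU container to confirm
sudo systemctl restart docker
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi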
I was previously using the systemd.unified_cgroup_hierarchy=false kernel command line parameter on Debian bullseye and removed it after I upgraded to 1.8.0-rc.1, and every container that is using my GPU seems to be working perfectly fine so far.
Thanks!
The only minor diff I have with the instructions above is that my repo sources file is called nvidia-docker.list and not libnvidia-container.list, not sure why.
Yes, that may be true for many users and I should have pointed that out.
We used to host packages across three different repos and recently consolidated down to just one (i.e. libnvidia-container). The changes were mostly transparent, but I can see how the instructions for enabling the experimental repo may need to be tweaked depending on which repos you actually have configured.
Basically we used to package binaries and host packages as seen in the table below:
Binary | Package | Repo |
---|---|---|
nvidia-docker | nvidia-docker2 | nvidia.github.io/nvidia-docker |
nvidia-container-runtime | nvidia-container-runtime | nvidia.github.io/nvidia-container-runtime |
nvidia-container-toolkit | nvidia-container-toolkit | nvidia.github.io/nvidia-container-runtime |
nvidia-container-cli | libnvidia-container-tools | nvidia.github.io/libnvidia-container |
libnvidia-container.so.1 | libnvidia-container1 | nvidia.github.io/libnvidia-container |
But that changed recently to:
Binary | Package | Repo |
---|---|---|
nvidia-docker | nvidia-docker2 | nvidia.github.io/libnvidia-container |
nvidia-container-runtime | nvidia-container-toolkit | nvidia.github.io/libnvidia-container |
nvidia-container-toolkit | nvidia-container-toolkit | nvidia.github.io/libnvidia-container |
nvidia-container-cli | libnvidia-container-tools | nvidia.github.io/libnvidia-container |
libnvidia-container.so.1 | libnvidia-container1 | nvidia.github.io/libnvidia-container |
So nowadays all you actually need is libnvidia-container.list to get access to all of the new packages, but if you have nvidia-docker.list that is still OK, because it also contains entries for all of the repos listed in libnvidia-container.list (it just contains entries for more -- now unnecessary -- repos as well).
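As a quick way to see which of these repo files a given machine actually has configured (my suggestion, assuming a Debian/Ubuntu host):
# List any NVIDIA container repo definitions currently in use
ls /etc/apt/sources.list.d/ | grep -iE 'nvidia|libnvidia'
grep -rh '^deb' /etc/apt/sources.list.d/*nvidia* 2>/dev/null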
Pop OS did not have this problem until now, but with the latest update 21.1 I got this error.
Setting no-cgroups = true in /etc/nvidia-container-runtime/config.toml made the container start, but TensorFlow's GPU check returned zero devices. I also tried with Podman:
sudo podman run -e NVIDIA_VISIBLE_DEVICES=0 -it --network host -v /home/alex/coding:/tf/notebooks docker.io/tensorflow/tensorflow:latest-gpu-jupyter
but TensorFlow's GPU check still returned zero devices.
Switching to cgroups v1 via the kernel parameter worked, though. For anyone else using Pop OS, which uses systemd-boot instead of GRUB, the commands below may help:
sudo kernelstub -a "systemd.unified_cgroup_hierarchy=0"
sudo update-initramfs -c -k all
reboot
The error string before this:
xx@pop-os:~/coding/cnn_1/cnn_py$ docker run --gpus device=0 -it --network host -v /home/alex/coding:/tf/notebooks tensorflow/tensorflow:latest-gpu-jupyter
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.
ERRO[0000] error waiting for container: context canceled
alex@pop-os:~/coding/cnn_1/cnn_py$ apt list nvidia-container-toolkit
Listing... Done
nvidia-container-toolkit/now 1.5.1-1pop1~1627998766~21.04~9847cf2 amd64 [installed,local]
After restart
More details https://medium.com/nttlabs/cgroup-v2-596d035be4d7
For those who are here after upgrading to Ubuntu 21.10 (not supported): using the experimental version of the 18.04 repo and reinstalling libnvidia-container-tools and libnvidia-container1 works. (Don't forget to restart Docker afterwards.)
Thank you for all your amazing work!
@muhark I have the same issue on 21.10. Which versions did you install? Which commands did you use for the reinstall/experimental version that you mention fixed it? Any help much appreciated!
@ljburtz, I used the command by @klueska above:
sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/libnvidia-container.list
Then my /etc/apt/sources.list.d/nvidia-docker.list looks like the following:
#deb https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/$(ARCH) /
#deb https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-container-runtime/experimental/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-docker/ubuntu18.04/$(ARCH) /
And finally I reinstalled (since I'd already been bungling the existing installations).
sudo apt-get update
sudo apt-get install --reinstall libnvidia-container-tools libnvidia-container1
and finally tested with:
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
FWIW, I'm using a GTX 3070.
Fantastic, this works on a GTX 3060 / Ubuntu 21.10. Major thanks for replying so fast, @muhark!
If you need cgroups active (so you cannot set no-cgroups = true) and you're on PopOS 21.10: as per this explanation, this one command fixed the issue for me while keeping cgroups on:
sudo kernelstub -a systemd.unified_cgroup_hierarchy=0
I then had to reboot, and the issue is gone.
libnvidia-container-1.8.0-rc.2 is now live with some minor updates to fix some edge cases around cgroupv2 support.
Please see https://github.com/NVIDIA/libnvidia-container/issues/111#issuecomment-989024375 for instructions on how to get access to this RC (or wait for the full release at the end of next week).
Note: this does not directly add debian testing support, but you can point to the debian10 repo and install from there for now.
This may be useful for Ubuntu users running into this issue, regarding the note above that libnvidia-container.list is all you need nowadays.
@klueska, I just wanted to mention that when I go to the following URLs:
https://nvidia.github.io/nvidia-docker/ubuntu18.04/nvidia-docker.list
https://nvidia.github.io/nvidia-docker/ubuntu20.04/nvidia-docker.list
I get a valid apt list in response.
But if I visit:
https://nvidia.github.io/nvidia-docker/ubuntu18.04/libnvidia-container.list
https://nvidia.github.io/nvidia-docker/ubuntu20.04/libnvidia-container.list
I get: # Unsupported distribution! # Check https://nvidia.github.io/nvidia-docker
It appears the list has been moved back to the original filename?
These:
https://nvidia.github.io/nvidia-docker/ubuntu18.04/libnvidia-container.list
https://nvidia.github.io/nvidia-docker/ubuntu20.04/libnvidia-container.list
Should be:
https://nvidia.github.io/libnvidia-container/ubuntu18.04/libnvidia-container.list
https://nvidia.github.io/libnvidia-container/ubuntu20.04/libnvidia-container.list
ah, :facepalm: , much appreciated, thanks for making it explicit.
libnvidia-container-1.8.0 with cgroupv2 support is now GA.
Release notes here: https://github.com/NVIDIA/libnvidia-container/releases/tag/v1.8.0
Debian 11 support has now been added, so running the following should work as expected:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
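For completeness, the usual follow-up after adding the repo (these are the standard nvidia-docker2 install steps rather than something stated in the comment; adjust for your setup):
# Install the toolkit from the newly added repo, restart Docker, and verify GPU access
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi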
1. Issue or feature description
Whenever I try to build or run an NVIDIA container, Docker fails with the error message:
2. Steps to reproduce the issue
3. Information to attach (optional if deemed irrelevant)
nvidia-container-cli -k -d /dev/tty info
Device Index: 0
Device Minor: 0
Model: GeForce GTX 980 Ti
Brand: GeForce
GPU UUID: GPU-6518be5e-14ff-e277-21aa-73b482890bee
Bus Location: 00000000:07:00.0
Architecture: 5.2
I0107 20:43:11.947903 36435 nvc.c:337] shutting down library context
I0107 20:43:11.948696 36437 driver.c:156] terminating driver service
I0107 20:43:11.949026 36435 driver.c:196] driver service terminated successfully
Linux lambda 5.8.0-3-amd64 #1 SMP Debian 5.8.14-1 (2020-10-10) x86_64 GNU/Linux
Thu Jan 7 15:45:08 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 980 Ti  On   | 00000000:07:00.0  On |                  N/A |
|  0%   45C    P5    29W / 250W |    403MiB /  6083MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3023      G   /usr/lib/xorg/Xorg                177MiB |
|    0   N/A  N/A      4833      G   /usr/bin/gnome-shell              166MiB |
|    0   N/A  N/A      7609      G   ...AAAAAAAAA= --shared-files       54MiB |
+-----------------------------------------------------------------------------+
Server: Docker Engine - Community Engine: Version: 20.10.2 API version: 1.41 (minimum version 1.12) Go version: go1.13.15 Git commit: 8891c58 Built: Mon Dec 28 16:15:28 2020 OS/Arch: linux/amd64 Experimental: false containerd: Version: 1.4.3 GitCommit: 269548fa27e0089a8b8278fc4fc781d7f65a939b nvidia: Version: 1.0.0-rc92 GitCommit: ff819c7e9184c13b7c2607fe6c30ae19403a7aff docker-init: Version: 0.19.0 GitCommit: de40ad0
Desired=Unknown/Install/Remove/Purge/Hold | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad) ||/ Name Version Architecture Description +++-======================================-==============-============-================================================================= un bumblebee-nvidia (no description available)
ii glx-alternative-nvidia 1.2.0 amd64 allows the selection of NVIDIA as GLX provider
un libegl-nvidia-legacy-390xx0 (no description available)
un libegl-nvidia-tesla-418-0 (no description available)
un libegl-nvidia-tesla-440-0 (no description available)
un libegl-nvidia-tesla-450-0 (no description available)
ii libegl-nvidia0:amd64 450.80.02-2 amd64 NVIDIA binary EGL library
ii libegl-nvidia0:i386 450.80.02-2 i386 NVIDIA binary EGL library
un libegl1-glvnd-nvidia (no description available)
un libegl1-nvidia (no description available)
un libgl1-glvnd-nvidia-glx (no description available)
ii libgl1-nvidia-glvnd-glx:amd64 450.80.02-2 amd64 NVIDIA binary OpenGL/GLX library (GLVND variant)
ii libgl1-nvidia-glvnd-glx:i386 450.80.02-2 i386 NVIDIA binary OpenGL/GLX library (GLVND variant)
un libgl1-nvidia-glx (no description available)
un libgl1-nvidia-glx-any (no description available)
un libgl1-nvidia-glx-i386 (no description available)
un libgl1-nvidia-legacy-390xx-glx (no description available)
un libgl1-nvidia-tesla-418-glx (no description available)
un libgldispatch0-nvidia (no description available)
ii libgles-nvidia1:amd64 450.80.02-2 amd64 NVIDIA binary OpenGL|ES 1.x library
ii libgles-nvidia1:i386 450.80.02-2 i386 NVIDIA binary OpenGL|ES 1.x library
ii libgles-nvidia2:amd64 450.80.02-2 amd64 NVIDIA binary OpenGL|ES 2.x library
ii libgles-nvidia2:i386 450.80.02-2 i386 NVIDIA binary OpenGL|ES 2.x library
un libgles1-glvnd-nvidia (no description available)
un libgles2-glvnd-nvidia (no description available)
un libglvnd0-nvidia (no description available)
ii libglx-nvidia0:amd64 450.80.02-2 amd64 NVIDIA binary GLX library
ii libglx-nvidia0:i386 450.80.02-2 i386 NVIDIA binary GLX library
un libglx0-glvnd-nvidia (no description available)
un libnvidia-cbl (no description available)
un libnvidia-cfg.so.1 (no description available)
ii libnvidia-cfg1:amd64 450.80.02-2 amd64 NVIDIA binary OpenGL/GLX configuration library
un libnvidia-cfg1-any (no description available)
ii libnvidia-container-tools 1.3.1-1 amd64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container1:amd64 1.3.1-1 amd64 NVIDIA container runtime library
ii libnvidia-eglcore:amd64 450.80.02-2 amd64 NVIDIA binary EGL core libraries
ii libnvidia-eglcore:i386 450.80.02-2 i386 NVIDIA binary EGL core libraries
un libnvidia-eglcore-450.80.02 (no description available)
ii libnvidia-encode1:amd64 450.80.02-2 amd64 NVENC Video Encoding runtime library
ii libnvidia-glcore:amd64 450.80.02-2 amd64 NVIDIA binary OpenGL/GLX core libraries
ii libnvidia-glcore:i386 450.80.02-2 i386 NVIDIA binary OpenGL/GLX core libraries
un libnvidia-glcore-450.80.02 (no description available)
ii libnvidia-glvkspirv:amd64 450.80.02-2 amd64 NVIDIA binary Vulkan Spir-V compiler library
ii libnvidia-glvkspirv:i386 450.80.02-2 i386 NVIDIA binary Vulkan Spir-V compiler library
un libnvidia-glvkspirv-450.80.02 (no description available)
un libnvidia-legacy-340xx-cfg1 (no description available)
un libnvidia-legacy-390xx-cfg1 (no description available)
ii libnvidia-ml-dev:amd64 11.1.1-3 amd64 NVIDIA Management Library (NVML) development files
un libnvidia-ml.so.1 (no description available)
ii libnvidia-ml1:amd64 450.80.02-2 amd64 NVIDIA Management Library (NVML) runtime library
ii libnvidia-ptxjitcompiler1:amd64 450.80.02-2 amd64 NVIDIA PTX JIT Compiler
ii libnvidia-rtcore:amd64 450.80.02-2 amd64 NVIDIA binary Vulkan ray tracing (rtcore) library
un libnvidia-rtcore-450.80.02 (no description available)
un libnvidia-tesla-418-cfg1 (no description available)
un libnvidia-tesla-440-cfg1 (no description available)
un libnvidia-tesla-450-cfg1 (no description available)
un libnvidia-tesla-450-cuda1 (no description available)
un libnvidia-tesla-450-ml1 (no description available)
un libopengl0-glvnd-nvidia (no description available)
ii nvidia-alternative 450.80.02-2 amd64 allows the selection of NVIDIA as GLX provider
un nvidia-alternative--kmod-alias (no description available)
un nvidia-alternative-legacy-173xx (no description available)
un nvidia-alternative-legacy-71xx (no description available)
un nvidia-alternative-legacy-96xx (no description available)
ii nvidia-container-runtime 3.4.0-1 amd64 NVIDIA container runtime
un nvidia-container-runtime-hook (no description available)
ii nvidia-container-toolkit 1.4.0-1 amd64 NVIDIA container runtime hook
ii nvidia-cuda-dev:amd64 11.1.1-3 amd64 NVIDIA CUDA development files
un nvidia-cuda-doc (no description available)
ii nvidia-cuda-gdb 11.1.1-3 amd64 NVIDIA CUDA Debugger (GDB)
un nvidia-cuda-mps (no description available)
ii nvidia-cuda-toolkit 11.1.1-3 amd64 NVIDIA CUDA development toolkit
ii nvidia-cuda-toolkit-doc 11.1.1-3 all NVIDIA CUDA and OpenCL documentation
un nvidia-current (no description available)
un nvidia-current-updates (no description available)
un nvidia-docker (no description available)
ii nvidia-docker2 2.5.0-1 all nvidia-docker CLI wrapper
ii nvidia-driver 450.80.02-2 amd64 NVIDIA metapackage
un nvidia-driver-any (no description available)
ii nvidia-driver-bin 450.80.02-2 amd64 NVIDIA driver support binaries
un nvidia-driver-bin-450.80.02 (no description available)
un nvidia-driver-binary (no description available)
ii nvidia-driver-libs:amd64 450.80.02-2 amd64 NVIDIA metapackage (OpenGL/GLX/EGL/GLES libraries)
ii nvidia-driver-libs:i386 450.80.02-2 i386 NVIDIA metapackage (OpenGL/GLX/EGL/GLES libraries)
un nvidia-driver-libs-any (no description available)
un nvidia-driver-libs-nonglvnd (no description available)
ii nvidia-egl-common 450.80.02-2 amd64 NVIDIA binary EGL driver - common files
ii nvidia-egl-icd:amd64 450.80.02-2 amd64 NVIDIA EGL installable client driver (ICD)
ii nvidia-egl-icd:i386 450.80.02-2 i386 NVIDIA EGL installable client driver (ICD)
un nvidia-glx-any (no description available)
ii nvidia-installer-cleanup 20151021+12 amd64 cleanup after driver installation with the nvidia-installer
un nvidia-kernel-450.80.02 (no description available)
ii nvidia-kernel-common 20151021+12 amd64 NVIDIA binary kernel module support files
ii nvidia-kernel-dkms 450.80.02-2 amd64 NVIDIA binary kernel module DKMS source
un nvidia-kernel-source (no description available)
ii nvidia-kernel-support 450.80.02-2 amd64 NVIDIA binary kernel module support files
un nvidia-kernel-support--v1 (no description available)
un nvidia-kernel-support-any (no description available)
un nvidia-legacy-304xx-alternative (no description available)
un nvidia-legacy-304xx-driver (no description available)
un nvidia-legacy-340xx-alternative (no description available)
un nvidia-legacy-340xx-vdpau-driver (no description available)
un nvidia-legacy-390xx-vdpau-driver (no description available)
un nvidia-legacy-390xx-vulkan-icd (no description available)
ii nvidia-legacy-check 450.80.02-2 amd64 check for NVIDIA GPUs requiring a legacy driver
un nvidia-libopencl1 (no description available)
un nvidia-libopencl1-dev (no description available)
ii nvidia-modprobe 460.27.04-1 amd64 utility to load NVIDIA kernel modules and create device nodes
un nvidia-nonglvnd-vulkan-common (no description available)
un nvidia-nonglvnd-vulkan-icd (no description available)
un nvidia-opencl-dev (no description available)
un nvidia-opencl-icd (no description available)
un nvidia-openjdk-8-jre (no description available)
ii nvidia-persistenced 450.57-1 amd64 daemon to maintain persistent software state in the NVIDIA driver
ii nvidia-profiler 11.1.1-3 amd64 NVIDIA Profiler for CUDA and OpenCL
ii nvidia-settings 450.80.02-1+b1 amd64 tool for configuring the NVIDIA graphics driver
un nvidia-settings-gtk-450.80.02 (no description available)
ii nvidia-smi 450.80.02-2 amd64 NVIDIA System Management Interface
ii nvidia-support 20151021+12 amd64 NVIDIA binary graphics driver support files
un nvidia-tesla-418-vdpau-driver (no description available)
un nvidia-tesla-418-vulkan-icd (no description available)
un nvidia-tesla-440-vdpau-driver (no description available)
un nvidia-tesla-440-vulkan-icd (no description available)
un nvidia-tesla-450-driver (no description available)
un nvidia-tesla-450-vulkan-icd (no description available)
un nvidia-tesla-alternative (no description available)
ii nvidia-vdpau-driver:amd64 450.80.02-2 amd64 Video Decode and Presentation API for Unix - NVIDIA driver
ii nvidia-visual-profiler 11.1.1-3 amd64 NVIDIA Visual Profiler for CUDA and OpenCL
ii nvidia-vulkan-common 450.80.02-2 amd64 NVIDIA Vulkan driver - common files
ii nvidia-vulkan-icd:amd64 450.80.02-2 amd64 NVIDIA Vulkan installable client driver (ICD)
ii nvidia-vulkan-icd:i386 450.80.02-2 i386 NVIDIA Vulkan installable client driver (ICD)
un nvidia-vulkan-icd-any (no description available)
ii xserver-xorg-video-nvidia 450.80.02-2 amd64 NVIDIA binary Xorg driver
un xserver-xorg-video-nvidia-any (no description available)
un xserver-xorg-video-nvidia-legacy-304xx (no description available)
version: 1.3.1 build date: 2020-12-14T14:18+00:00 build revision: ac02636a318fe7dcc71eaeb3cc55d0c8541c1072 build compiler: x86_64-linux-gnu-gcc-8 8.3.0 build platform: x86_64 build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
docker run --rm --gpus all nvidia/cuda:11.0-base-ubuntu20.04 nvidia-smi