Closed JingL1014 closed 12 months ago
I'm still stuck on this problem.
* First, I tried @Winand's solution of reinstalling `nvidia-docker2` after changing the priority to `Pin: origin nvidia.github.io`, but it didn't work.
* Then I tried @mathisc's "dirty" solution, which didn't work for me either.
* I also tried [Tensorman](https://support.system76.com/articles/tensorman/) (a utility developed by Pop OS for running TensorFlow with the Nvidia GPU), but got the same Docker error:
docker: Error response from daemon: failed to create shim: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.
* PopOS: 22.04
* Nvidia Driver: 525.60.111
* CUDA version: 12.0
* Docker version: 20.10.12
Any new idea that may help?
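One thing that may be worth checking with this particular error is which cgroup layout the host uses. As far as I understand, older libnvidia-container releases look for the cgroup v1 `devices` controller, which does not exist on a host running cgroup v2 (unified). The sketch below is only a diagnostic under that assumption; the function names `classify_cgroup_fs` and `report_cgroup_layout` are made up for illustration and are not part of any NVIDIA tool.

```shell
#!/bin/sh
# Map the filesystem type of /sys/fs/cgroup to a cgroup layout.
# classify_cgroup_fs is a hypothetical helper, not an NVIDIA command.
classify_cgroup_fs() {
    case "$1" in
        cgroup2fs) echo "cgroup v2 (unified)" ;;
        tmpfs)     echo "cgroup v1 (legacy or hybrid)" ;;
        *)         echo "unknown ($1)" ;;
    esac
}

# Report the layout of the running host (Linux only).
report_cgroup_layout() {
    classify_cgroup_fs "$(stat -fc %T /sys/fs/cgroup/)"
}
```

Calling `report_cgroup_layout` prints the layout of the current machine. If it reports cgroup v2, comparing that against the library version from `nvidia-container-cli -V` is a reasonable next step.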
I have the same exact system and am experiencing this problem as well.
Followup: I was able to get the following output after overwriting the sources.list again from the installation guide with:
distribution=$(. /etc/os-release;echo ubuntu22.04) \
&& curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
(notice the hardcoded ubuntu22.04)
Then I installed nvidia-docker2 and restarted docker
sudo apt-get install nvidia-docker2
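A note on why the hardcoded `ubuntu22.04` matters: on Pop!_OS, `/etc/os-release` typically reports `ID=pop`, so an expression such as `$ID$VERSION_ID` would expand to a name that presumably has no matching repository list. The sketch below shows one way to derive the string without hardcoding it. The helper name `repo_distribution` and the `ID_LIKE` fallback are my own assumptions, not something taken from the installation guide.

```shell
#!/bin/sh
# Print the repository distribution string for an os-release file.
# repo_distribution is a hypothetical helper; falling back to "ubuntu" when
# ID_LIKE mentions it is an assumption about how derivative distros describe themselves.
repo_distribution() {
    (
        # Source the file in a subshell so its variables do not leak into the caller.
        . "${1:-/etc/os-release}"
        case " ${ID_LIKE:-} " in
            *" ubuntu "*) echo "ubuntu${VERSION_ID}" ;;
            *)            echo "${ID}${VERSION_ID}" ;;
        esac
    )
}
```

With this, `distribution=$(repo_distribution)` could stand in for the hardcoded value in the command above, though it is worth checking by hand that the resulting name actually exists in the repository.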
Successful Output:
gmacmillan@pop-os:~$ sudo docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu20.04 nvidia-smi
Fri Jan 20 22:21:17 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.78.01    Driver Version: 525.78.01    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:2F:00.0  On |                  Off |
|  0%   40C    P8    13W / 450W |    509MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
@fgoodwin @intrainepha @sandman works with 1.11.0. I've removed nvidia-docker2 and its dependencies and then reinstalled again.
/etc/apt/preferences.d/nvidia-docker-pin-1002:
Package: *
Pin: origin nvidia.github.io
Pin-Priority: 1002
sudo apt remove nvidia-docker2
sudo apt autoremove  # to also remove libnvidia-container-tools libnvidia-container1 nvidia-container-toolkit nvidia-container-toolkit-base
sudo apt install nvidia-docker2
sudo systemctl restart docker
After multiple frustrating attempts to follow this advice, I realised the issue was that `sudo apt autoremove` did not actually remove the offending libraries. Once I manually removed them with `sudo apt remove libnvidia-container-tools libnvidia-container1 nvidia-container-toolkit nvidia-container-toolkit-base`, all was good in the world. :)
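Since `autoremove` can silently leave these packages behind, it may help to confirm what is still installed before reinstalling. This is a minimal sketch, assuming a Debian-based system with `dpkg-query` available; the helper names `filter_installed` and `leftover_packages` are hypothetical.

```shell
#!/bin/sh
# The four packages named in the comment above.
PKGS="libnvidia-container-tools libnvidia-container1 nvidia-container-toolkit nvidia-container-toolkit-base"

# Read "<package> <want> <error> <status>" lines on stdin and print the names of
# packages whose state is exactly "install ok installed".
filter_installed() {
    awk '$2 == "install" && $3 == "ok" && $4 == "installed" { print $1 }'
}

# List which of the packages above are still installed (needs dpkg-query).
leftover_packages() {
    # dpkg-query reports unknown package names on stderr; that noise is ignored here.
    dpkg-query -W -f='${Package} ${Status}\n' $PKGS 2>/dev/null | filter_installed
}
```

If `leftover_packages` prints anything after `sudo apt autoremove`, those are the packages that still need an explicit `sudo apt remove`.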
For anybody out there using a TUXEDO with TUXEDO OS, I fixed mine by simply adding the NVIDIA libnvidia-container repo and running `apt update && apt upgrade`. Note that this works without installing anything new, just by upgrading from the new repository.
So with one command:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list \
&& \
sudo apt-get update && sudo apt upgrade -y
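For anyone wondering what the `sed` step in that command does: it rewrites each `deb https://` entry so that apt only trusts the repository when it is signed with the downloaded keyring. A small sketch of just that transformation follows; `add_signed_by` is a hypothetical name, and the `example.invalid` line in the usage note is a placeholder rather than the real repository list.

```shell
#!/bin/sh
# Keyring path used by the command above.
KEYRING=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

# Add a signed-by option to every "deb https://" entry read from stdin.
# "#" is used as the sed delimiter because the keyring path contains slashes.
add_signed_by() {
    sed "s#deb https://#deb [signed-by=${KEYRING}] https://#g"
}
```

For example, `echo 'deb https://example.invalid/repo /' | add_signed_by` prints the same entry with `[signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg]` inserted after `deb`.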
Thanks @criadoperez. As you pointed out, please refer to the updated installation documentation, where `nvidia-container-toolkit` is the top-level package.
Please create an issue against https://github.com/NVIDIA/nvidia-container-toolkit if there are still problems.
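As a rough sketch of what installing the top-level package could look like on a Debian-based system: this assumes the repository has already been added as shown earlier in the thread, and that the installed toolkit release is recent enough to ship the `nvidia-ctk` command. The `run` and `install_toolkit` helpers are hypothetical, and by default the script only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Echo each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Install the top-level package and point Docker at the NVIDIA runtime.
# Assumes the libnvidia-container apt repository is already configured.
install_toolkit() {
    run sudo apt-get update &&
    run sudo apt-get install -y nvidia-container-toolkit &&
    run sudo nvidia-ctk runtime configure --runtime=docker &&
    run sudo systemctl restart docker
}
```

Running `install_toolkit` prints the four commands; `DRY_RUN=0 install_toolkit` executes them. The installation documentation remains the authority on the exact steps for a given distribution.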
1. Issue or feature description
I am following the instructions on GitHub to install nvidia-docker on Ubuntu 20.04, but the installation fails with the following error. Could you help me identify the problem? Thank you!
sudo apt-get update
Hit:1 https://nvidia.github.io/libnvidia-container/stable/ubuntu20.04/amd64 InRelease
Hit:2 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu20.04/amd64 InRelease
Hit:3 https://nvidia.github.io/nvidia-docker/ubuntu20.04/amd64 InRelease
Get:4 https://download.docker.com/linux/ubuntu focal InRelease [36.2 kB]
Hit:5 http://security.ubuntu.com/ubuntu focal-security InRelease
Hit:6 http://archive.lambdalabs.com/ubuntu focal InRelease
Hit:7 http://archive.ubuntu.com/ubuntu focal InRelease
Hit:8 http://archive.ubuntu.com/ubuntu focal-updates InRelease
Hit:9 http://archive.ubuntu.com/ubuntu focal-backports InRelease
Fetched 36.2 kB in 1s (46.2 kB/s)
Reading package lists... Done

sudo apt-get install -y nvidia-docker2
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 nvidia-docker2 : Depends: nvidia-container-runtime (>= 3.4.0) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
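The apt message does not say why `nvidia-container-runtime (>= 3.4.0)` cannot be installed. A minimal diagnostic sketch, assuming a Debian-based system: it shows which version apt would pick for that package and whether any packages are on hold. The helper names `policy_candidate` and `diagnose_unmet` are hypothetical.

```shell
#!/bin/sh
# Print the "Candidate:" version from `apt-cache policy <pkg>` output on stdin.
policy_candidate() {
    awk '$1 == "Candidate:" { print $2; exit }'
}

# Show the candidate version of the unmet dependency and any held packages.
diagnose_unmet() {
    echo "candidate: $(apt-cache policy nvidia-container-runtime | policy_candidate)"
    echo "held packages:"
    apt-mark showhold
}
```

If the candidate printed by `diagnose_unmet` is older than 3.4.0 or is `(none)`, the repository lists or pin priorities are the likely place to look; a non-empty held list points at `apt-mark unhold` instead.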
2. Steps to reproduce the issue
sudo apt-get install -y nvidia-docker2
3. Information to attach (optional if deemed irrelevant)
nvidia-container-cli -k -d /dev/tty info
I0923 20:39:55.953720 464021 nvc.c:282] initializing library context (version=1.2.0, build=)
I0923 20:39:55.953761 464021 nvc.c:256] using root /
I0923 20:39:55.953766 464021 nvc.c:257] using ldcache /etc/ld.so.cache
I0923 20:39:55.953770 464021 nvc.c:258] using unprivileged user 4163:4163
I0923 20:39:55.953786 464021 nvc.c:299] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0923 20:39:55.953881 464021 nvc.c:301] dxcore initialization failed, continuing assuming a non-WSL environment
W0923 20:39:55.956568 464022 nvc.c:187] failed to set inheritable capabilities
W0923 20:39:55.956616 464022 nvc.c:188] skipping kernel modules load due to failure
I0923 20:39:55.956875 464023 driver.c:101] starting driver service
I0923 20:39:55.959606 464021 nvc_info.c:679] requesting driver information with ''
I0923 20:39:55.960768 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvoptix.so.450.57
I0923 20:39:55.960809 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.450.57
I0923 20:39:55.960831 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.450.57
I0923 20:39:55.960854 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.450.57
I0923 20:39:55.960889 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.450.57
I0923 20:39:55.960923 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.450.57
I0923 20:39:55.960945 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.450.57
I0923 20:39:55.960965 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.450.57
I0923 20:39:55.961000 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.450.57
I0923 20:39:55.961033 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.450.57
I0923 20:39:55.961054 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.450.57
I0923 20:39:55.961074 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.450.57
I0923 20:39:55.961095 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.450.57
I0923 20:39:55.961128 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.450.57
I0923 20:39:55.961161 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.450.57
I0923 20:39:55.961182 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.450.57
I0923 20:39:55.961203 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.450.57
I0923 20:39:55.961235 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.450.57
I0923 20:39:55.961257 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.450.57
I0923 20:39:55.961295 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.450.57
I0923 20:39:55.961534 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.450.57
I0923 20:39:55.961646 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.450.57
I0923 20:39:55.961669 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.450.57
I0923 20:39:55.961692 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.450.57
I0923 20:39:55.961716 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.450.57
I0923 20:39:55.961757 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-tls.so.450.57
I0923 20:39:55.961790 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-ptxjitcompiler.so.450.57
I0923 20:39:55.961827 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-opticalflow.so.450.57
I0923 20:39:55.961864 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-opencl.so.450.57
I0923 20:39:55.961887 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-ml.so.450.57
I0923 20:39:55.961923 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-ifr.so.450.57
I0923 20:39:55.961957 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-glvkspirv.so.450.57
I0923 20:39:55.961989 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-glsi.so.450.57
I0923 20:39:55.962009 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-glcore.so.450.57
I0923 20:39:55.962032 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-fbc.so.450.57
I0923 20:39:55.962071 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-encode.so.450.57
I0923 20:39:55.962105 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-eglcore.so.450.57
I0923 20:39:55.962125 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-compiler.so.450.57
I0923 20:39:55.962146 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-allocator.so.450.57
I0923 20:39:55.962182 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvcuvid.so.450.57
I0923 20:39:55.962229 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libcuda.so.450.57
I0923 20:39:55.962272 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libGLX_nvidia.so.450.57
I0923 20:39:55.962295 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libGLESv2_nvidia.so.450.57
I0923 20:39:55.962318 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libGLESv1_CM_nvidia.so.450.57
I0923 20:39:55.962340 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libEGL_nvidia.so.450.57
W0923 20:39:55.962361 464021 nvc_info.c:349] missing library libnvidia-fatbinaryloader.so
W0923 20:39:55.962366 464021 nvc_info.c:349] missing library libvdpau_nvidia.so
W0923 20:39:55.962373 464021 nvc_info.c:353] missing compat32 library libnvidia-cfg.so
W0923 20:39:55.962379 464021 nvc_info.c:353] missing compat32 library libnvidia-fatbinaryloader.so
W0923 20:39:55.962384 464021 nvc_info.c:353] missing compat32 library libnvidia-ngx.so
W0923 20:39:55.962389 464021 nvc_info.c:353] missing compat32 library libvdpau_nvidia.so
W0923 20:39:55.962395 464021 nvc_info.c:353] missing compat32 library libnvidia-rtcore.so
W0923 20:39:55.962400 464021 nvc_info.c:353] missing compat32 library libnvoptix.so
W0923 20:39:55.962407 464021 nvc_info.c:353] missing compat32 library libnvidia-cbl.so
I0923 20:39:55.968551 464021 nvc_info.c:275] selecting /usr/bin/nvidia-smi
I0923 20:39:55.968574 464021 nvc_info.c:275] selecting /usr/bin/nvidia-debugdump
I0923 20:39:55.968597 464021 nvc_info.c:275] selecting /usr/bin/nvidia-persistenced
I0923 20:39:55.968612 464021 nvc_info.c:275] selecting /usr/bin/nvidia-cuda-mps-control
I0923 20:39:55.968631 464021 nvc_info.c:275] selecting /usr/bin/nvidia-cuda-mps-server
I0923 20:39:55.968652 464021 nvc_info.c:437] listing device /dev/nvidiactl
I0923 20:39:55.968657 464021 nvc_info.c:437] listing device /dev/nvidia-uvm
I0923 20:39:55.968663 464021 nvc_info.c:437] listing device /dev/nvidia-uvm-tools
I0923 20:39:55.968667 464021 nvc_info.c:437] listing device /dev/nvidia-modeset
I0923 20:39:55.968695 464021 nvc_info.c:316] listing ipc /run/nvidia-persistenced/socket
W0923 20:39:55.968712 464021 nvc_info.c:320] missing ipc /tmp/nvidia-mps
I0923 20:39:55.968717 464021 nvc_info.c:744] requesting device information with ''
I0923 20:39:55.975153 464021 nvc_info.c:627] listing device /dev/nvidia0 (GPU-b4284e5d-adf4-2a5e-69dd-f53c99fc475d at 00000000:01:00.0)
I0923 20:39:55.981478 464021 nvc_info.c:627] listing device /dev/nvidia1 (GPU-c2e07576-ea0a-33b0-1622-f8c2132c2086 at 00000000:21:00.0)
I0923 20:39:55.988026 464021 nvc_info.c:627] listing device /dev/nvidia2 (GPU-ce68be3f-afa6-1eb5-a43c-27640ca76732 at 00000000:4b:00.0)
I0923 20:39:55.994670 464021 nvc_info.c:627] listing device /dev/nvidia3 (GPU-b74b3210-8285-2858-0bd7-5fb7e2d40cba at 00000000:4c:00.0)
NVRM version: 450.57
CUDA version: 11.0

Device Index: 0
Device Minor: 0
Model: Quadro RTX 6000
Brand: Quadro
GPU UUID: GPU-b4284e5d-adf4-2a5e-69dd-f53c99fc475d
Bus Location: 00000000:01:00.0
Architecture: 7.5

Device Index: 1
Device Minor: 1
Model: Quadro RTX 6000
Brand: Quadro
GPU UUID: GPU-c2e07576-ea0a-33b0-1622-f8c2132c2086
Bus Location: 00000000:21:00.0
Architecture: 7.5

Device Index: 2
Device Minor: 2
Model: Quadro RTX 6000
Brand: Quadro
GPU UUID: GPU-ce68be3f-afa6-1eb5-a43c-27640ca76732
Bus Location: 00000000:4b:00.0
Architecture: 7.5

Device Index: 3
Device Minor: 3
Model: Quadro RTX 6000
Brand: Quadro
GPU UUID: GPU-b74b3210-8285-2858-0bd7-5fb7e2d40cba
Bus Location: 00000000:4c:00.0
Architecture: 7.5
I0923 20:39:55.994743 464021 nvc.c:337] shutting down library context
I0923 20:39:55.995575 464023 driver.c:156] terminating driver service
I0923 20:39:55.995902 464021 driver.c:196] driver service terminated successfully
[x] Kernel version from `uname -a`
Linux mlrgpu07 5.4.0-47-generic #51-Ubuntu SMP Fri Sep 4 19:50:52 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
[ ] Any relevant kernel output lines from `dmesg`
[x] Driver information from `nvidia-smi -a`
Driver Version : 450.57
CUDA Version : 11.0
[x] Docker version from `docker version`
Client: Docker Engine - Community
 Version:           19.03.13
 API version:       1.40
 Go version:        go1.13.15
 Git commit:        4484c46d9d
 Built:             Wed Sep 16 17:02:52 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.13
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       4484c46d9d
  Built:            Wed Sep 16 17:01:20 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.3.7
  GitCommit:        8fba4e9a7d01810a393d5d25a3621dc101981175
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683
dpkg -l '*nvidia*' or rpm -qa '*nvidia*'
nvidia-container-cli -V
version: 1.2.0
build date: 2020-07-09T02:45+00:00
build revision:
build compiler: gcc-5 5.4.0 20160609
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -Wdate-time -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections -Wl,-Bsymbolic-functions -Wl,-z,relro