NVIDIA / nvidia-docker

Build and run Docker containers leveraging NVIDIA GPUs
Apache License 2.0

Error: nvidia-docker2 : Depends: nvidia-container-runtime (>= 3.4.0) #1388

Closed JingL1014 closed 9 months ago

JingL1014 commented 3 years ago

1. Issue or feature description

I am following the instructions on GitHub to install nvidia-docker on Ubuntu 20.04, but the installation fails with the following error. Could you help me identify the problem? Thank you!

sudo apt-get update
Hit:1 https://nvidia.github.io/libnvidia-container/stable/ubuntu20.04/amd64 InRelease
Hit:2 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu20.04/amd64 InRelease
Hit:3 https://nvidia.github.io/nvidia-docker/ubuntu20.04/amd64 InRelease
Get:4 https://download.docker.com/linux/ubuntu focal InRelease [36.2 kB]
Hit:5 http://security.ubuntu.com/ubuntu focal-security InRelease
Hit:6 http://archive.lambdalabs.com/ubuntu focal InRelease
Hit:7 http://archive.ubuntu.com/ubuntu focal InRelease
Hit:8 http://archive.ubuntu.com/ubuntu focal-updates InRelease
Hit:9 http://archive.ubuntu.com/ubuntu focal-backports InRelease
Fetched 36.2 kB in 1s (46.2 kB/s)
Reading package lists... Done

sudo apt-get install -y nvidia-docker2
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 nvidia-docker2 : Depends: nvidia-container-runtime (>= 3.4.0) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

2. Steps to reproduce the issue

sudo apt-get install -y nvidia-docker2

3. Information to attach (optional if deemed irrelevant)

I0923 20:39:55.953720 464021 nvc.c:282] initializing library context (version=1.2.0, build=)
I0923 20:39:55.953761 464021 nvc.c:256] using root /
I0923 20:39:55.953766 464021 nvc.c:257] using ldcache /etc/ld.so.cache
I0923 20:39:55.953770 464021 nvc.c:258] using unprivileged user 4163:4163
I0923 20:39:55.953786 464021 nvc.c:299] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0923 20:39:55.953881 464021 nvc.c:301] dxcore initialization failed, continuing assuming a non-WSL environment
W0923 20:39:55.956568 464022 nvc.c:187] failed to set inheritable capabilities
W0923 20:39:55.956616 464022 nvc.c:188] skipping kernel modules load due to failure
I0923 20:39:55.956875 464023 driver.c:101] starting driver service
I0923 20:39:55.959606 464021 nvc_info.c:679] requesting driver information with ''
I0923 20:39:55.960768 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvoptix.so.450.57
I0923 20:39:55.960809 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.450.57
I0923 20:39:55.960831 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.450.57
I0923 20:39:55.960854 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.450.57
I0923 20:39:55.960889 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.450.57
I0923 20:39:55.960923 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.450.57
I0923 20:39:55.960945 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.450.57
I0923 20:39:55.960965 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.450.57
I0923 20:39:55.961000 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.450.57
I0923 20:39:55.961033 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.450.57
I0923 20:39:55.961054 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.450.57
I0923 20:39:55.961074 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.450.57
I0923 20:39:55.961095 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.450.57
I0923 20:39:55.961128 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.450.57
I0923 20:39:55.961161 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.450.57
I0923 20:39:55.961182 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.450.57
I0923 20:39:55.961203 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.450.57
I0923 20:39:55.961235 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.450.57
I0923 20:39:55.961257 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.450.57
I0923 20:39:55.961295 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.450.57
I0923 20:39:55.961534 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.450.57
I0923 20:39:55.961646 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.450.57
I0923 20:39:55.961669 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.450.57
I0923 20:39:55.961692 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.450.57
I0923 20:39:55.961716 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.450.57
I0923 20:39:55.961757 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-tls.so.450.57
I0923 20:39:55.961790 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-ptxjitcompiler.so.450.57
I0923 20:39:55.961827 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-opticalflow.so.450.57
I0923 20:39:55.961864 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-opencl.so.450.57
I0923 20:39:55.961887 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-ml.so.450.57
I0923 20:39:55.961923 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-ifr.so.450.57
I0923 20:39:55.961957 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-glvkspirv.so.450.57
I0923 20:39:55.961989 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-glsi.so.450.57
I0923 20:39:55.962009 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-glcore.so.450.57
I0923 20:39:55.962032 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-fbc.so.450.57
I0923 20:39:55.962071 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-encode.so.450.57
I0923 20:39:55.962105 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-eglcore.so.450.57
I0923 20:39:55.962125 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-compiler.so.450.57
I0923 20:39:55.962146 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-allocator.so.450.57
I0923 20:39:55.962182 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvcuvid.so.450.57
I0923 20:39:55.962229 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libcuda.so.450.57
I0923 20:39:55.962272 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libGLX_nvidia.so.450.57
I0923 20:39:55.962295 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libGLESv2_nvidia.so.450.57
I0923 20:39:55.962318 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libGLESv1_CM_nvidia.so.450.57
I0923 20:39:55.962340 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libEGL_nvidia.so.450.57
W0923 20:39:55.962361 464021 nvc_info.c:349] missing library libnvidia-fatbinaryloader.so
W0923 20:39:55.962366 464021 nvc_info.c:349] missing library libvdpau_nvidia.so
W0923 20:39:55.962373 464021 nvc_info.c:353] missing compat32 library libnvidia-cfg.so
W0923 20:39:55.962379 464021 nvc_info.c:353] missing compat32 library libnvidia-fatbinaryloader.so
W0923 20:39:55.962384 464021 nvc_info.c:353] missing compat32 library libnvidia-ngx.so
W0923 20:39:55.962389 464021 nvc_info.c:353] missing compat32 library libvdpau_nvidia.so
W0923 20:39:55.962395 464021 nvc_info.c:353] missing compat32 library libnvidia-rtcore.so
W0923 20:39:55.962400 464021 nvc_info.c:353] missing compat32 library libnvoptix.so
W0923 20:39:55.962407 464021 nvc_info.c:353] missing compat32 library libnvidia-cbl.so
I0923 20:39:55.968551 464021 nvc_info.c:275] selecting /usr/bin/nvidia-smi
I0923 20:39:55.968574 464021 nvc_info.c:275] selecting /usr/bin/nvidia-debugdump
I0923 20:39:55.968597 464021 nvc_info.c:275] selecting /usr/bin/nvidia-persistenced
I0923 20:39:55.968612 464021 nvc_info.c:275] selecting /usr/bin/nvidia-cuda-mps-control
I0923 20:39:55.968631 464021 nvc_info.c:275] selecting /usr/bin/nvidia-cuda-mps-server
I0923 20:39:55.968652 464021 nvc_info.c:437] listing device /dev/nvidiactl
I0923 20:39:55.968657 464021 nvc_info.c:437] listing device /dev/nvidia-uvm
I0923 20:39:55.968663 464021 nvc_info.c:437] listing device /dev/nvidia-uvm-tools
I0923 20:39:55.968667 464021 nvc_info.c:437] listing device /dev/nvidia-modeset
I0923 20:39:55.968695 464021 nvc_info.c:316] listing ipc /run/nvidia-persistenced/socket
W0923 20:39:55.968712 464021 nvc_info.c:320] missing ipc /tmp/nvidia-mps
I0923 20:39:55.968717 464021 nvc_info.c:744] requesting device information with ''
I0923 20:39:55.975153 464021 nvc_info.c:627] listing device /dev/nvidia0 (GPU-b4284e5d-adf4-2a5e-69dd-f53c99fc475d at 00000000:01:00.0)
I0923 20:39:55.981478 464021 nvc_info.c:627] listing device /dev/nvidia1 (GPU-c2e07576-ea0a-33b0-1622-f8c2132c2086 at 00000000:21:00.0)
I0923 20:39:55.988026 464021 nvc_info.c:627] listing device /dev/nvidia2 (GPU-ce68be3f-afa6-1eb5-a43c-27640ca76732 at 00000000:4b:00.0)
I0923 20:39:55.994670 464021 nvc_info.c:627] listing device /dev/nvidia3 (GPU-b74b3210-8285-2858-0bd7-5fb7e2d40cba at 00000000:4c:00.0)
NVRM version: 450.57
CUDA version: 11.0

Device Index: 0
Device Minor: 0
Model: Quadro RTX 6000
Brand: Quadro
GPU UUID: GPU-b4284e5d-adf4-2a5e-69dd-f53c99fc475d
Bus Location: 00000000:01:00.0
Architecture: 7.5

Device Index: 1
Device Minor: 1
Model: Quadro RTX 6000
Brand: Quadro
GPU UUID: GPU-c2e07576-ea0a-33b0-1622-f8c2132c2086
Bus Location: 00000000:21:00.0
Architecture: 7.5

Device Index: 2
Device Minor: 2
Model: Quadro RTX 6000
Brand: Quadro
GPU UUID: GPU-ce68be3f-afa6-1eb5-a43c-27640ca76732
Bus Location: 00000000:4b:00.0
Architecture: 7.5

Device Index: 3
Device Minor: 3
Model: Quadro RTX 6000
Brand: Quadro
GPU UUID: GPU-b74b3210-8285-2858-0bd7-5fb7e2d40cba
Bus Location: 00000000:4c:00.0
Architecture: 7.5
I0923 20:39:55.994743 464021 nvc.c:337] shutting down library context
I0923 20:39:55.995575 464023 driver.c:156] terminating driver service
I0923 20:39:55.995902 464021 driver.c:196] driver service terminated successfully

Client: Docker Engine - Community
 Version: 19.03.13
 API version: 1.40
 Go version: go1.13.15
 Git commit: 4484c46d9d
 Built: Wed Sep 16 17:02:52 2020
 OS/Arch: linux/amd64
 Experimental: false

Server: Docker Engine - Community
 Engine:
  Version: 19.03.13
  API version: 1.40 (minimum version 1.12)
  Go version: go1.13.15
  Git commit: 4484c46d9d
  Built: Wed Sep 16 17:01:20 2020
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: 1.3.7
  GitCommit: 8fba4e9a7d01810a393d5d25a3621dc101981175
 runc:
  Version: 1.0.0-rc10
  GitCommit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version: 0.18.0
  GitCommit: fec3683

klueska commented 3 years ago

It looks like you may have an old nvidia container stack installed (since you are able to run nvidia-container-cli successfully, but it was never installed as part of the current nvidia-docker2 installation).

Can you try uninstalling libnvidia-container1 (and, in doing so, everything that depends on it)? Then try installing nvidia-docker2 again.

This shouldn't be necessary, but it's worth a shot.
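
The suggestion above could be scripted roughly as follows. This is only a sketch, not an official procedure: the `run` helper and the `DRY_RUN` variable are hypothetical conveniences that print each command instead of executing it, unless `DRY_RUN=0` is set.

```shell
# Hypothetical helper: print the command by default, execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Remove the old container library (APT also removes packages that depend on it),
# refresh the package lists, then install nvidia-docker2 again.
reinstall_nvidia_docker2() {
    run sudo apt-get remove -y libnvidia-container1
    run sudo apt-get update
    run sudo apt-get install -y nvidia-docker2
}
```

Calling `reinstall_nvidia_docker2` as is only lists the three commands, which gives a chance to review them first; `DRY_RUN=0 reinstall_nvidia_docker2` would actually run them.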

Also, are you on a DGX machine? If so, this may be relevant: https://github.com/NVIDIA/nvidia-docker/issues/1355#issuecomment-663703304

AlexMikhalev commented 3 years ago

@klueska I hit the same issue on a Juno laptop, so it's not hardware-specific. Following the instructions and fetching https://nvidia.github.io/nvidia-docker/ubuntu20.04/nvidia-docker.list results in:

cat /etc/apt/sources.list.d/nvidia-docker.list
deb https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /
#deb https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/$(ARCH) /
#deb https://nvidia.github.io/nvidia-container-runtime/experimental/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-docker/ubuntu18.04/$(ARCH) /

which points to the 18.04 repo.

klueska commented 3 years ago

@AlexMikhalev the fact that nvidia-docker.list contains references to ubuntu 18.04 is not an issue (in fact the 20.04 repo is just a symlink to 18.04).
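
For context, the list file URL contains a distribution string such as `ubuntu20.04`. A minimal sketch of how that string can be derived, assuming the install guide builds it from the `ID` and `VERSION_ID` fields of `/etc/os-release`; the function names and the optional file argument are hypothetical and exist only to make the sketch easy to try out:

```shell
# Build "<ID><VERSION_ID>" (for example ubuntu20.04) from an os-release style file.
# The optional argument is an assumption for illustration; it defaults to /etc/os-release.
distribution_string() {
    ( . "${1:-/etc/os-release}" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Print the URL of the matching nvidia-docker.list without downloading anything.
nvidia_docker_list_url() {
    printf 'https://nvidia.github.io/nvidia-docker/%s/nvidia-docker.list\n' \
        "$(distribution_string "$1")"
}
```

On an Ubuntu derivative the `ID` field may not be `ubuntu`, in which case the string has to be set by hand to a distribution that the NVIDIA repos actually publish.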

Are you also seeing this error, though:

nvidia-docker2 : Depends: nvidia-container-runtime (>= 3.4.0) but it is not going to be installed

I have tested it multiple times in various environments and am not able to reproduce the issue.

JingL1014 commented 3 years ago

@klueska I uninstalled libnvidia-container1, but I still get the same error. I am not on a DGX machine.

This is the list of NVIDIA packages from "dpkg -l '*nvidia*'":

||/ Name                             Version                 Architecture Description
un  libgldispatch0-nvidia                                     (no description available)
ii  libnvidia-cfg1-450:amd64         450.57-0lambda0~20.04.1 amd64        NVIDIA binary OpenGL/GLX configuration library
un  libnvidia-cfg1-any                                        (no description available)
un  libnvidia-common                                          (no description available)
ii  libnvidia-common-450             450.57-0lambda0~20.04.1 all          Shared files used by the NVIDIA libraries
ii  libnvidia-compute-450:amd64      450.57-0lambda0~20.04.1 amd64        NVIDIA libcompute package
ii  libnvidia-compute-450:i386       450.57-0lambda0~20.04.1 i386         NVIDIA libcompute package
un  libnvidia-decode                                          (no description available)
ii  libnvidia-decode-450:amd64       450.57-0lambda0~20.04.1 amd64        NVIDIA Video Decoding runtime libraries
ii  libnvidia-decode-450:i386        450.57-0lambda0~20.04.1 i386         NVIDIA Video Decoding runtime libraries
un  libnvidia-encode                                          (no description available)
ii  libnvidia-encode-450:amd64       450.57-0lambda0~20.04.1 amd64        NVENC Video Encoding runtime library
ii  libnvidia-encode-450:i386        450.57-0lambda0~20.04.1 i386         NVENC Video Encoding runtime library
un  libnvidia-extra                                           (no description available)
ii  libnvidia-extra-450:amd64        450.57-0lambda0~20.04.1 amd64        Extra libraries for the NVIDIA driver
ii  libnvidia-extra-450:i386         450.57-0lambda0~20.04.1 i386         Extra libraries for the NVIDIA driver
un  libnvidia-fbc1                                            (no description available)
ii  libnvidia-fbc1-450:amd64         450.57-0lambda0~20.04.1 amd64        NVIDIA OpenGL-based Framebuffer Capture runtime library
ii  libnvidia-fbc1-450:i386          450.57-0lambda0~20.04.1 i386         NVIDIA OpenGL-based Framebuffer Capture runtime library
un  libnvidia-gl                                              (no description available)
ii  libnvidia-gl-450:amd64           450.57-0lambda0~20.04.1 amd64        NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii  libnvidia-gl-450:i386            450.57-0lambda0~20.04.1 i386         NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
un  libnvidia-ifr1                                            (no description available)
ii  libnvidia-ifr1-450:amd64         450.57-0lambda0~20.04.1 amd64        NVIDIA OpenGL-based Inband Frame Readback runtime library
ii  libnvidia-ifr1-450:i386          450.57-0lambda0~20.04.1 i386         NVIDIA OpenGL-based Inband Frame Readback runtime library
ii  libnvidia-ml-dev                 10.2.89-0lambda2        amd64        NVIDIA Management Library (NVML) development package
un  libnvidia-ml1                                             (no description available)
un  nvidia-304                                                (no description available)
un  nvidia-340                                                (no description available)
un  nvidia-384                                                (no description available)
un  nvidia-common                                             (no description available)
un  nvidia-compute-utils                                      (no description available)
ii  nvidia-compute-utils-450         450.57-0lambda0~20.04.1 amd64        NVIDIA compute utilities
ii  nvidia-cuda-dev:amd64            10.2.89-0lambda2        amd64        CUDA development files
ii  nvidia-cuda-doc                  10.2.89-0lambda2        all          CUDA toolkit documentation
ii  nvidia-cuda-gdb                  10.2.89-0lambda2        amd64        CUDA Debugger
ii  nvidia-cuda-toolkit              10.2.89-0lambda2        amd64        CUDA development toolkit
ii  nvidia-dkms-450                  450.57-0lambda0~20.04.1 amd64        NVIDIA DKMS package
un  nvidia-dkms-kernel                                        (no description available)
ii  nvidia-driver-440                450.57-0lambda0~20.04.1 amd64        Transitional package for nvidia-driver-450
ii  nvidia-driver-450                450.57-0lambda0~20.04.1 amd64        NVIDIA driver metapackage
un  nvidia-driver-binary                                      (no description available)
un  nvidia-driver-meta                                        (no description available)
un  nvidia-kernel-common                                      (no description available)
ii  nvidia-kernel-common-450         450.57-0lambda0~20.04.1 amd64        Shared files used with the kernel module
un  nvidia-kernel-source                                      (no description available)
ii  nvidia-kernel-source-450         450.57-0lambda0~20.04.1 amd64        NVIDIA kernel source package
un  nvidia-legacy-304xx-vdpau-driver                          (no description available)
un  nvidia-legacy-340xx-vdpau-driver                          (no description available)
un  nvidia-libopencl1                                         (no description available)
un  nvidia-libopencl1-dev                                     (no description available)
un  nvidia-opencl-icd                                         (no description available)
un  nvidia-persistenced                                       (no description available)
un  nvidia-prime                                              (no description available)
ii  nvidia-profiler                  10.2.89-0lambda2        amd64        NVIDIA CUDA profiler
ii  nvidia-settings                  450.57-0lambda1         amd64        Tool for configuring the NVIDIA graphics driver
un  nvidia-settings-binary                                    (no description available)
un  nvidia-smi                                                (no description available)
un  nvidia-utils                                              (no description available)
ii  nvidia-utils-450                 450.57-0lambda0~20.04.1 amd64        NVIDIA driver support binaries
un  nvidia-vdpau-driver                                       (no description available)
ii  xserver-xorg-video-nvidia-450    450.57-0lambda0~20.04.1 amd64        NVIDIA binary Xorg driver

klueska commented 3 years ago

@JingL1014 Those are just the packages you have installed.

Can you show me the list of packages available:

sudo apt-cache madison nvidia-container-runtime

Mine shows:

nvidia-container-runtime |    3.4.0-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime |    3.3.0-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime |    3.2.0-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime |    3.1.4-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime |    3.1.3-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime |    3.1.2-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime |    3.1.1-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.7-3 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.6-3 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.5-3 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.5-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.4-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.3-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.2-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.1-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.0-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.06.3-3 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.06.2-2 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.06.2-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.06.1-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.06.0-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.03.1-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker17.12.1-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
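
When several repositories offer the same package, a long listing like this is easier to compare once it is reduced to one line per repository. A small sketch, assuming `apt-cache madison` prints the newest version first for each source, as in the output above; `newest_per_repo` is a hypothetical helper name:

```shell
# Read `apt-cache madison` output on stdin and print the first (newest) version
# listed for each repository URL, as "<version> <url>".
newest_per_repo() {
    awk -F'|' '
        {
            version = $2
            gsub(/^ +| +$/, "", version)   # trim padding around the version column
            split($3, source, " ")         # source[1] is the repository URL
            if (!(source[1] in seen)) {
                seen[source[1]] = 1
                print version " " source[1]
            }
        }'
}
```

It would be used as `apt-cache madison nvidia-container-runtime | newest_per_repo`. If a repository other than nvidia.github.io appears with a version below 3.4.0, that source is worth a closer look.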

JingL1014 commented 3 years ago

@klueska

Mine shows:


nvidia-container-runtime |    3.4.0-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime |    3.3.0-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime |    3.2.0-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime |    3.1.4-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime |    3.1.3-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime |    3.1.2-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime |    3.1.1-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.7-3 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.6-3 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.5-3 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.5-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.4-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.3-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.2-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.1-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.0-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.06.3-3 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.06.2-2 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.06.2-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.06.1-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.06.0-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.03.1-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker17.12.1-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages

klueska commented 3 years ago

So it clearly shows version 3.4.0-1 being available. It is strange that you would get this error then:

nvidia-docker2 : Depends: nvidia-container-runtime (>= 3.4.0) but it is not going to be installed

Can you manually install nvidia-container-runtime and see which version gets installed?


sudo apt-get install -y nvidia-container-runtime
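
Before installing anything, `apt-cache policy` reports which version APT would pick (the `Candidate:` line) together with the priority of each source. A small sketch, where `candidate_version` is a hypothetical helper that reads that output from standard input:

```shell
# Print the version that APT would install, taken from the "Candidate:" line
# of `apt-cache policy <package>` output read on stdin.
candidate_version() {
    awk '$1 == "Candidate:" { print $2 }'
}
```

It would be used as `apt-cache policy nvidia-container-runtime | candidate_version`. If the candidate is older than 3.4.0, the dependency of nvidia-docker2 cannot be satisfied, which would match the error quoted above.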

AlexMikhalev commented 3 years ago

sudo apt-get install -y nvidia-docker2
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies.
 nvidia-docker2 : Depends: nvidia-container-runtime (>= 3.4.0) but 3.1.4-0pop1~1569270714~20.04~2ea45f8 is to be installed
E: Unable to correct problems, you have held broken packages.

Trying to install nvidia-container-runtime:

sudo apt-get install -y nvidia-container-runtime
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libnvidia-cfg1-440 libnvidia-decode-440 libnvidia-decode-440:i386 libnvidia-encode-440
  libnvidia-encode-440:i386 libnvidia-extra-440 libnvidia-fbc1-440 libnvidia-fbc1-440:i386
  libnvidia-gl-440 libnvidia-ifr1-440 libnvidia-ifr1-440:i386 libxnvctrl0
  linux-headers-5.4.0-47 linux-headers-5.4.0-47-generic linux-image-5.4.0-47-generic
  linux-modules-5.4.0-47-generic linux-modules-extra-5.4.0-47-generic linux-tools-5.4.0-47
  linux-tools-5.4.0-47-generic mousetweaks nvidia-compute-utils-440 nvidia-kernel-source-440
  nvidia-settings nvidia-utils-440 screen-resolution-extra xserver-xorg-video-nvidia-440
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
  libnvidia-container-tools libnvidia-container1 libtirpc-common libtirpc3
  nvidia-container-toolkit
The following NEW packages will be installed
  libnvidia-container-tools libnvidia-container1 libtirpc-common libtirpc3
  nvidia-container-runtime nvidia-container-toolkit
0 to upgrade, 6 to newly install, 0 to remove and 1 not to upgrade.
Need to get 1,807 kB of archives.
After this operation, 6,979 kB of additional disk space will be used.
Get:1 http://ppa.launchpad.net/system76/pop/ubuntu focal/main amd64 libnvidia-container1 amd64 1.0.6-1pop1~1571281295~20.04~862e228 [57.2 kB]
Get:2 http://us.archive.ubuntu.com/ubuntu focal/main amd64 libtirpc-common all 1.2.5-1 [7,632 B]
Get:3 http://ppa.launchpad.net/system76/pop/ubuntu focal/main amd64 libnvidia-container-tools amd64 1.0.6-1pop1~1571281295~20.04~862e228 [14.5 kB]
Get:4 http://us.archive.ubuntu.com/ubuntu focal/main amd64 libtirpc3 amd64 1.2.5-1 [77.2 kB]
Get:5 http://ppa.launchpad.net/system76/pop/ubuntu focal/main amd64 nvidia-container-toolkit amd64 1.0.5-0pop1~1569270707~20.04~17cd54f [811 kB]
Get:6 http://ppa.launchpad.net/system76/pop/ubuntu focal/main amd64 nvidia-container-runtime amd64 3.1.4-0pop1~1569270714~20.04~2ea45f8 [840 kB]
Fetched 1,807 kB in 1s (3,362 kB/s)               
Selecting previously unselected package libtirpc-common.
(Reading database ... 255334 files and directories currently installed.)
Preparing to unpack .../0-libtirpc-common_1.2.5-1_all.deb ...
Unpacking libtirpc-common (1.2.5-1) ...
Selecting previously unselected package libtirpc3:amd64.
Preparing to unpack .../1-libtirpc3_1.2.5-1_amd64.deb ...
Unpacking libtirpc3:amd64 (1.2.5-1) ...
Selecting previously unselected package libnvidia-container1:amd64.
Preparing to unpack .../2-libnvidia-container1_1.0.6-1pop1~1571281295~20.04~862e228_amd64.deb ...
Unpacking libnvidia-container1:amd64 (1.0.6-1pop1~1571281295~20.04~862e228) ...
Selecting previously unselected package libnvidia-container-tools.
Preparing to unpack .../3-libnvidia-container-tools_1.0.6-1pop1~1571281295~20.04~862e228_amd64.deb ...
Unpacking libnvidia-container-tools (1.0.6-1pop1~1571281295~20.04~862e228) ...
Selecting previously unselected package nvidia-container-toolkit.
Preparing to unpack .../4-nvidia-container-toolkit_1.0.5-0pop1~1569270707~20.04~17cd54f_amd64.deb ...
Unpacking nvidia-container-toolkit (1.0.5-0pop1~1569270707~20.04~17cd54f) ...
Selecting previously unselected package nvidia-container-runtime.
Preparing to unpack .../5-nvidia-container-runtime_3.1.4-0pop1~1569270714~20.04~2ea45f8_amd64.deb ...
Unpacking nvidia-container-runtime (3.1.4-0pop1~1569270714~20.04~2ea45f8) ...
Setting up libtirpc-common (1.2.5-1) ...
Setting up libtirpc3:amd64 (1.2.5-1) ...
Setting up libnvidia-container1:amd64 (1.0.6-1pop1~1571281295~20.04~862e228) ...
Setting up libnvidia-container-tools (1.0.6-1pop1~1571281295~20.04~862e228) ...
Setting up nvidia-container-toolkit (1.0.5-0pop1~1569270707~20.04~17cd54f) ...
Setting up nvidia-container-runtime (3.1.4-0pop1~1569270714~20.04~2ea45f8) ...
Processing triggers for man-db (2.9.1-1) ...
Processing triggers for libc-bin (2.31-0ubuntu9.1) ...
(base) ➜  ~ sudo apt-get install -y nvidia-docker2          
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies.
 nvidia-docker2 : Depends: nvidia-container-runtime (>= 3.4.0) but 3.1.4-0pop1~1569270714~20.04~2ea45f8 is to be installed
E: Unable to correct problems, you have held broken packages.
klueska commented 3 years ago

@AlexMikhalev It looks like you have added a PPA repository from system76, which appears to distribute their own builds of the nvidia container stack (independent of the official repos published by NVIDIA). The components they are hosting appear to be quite a few versions behind the latest.

In addition to that, it appears that they don't actually include nvidia-docker2 as part of their distribution. So when you try and install nvidia-docker2, it pulls the latest from the NVIDIA repos, but then tries to pull the old versions of its dependent components from the system76 ppa repos.

You need to either remove the system76 repo or somehow make the NVIDIA repos higher priority so that it pulls the container stack from them instead of system76.

sebautistam commented 3 years ago

Hi! I am having the exact same issue. I am using pop-os (based on Ubuntu 20.04, distributed by system76), and I have previously installed and used nvidia-docker2 on machines with the same OS by following the normal Ubuntu guide, changing the distribution to match the Ubuntu version (ubuntu20.04 on the other machines as well).

I get what you are saying about the system76 repos, even though it is weird it worked fine about 2 months ago. Anyway, I could try to make NVIDIA repos higher priority, but I don't know how to do it. Could you please guide me on this?

Thank you.

klueska commented 3 years ago

@sebautistam I don't know the details of what it looks like in pop-os, but I'm guessing you have some files under /etc/apt/preferences.d/ which are prioritizing the system76 repos over all others.

sebautistam commented 3 years ago

There is a file in that directory with this inside:

Package: *
Pin: release o=LP-PPA-system76-pop
Pin-Priority: 1001

Package: *
Pin: release o=LP-PPA-system76-proposed
Pin-Priority: 1001
klueska commented 3 years ago

You will need to adjust these (or add more rules for the nvidia repos) according to this: http://manpages.ubuntu.com/manpages/bionic/man5/apt_preferences.5.html

sebautistam commented 3 years ago

I have modified the preferences file and now it looks like this:

Package: *
Pin: origin nvidia.github.io
Pin-Priority: 1001

Package: *
Pin: release o=LP-PPA-system76-pop
Pin-Priority: 901

Package: *
Pin: release o=LP-PPA-system76-proposed
Pin-Priority: 901

And the nvidia-docker2 installation was successful.

Do you know the name of the package to include in the preferences file, just to be more specific and avoid future problems with other packages?

Thank you!

klueska commented 3 years ago

@sebautistam Sorry, I'm not that familiar with these preference files. I just know they've caused problems for people in the past, so I figured it was related here.

JingL1014 commented 3 years ago

I figured out that in my case the problem is that version 1.2.0+ds-0lambda1 of libnvidia-container1 gets installed automatically.

sudo apt-cache madison libnvidia-container1
libnvidia-container1 |    1.3.0-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 | 1.2.0+ds-0lambda1 | http://archive.lambdalabs.com/ubuntu focal/main amd64 Packages
libnvidia-container1 |    1.2.0-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 |    1.1.1-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 |    1.1.0-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 |    1.0.7-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 |    1.0.5-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 |    1.0.4-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 |    1.0.3-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 |    1.0.2-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 |    1.0.1-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 |    1.0.0-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 | 1.0.0~rc.2-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 | 1.0.0~rc.1-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
So I manually installed version 1.3.0-1 and then the other packages using the commands below, and nvidia-docker is now installed successfully:
sudo apt-get install -y libnvidia-container1=1.3.0-1
sudo apt-get install -y libnvidia-container-tools=1.3.0-1
sudo apt-get install -y nvidia-container-runtime
sudo apt-get install -y nvidia-docker2         
AlexMikhalev commented 3 years ago

@JingL1014 I followed your advice successfully until the last line:

sudo apt-get install -y nvidia-docker2
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies.
 nvidia-docker2 : Depends: nvidia-container-runtime (>= 3.4.0) but 3.1.4-0pop1~1569270714~20.04~2ea45f8 is to be installed
E: Unable to correct problems, you have held broken packages.
JingL1014 commented 3 years ago

@AlexMikhalev It seems that version 3.1.4 of nvidia-container-runtime is being selected for installation. You could try manually installing version 3.4.0 by running: sudo apt-get install -y nvidia-container-runtime=3.4.0

And then run: sudo apt-get install -y nvidia-docker2

AlexMikhalev commented 3 years ago

Fixed. The issue was the priority of system76 apt repo over Nvidia. @klueska there is a good reason to use System76 drivers over stock Nvidia ones - for example, stock Nvidia drivers don't support external monitor/dual monitor configuration on the laptops. It would be good to be able to pin priority of the relevant packages rather than the whole repo.
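A minimal sketch of package-scoped pinning, for anyone who wants to avoid raising the priority of the whole repo. It assumes that the five package names mentioned in this thread are the relevant ones and relies on `apt_preferences(5)` accepting several space-separated names on one `Package:` line. The scratch path `PREF_FILE` and the priority 1002 are illustrative choices, not something the NVIDIA documentation prescribes. A priority of 1000 or more lets apt select the pinned version even when that means a downgrade.

```shell
#!/bin/sh
# Sketch: pin only the NVIDIA container packages to nvidia.github.io,
# leaving every other package on the distribution's existing priorities.
# PREF_FILE is a hypothetical scratch path; review the result, then copy
# it into /etc/apt/preferences.d/ with sudo under a name of your choice.
PREF_FILE="${PREF_FILE:-$(mktemp)}"

cat > "$PREF_FILE" <<'EOF'
Package: nvidia-docker2 nvidia-container-runtime nvidia-container-toolkit libnvidia-container1 libnvidia-container-tools
Pin: origin nvidia.github.io
Pin-Priority: 1002
EOF

echo "Wrote $PREF_FILE:"
cat "$PREF_FILE"
```

After copying the file into place, running sudo apt-get update and then apt-cache policy nvidia-container-runtime should show whether the nvidia.github.io candidate now wins.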

ffahmed commented 3 years ago

Hi @AlexMikhalev, in which file did you change the repo priority? /etc/apt/sources.list? I could not find a system76 entry there.

ffahmed commented 3 years ago

I tried to install nvidia-container-runtime for same issue. But it asked me to install nvidia-container-toolkit. I tried to install and it said it's already installed. But it did not let me install nvidia-container-runtime again. See below. Can anyone share any solution?

(base) JJteam@lambda-quad:~$ sudo apt-get install -y nvidia-container-runtime
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 nvidia-container-runtime : Depends: nvidia-container-toolkit (>= 1.3.0) but 1.2.0+ds-0lambda0.18.04.1 is to be installed
E: Unable to correct problems, you have held broken packages.
(base) JJteam@lambda-quad:~$ sudo apt install nvidia-container-toolkit
Reading package lists... Done
Building dependency tree
Reading state information... Done
nvidia-container-toolkit is already the newest version (1.2.0+ds-0lambda0.18.04.1).
0 upgraded, 0 newly installed, 0 to remove and 6 not upgraded.
(base) JJteam@lambda-quad:~$ sudo apt install nvidia-container-runtime
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 nvidia-container-runtime : Depends: nvidia-container-toolkit (>= 1.3.0) but 1.2.0+ds-0lambda0.18.04.1 is to be installed
E: Unable to correct problems, you have held broken packages.

AlexMikhalev commented 3 years ago

@ffahmed I changed priorities in two files:

cat /etc/apt/preferences.d/pop-default-settings 
Package: *
Pin: release o=LP-PPA-system76-pop
Pin-Priority: 901

Package: *
Pin: release o=LP-PPA-system76-proposed
Pin-Priority: 901

and

cat /etc/apt/preferences.d/nvidia-default 
Package: *
Pin: origin nvidia.github.io
Pin-Priority: 1001
ffahmed commented 3 years ago

Thanks @AlexMikhalev for prompt reply!

I don't have those files in /etc/apt/preferences.d/. The only file I have is cuda-repository-pin-600, and here is its content.


cat /etc/apt/preferences.d/cuda-repository-pin-600
Package: nsight-compute
Pin: origin ubuntu.com
Pin-Priority: -1

Package: nsight-systems
Pin: origin ubuntu.com
Pin-Priority: -1

Package: *
Pin: release l=NVIDIA CUDA
Pin-Priority: 600

Also, I think my issue is a little different from yours. You were able to install nvidia-container-runtime, but when I try to install it, it reports a dependency on nvidia-container-toolkit. When I try to install nvidia-container-toolkit, it says it's already installed, so I am stuck here and cannot go back to the older nvidia-docker (1.0) version either. See the command and output below. Any help is much appreciated! @AlexMikhalev @klueska.

sudo apt-get install -y nvidia-container-runtime
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 nvidia-container-runtime : Depends: nvidia-container-toolkit (>= 1.3.0) but 1.2.0+ds-0lambda0.18.04.1 is to be installed
E: Unable to correct problems, you have held broken packages.
(base) JJteam@lambda-quad:~$ sudo apt-get install -y nvidia-container-toolkit
Reading package lists... Done
Building dependency tree
Reading state information... Done
nvidia-container-toolkit is already the newest version (1.2.0+ds-0lambda0.18.04.1).
0 upgraded, 0 newly installed, 0 to remove and 6 not upgraded.

klueska commented 3 years ago

@ffahmed Which repo is your version 1.2.0+ds-0lambda0.18.04.1 coming from? You need to figure out where it is coming from and why it has higher priority than the ones from the official NVIDIA repos.
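One way to answer that question is `apt-cache policy`, which lists the installed version, the candidate version, and the priority of every repository offering each package. The sketch below wraps it in a small helper; the function name `show_policy` is hypothetical, and the package list is simply the set of names that appear in this thread.

```shell
#!/bin/sh
# Sketch: show which repository apt would pick for each package.
# show_policy is a hypothetical helper name; it only wraps apt-cache policy.
show_policy() {
    if ! command -v apt-cache >/dev/null 2>&1; then
        echo "apt-cache not found; run this on a Debian/Ubuntu-based host" >&2
        return 1
    fi
    # For each package: Installed, Candidate, and a version table in which
    # every version line is followed by the repositories that provide it,
    # each prefixed with its pin priority.
    apt-cache policy "$@"
}

show_policy nvidia-docker2 nvidia-container-runtime \
    nvidia-container-toolkit libnvidia-container1 || true
```

In the output, the repository whose priority is highest for the candidate version is the one to compare against nvidia.github.io.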

ffahmed commented 3 years ago

Thanks @klueska @AlexMikhalev both of you. I had to create preference for nvidia-docker.

vi /etc/apt/preferences.d/nvidia-docker-pin-1002 with content:

Package: *
Pin: origin nvidia.github.io
Pin-Priority: 1002

Then I didn't need to install nvidia-container-toolkit. I was able to directly do sudo apt-get install -y nvidia-docker2

abhidipbhattacharyya commented 3 years ago

@ffahmed thank you. Your solution worked for me. You saved my day.

imSrbh commented 2 years ago

Thanks.. @klueska @AlexMikhalev @ffahmed I was facing the same..resolved within no time.

DanielTakeshi commented 2 years ago

Thanks! I think this fixed the issue as well, I use:

cat /etc/apt/preferences.d/nvidia-docker-pin-1002
Package: *
Pin: origin nvidia.github.io
Pin-Priority: 1002
abhidipbhattacharyya commented 2 years ago

For me the following worked. I opened the file with vi /etc/apt/preferences.d/nvidia-docker-pin-1002 and added this content:

Package: *
Pin: origin nvidia.github.io
Pin-Priority: 1002

Then I ran:

sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker

sandman commented 2 years ago

This does not solve my issue! I get the same exact error as before:

docker run --gpus 0 -it --shm-size=1024m -e SIZEW=1920 -e SIZEH=1080 -e PASSWD=mypasswd -e BASIC_AUTH_PASSWORD=mypasswd -e NOVNC_ENABLE=true -p 6080:8080 nvidia-egl-desktop-ros2:foxy
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.

I am on PopOS 22.04 and trying to run this image: https://github.com/atinfinity/nvidia-egl-desktop-ros2/blob/main/foxy/Dockerfile

elezar commented 2 years ago

@sandman can you check the version of the NVIDIA Container CLI that you are using: nvidia-container-cli --version?

It may be that you are not using a version that supports cgroupv2.
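To see which cgroup layout the host itself uses, a common check is the filesystem type mounted at /sys/fs/cgroup. The sketch below assumes GNU coreutils `stat` on a Linux host; the helper name `cgroup_fs_type` is hypothetical.

```shell
#!/bin/sh
# Sketch: report which cgroup hierarchy is mounted at /sys/fs/cgroup.
# "cgroup2fs" indicates the unified cgroup v2 hierarchy; "tmpfs" is what a
# cgroup v1 (or hybrid) layout typically reports.
cgroup_fs_type() {
    stat -fc %T /sys/fs/cgroup 2>/dev/null || echo "unknown"
}

case "$(cgroup_fs_type)" in
    cgroup2fs) echo "cgroup v2 (unified hierarchy)" ;;
    tmpfs)     echo "cgroup v1 or hybrid" ;;
    *)         echo "could not determine cgroup version" ;;
esac
```

If this reports cgroup v2, the "cgroup subsystem devices not found" error is consistent with a container CLI that only looks for the v1 devices controller.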

sandman commented 2 years ago

@elezar My version is:

cli-version: 1.8.0
lib-version: 1.8.0
build date: 2022-02-07T17:42+00:00
build revision: 
build compiler: x86_64-linux-gnu-gcc-11 11.2.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -fPIC -Wdate-time -D_FORTIFY_SOURCE=2 -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -fPIC -g -O2 -ffile-prefix-map=/build/libnvidia-container-QpPgXl/libnvidia-container-1.8.0=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -I/usr/include/tirpc -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections -Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro

Does this support cgroupv2?

elezar commented 2 years ago

There was a cgroup-related fix released in v1.8.1 so updating to at least that version (or 1.10.0) is recommended.

sandman commented 2 years ago

Thanks @elezar ! I got it to work by installing 1.10.0

jbartolozzi commented 2 years ago

How did you manually install 1.10.0? @sandman

sandman commented 2 years ago

@jbartolozzi I have created a gist with the steps: https://gist.github.com/sandman/3777b07f69e117aa8bf1adede26a4e36

adwaykanhere commented 1 year ago

@elezar I'm still having the same issue on Pop-os. I have cli-version = 1.11.0, and I have modified the default settings in /etc/apt/preferences.d/pop-default-settings as @sandman suggested.

intrainepha commented 1 year ago

Same here, any solution please

fgoodwin commented 1 year ago

Ditto on Pop-os 22.04:

sudo docker run --rm --gpus all nvidia/cuda:9.0-base nvidia-smi
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.

sandman commented 1 year ago

@adwaykanhere @intrainepha My Gist applies for 1.10.0 (for host PopOs 22.04 and Container running Ubuntu 20.04). I did not test 1.11.0.

Winand commented 1 year ago

@fgoodwin @intrainepha @sandman It works with 1.11.0. I removed nvidia-docker2 and its dependencies and then reinstalled them.

julianschoep commented 1 year ago

@Winand this still does not work for me..

> nvidia-container-cli --version
cli-version: 1.11.0
lib-version: 1.11.0
build date: 2022-09-18T23:16+00:00
build revision: 
build compiler: x86_64-linux-gnu-gcc-11 11.2.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -fPIC -Wdate-time -D_FORTIFY_SOURCE=2 -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -fPIC -g -O2 -ffile-prefix-map=/build/libnvidia-container-CeXONE/libnvidia-container-1.11.0=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -I/usr/include/tirpc -Wl,-zre

And installed:

Setting up nvidia-container-toolkit-base (1.11.0-0pop1~1663593585~22.04~5b13c4c) ...
Setting up libnvidia-container-tools (1.11.0-0pop1~1663542983~22.04~fbd1818) ...
Setting up nvidia-container-toolkit (1.11.0-0pop1~1663593585~22.04~5b13c4c) ...
Setting up nvidia-docker2 (2.11.0-1~1663542535~22.04~0f7519f) ...

Getting the error

docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: 
unable to start container process: error during container init: error running hook #0: error running hook: 
exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.
mathisc commented 1 year ago

@Winand the fix you mentioned does not work for me either (cli-version==1.11.0 on PopOS 22.04). As a "temporary" (dirty?) workaround, passing GPU devices manually works: set no-cgroups = true in /etc/nvidia-container-runtime/config.toml and then run: docker run --rm --gpus all --privileged -v /dev:/dev nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
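A minimal sketch of the config change, assuming the stock config.toml layout in which `no-cgroups` sits under the `[nvidia-container-cli]` table and ships commented out as `#no-cgroups = false`. The script edits a scratch file seeded with that assumed excerpt; `CONFIG` is a hypothetical variable, and on a real host it would point at a backup copy of /etc/nvidia-container-runtime/config.toml.

```shell
#!/bin/sh
# Sketch: flip no-cgroups to true in a copy of config.toml.
# CONFIG defaults to a scratch file; the seeded content is an assumed
# excerpt of the stock file, not a full configuration.
CONFIG="${CONFIG:-$(mktemp)}"
if [ ! -s "$CONFIG" ]; then
    cat > "$CONFIG" <<'EOF'
[nvidia-container-cli]
#no-cgroups = false
EOF
fi

# Uncomment the key if needed and force its value to true (GNU sed syntax).
sed -E -i 's/^[[:space:]]*#?[[:space:]]*no-cgroups[[:space:]]*=.*/no-cgroups = true/' "$CONFIG"

# Show the resulting line so the change can be reviewed before copying back.
grep -n 'no-cgroups' "$CONFIG"
```

Note that this workaround trades isolation for convenience: --privileged together with -v /dev:/dev gives the container broad access to host devices.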

Winand commented 1 year ago

@mathisc you have docker-ce or docker desktop?

mathisc commented 1 year ago

@Winand I don't have Docker desktop installed so I would guess I am using docker-ce installed via this procedure : https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository Here is the docker version output :

(base) ➜  ~ docker version                            
Client: Docker Engine - Community
 Version:           20.10.21
 API version:       1.41
 Go version:        go1.18.7
 Git commit:        baeda1f
 Built:             Tue Oct 25 18:01:58 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.21
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.18.7
  Git commit:       3056208
  Built:            Tue Oct 25 17:59:49 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.10
  GitCommit:        770bd0108c32f3fb5c73ae1264f7e503fe7b2661
 runc:
  Version:          1.1.4
  GitCommit:        v1.1.4-0-g5fd4c4d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
julianschoep commented 1 year ago

@mathisc your "dirty" fix worked for me, thanks

dubovikmaster commented 1 year ago

@fgoodwin @intrainepha @sandman works with 1.11.0. I've removed nvidia-docker2 and its dependencies and then reinstalled again.

  • /etc/apt/preferences.d/nvidia-docker-pin-1002:
Package: *
Pin: origin nvidia.github.io
Pin-Priority: 1002
  • sudo apt remove nvidia-docker2
  • sudo apt autoremove (this also removes libnvidia-container-tools libnvidia-container1 nvidia-container-toolkit nvidia-container-toolkit-base)
  • sudo apt install nvidia-docker2
  • sudo systemctl restart docker

Thanks! this is the only one that worked for me

abhinand5 commented 1 year ago

@mathisc Your quick and dirty fix is the only one that worked for me!

vngabriel commented 1 year ago

The no-cgroups workaround from @mathisc was the only one that worked for me. Is there any official solution for this issue?

viajeradelaluz commented 1 year ago

I'm still stuck on this problem.

docker: Error response from daemon: failed to create shim: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.

PopOS: 22.04
Nvidia Driver: 525.60.111
CUDA version: 12.0
Docker version: 20.10.12

Any new idea that may help?