NVIDIA / nvidia-docker

Build and run Docker containers leveraging NVIDIA GPUs
Apache License 2.0
17.22k stars, 2.03k forks

nvidia-container-toolkit is not installing on RHEL 7 #1416

Closed: paniabhisek closed this issue 3 years ago

paniabhisek commented 3 years ago

1. Issue or feature description

Running Docker with GPU support requires nvidia-container-toolkit. When I tried to install it, the installation failed.

2. Steps to reproduce the issue

Running nvidia-smi in a GPU-enabled Docker container fails with:

$ sudo docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

Based on this error, I tried to install nvidia-container-toolkit, and yum failed with:

Loaded plugins: fastestmirror, langpacks, nvidia, product-id, search-disabled-repos, subscription-manager

This system is not registered with an entitlement server. You can use subscription-manager to register.

Repository base is listed more than once in the configuration
Repository updates is listed more than once in the configuration
Repository extras is listed more than once in the configuration
Repository libnvidia-container is listed more than once in the configuration
Repository libnvidia-container-experimental is listed more than once in the configuration
Loading mirror speeds from cached hostfile
 * base: mirror.metrocast.net
 * extras: mirror.es.its.nyu.edu
 * updates: mirror.atlanticmetro.net
base                                                                                                                                             | 3.6 kB  00:00:00
docker-ce-stable                                                                                                                                 | 3.5 kB  00:00:00
extras                                                                                                                                           | 2.9 kB  00:00:00
libnvidia-container/x86_64/signature                                                                                                             |  488 B  00:00:00
libnvidia-container/x86_64/signature                                                                                                             | 2.1 kB  00:00:00 !!!
nvidia-container-runtime/x86_64/signature                                                                                                        |  488 B  00:00:00
Retrieving key from https://nvidia.github.io/nvidia-container-runtime/gpgkey
nvidia-container-runtime/x86_64/signature                                                                                                        | 2.1 kB  00:00:00 !!!
https://nvidia.github.io/nvidia-container-runtime/stable/centos7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml signature could not be verified for nvidia-container-runtime
Trying other mirror.

 One of the configured repositories failed (nvidia-container-runtime),
 and yum doesn't have enough cached data to continue. At this point the only
 safe thing yum can do is fail. There are a few ways to work "fix" this:

     1. Contact the upstream for the repository and get them to fix the problem.

     2. Reconfigure the baseurl/etc. for the repository, to point to a working
        upstream. This is most often useful if you are using a newer
        distribution release than is supported by the repository (and the
        packages for the previous distribution release still work).

     3. Run the command with the repository temporarily disabled
            yum --disablerepo=nvidia-container-runtime ...

     4. Disable the repository permanently, so yum won't use it by default. Yum
        will then just ignore the repository until you permanently enable it
        again or use --enablerepo for temporary usage:

            yum-config-manager --disable nvidia-container-runtime
        or
            subscription-manager repos --disable=nvidia-container-runtime

     5. Configure the failing repository to be skipped, if it is unavailable.
        Note that yum will try to contact the repo. when it runs most commands,
        so will have to try and fail each time (and thus. yum will be be much
        slower). If it is a very temporary problem though, this is often a nice
        compromise:

            yum-config-manager --save --setopt=nvidia-container-runtime.skip_if_unavailable=true

failure: repodata/repomd.xml from nvidia-container-runtime: [Errno 256] No more mirrors to try.
https://nvidia.github.io/nvidia-container-runtime/stable/centos7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml signature could not be verified for nvidia-container-runtime
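The "listed more than once in the configuration" warnings in the log mean the same repository ID is defined in more than one place. A minimal Python sketch for locating such duplicates, assuming the repository definitions live as INI-style `.repo` files in `/etc/yum.repos.d` (the usual default on RHEL/CentOS 7; repos defined in `yum.conf` or in a custom `reposdir` are not checked). The function name `find_duplicate_repo_ids` is hypothetical:

```python
import re
import sys
from collections import defaultdict
from pathlib import Path

# A yum repository section header is "[repoid]" on its own line.
# Commented-out headers such as "#[repoid]" do not match.
SECTION_RE = re.compile(r"^\s*\[([^\]]+)\]\s*$")


def find_duplicate_repo_ids(repo_dir="/etc/yum.repos.d"):
    """Return {repo_id: [file names]} for every repo ID defined more than once."""
    locations = defaultdict(list)
    for repo_file in sorted(Path(repo_dir).glob("*.repo")):
        for line in repo_file.read_text(errors="replace").splitlines():
            match = SECTION_RE.match(line)
            if match:
                # One entry per definition, so a section repeated inside
                # a single file is also reported.
                locations[match.group(1).strip()].append(repo_file.name)
    return {repo: files for repo, files in locations.items() if len(files) > 1}


if __name__ == "__main__":
    directory = sys.argv[1] if len(sys.argv) > 1 else "/etc/yum.repos.d"
    for repo, files in sorted(find_duplicate_repo_ids(directory).items()):
        print(f"{repo}: defined in {', '.join(files)}")
```

Each reported ID can then be removed from all but one of the listed files before retrying the install.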

3. Information to attach (optional if deemed irrelevant)

klueska commented 3 years ago

Can you try resetting the GPG key for the nvidia-container-runtime repo?

See the instructions near the bottom of: https://nvidia.github.io/nvidia-container-runtime/
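Before resetting a key, it can help to confirm which repositories actually failed signature verification. A small Python sketch that extracts those repo IDs from saved yum output; the regex assumes the exact wording yum printed in the log above (`repomd.xml signature could not be verified for <repo>`), and the helper name and script file name are hypothetical:

```python
import re
import sys

# Matches yum's message, e.g.
# "... [Errno -1] repomd.xml signature could not be verified for nvidia-container-runtime"
SIG_FAIL_RE = re.compile(r"repomd\.xml signature could not be verified for (\S+)")


def repos_failing_signature(yum_output):
    """Return repo IDs whose repomd.xml signature check failed, in first-seen order."""
    failing = []
    for repo in SIG_FAIL_RE.findall(yum_output):
        if repo not in failing:
            failing.append(repo)
    return failing


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: yum makecache > yum.log 2>&1; python3 failing_repos.py yum.log
    with open(sys.argv[1], errors="replace") as log:
        for repo in repos_failing_signature(log.read()):
            print(repo)
```

For the log in this issue it would report only `nvidia-container-runtime`, which is the repo whose key the reset instructions target.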

paniabhisek commented 3 years ago

I had been following the same link, but this time nvidia-container-toolkit (version 1.3.0-2) installed successfully.

After that, the container is able to run nvidia-smi.

I might have missed something the other day. Thank you 👍