NVIDIA / nvidia-docker

Build and run Docker containers leveraging NVIDIA GPUs
Apache License 2.0

apt-get can't find nvidia-container-toolkit-base #1731

Closed moonman239 closed 1 year ago

moonman239 commented 1 year ago

1. Issue or feature description

Ubuntu 20.04's apt-get says "E: Unable to locate package nvidia-container-toolkit-base"

2. Steps to reproduce the issue

On Ubuntu 20.04, run these commands:

sudo apt-get update \
    && sudo apt-get install -y nvidia-container-toolkit-base

elezar commented 1 year ago

@moonman239 have you added the NVIDIA Container Toolkit repositories to your /etc/apt/sources.list.d/ folder as per https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#setting-up-nvidia-container-toolkit?

noymer commented 1 year ago

I have the same problem.

In the setting script,

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
            sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

I found that the curl results seem wrong: the ubuntu18.04 path looks strange. (I first thought $(ARCH) looked strange too; see the edit below.)

> curl -s -L https://nvidia.github.io/libnvidia-container/ubuntu20.04/libnvidia-container.list
deb https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /
#deb https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/$(ARCH) /
> curl -s -L https://nvidia.github.io/libnvidia-container/ubuntu18.04/libnvidia-container.list
deb https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /
#deb https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/$(ARCH) /

Edit: $(ARCH) is a valid variable in apt sources.
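
To illustrate the edit above: apt replaces $(ARCH) in a sources entry with the system's Debian architecture, such as amd64. The following is only a rough sketch of that substitution, not apt's own code; expand_arch is a hypothetical helper name, and falling back to dpkg --print-architecture is an assumption about how to obtain the native architecture.

```shell
# expand_arch ENTRY [ARCH]: hypothetical helper, not part of apt.
# Prints ENTRY with every literal $(ARCH) replaced by ARCH.
expand_arch() {
    entry=$1
    # Use the architecture passed in, or ask dpkg for the native one.
    arch=${2:-$(dpkg --print-architecture)}
    # \$ matches a literal dollar sign; the parentheses are literal in basic regex.
    printf '%s\n' "$entry" | sed 's|\$(ARCH)|'"$arch"'|g'
}
```

Calling expand_arch with the stable entry from the list file and amd64 as the second argument prints the same entry with /amd64 in place of /$(ARCH).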

elezar commented 1 year ago

@noymer those URLs are correct. We use the ubuntu18.04 packages for all newer ubuntu versions.

Could you provide more complete output when running:

sudo apt-get update
sudo apt list -a nvidia-container-toolkit-base

As well as

sudo apt-get update
sudo apt-get install nvidia-container-toolkit-base

noymer commented 1 year ago

@elezar Sorry, I was totally wrong. Running apt --fix-broken install fixed the problem; the cause was my environment, not nvidia-docker.

The setting script worked perfectly.

The apt repository works fine on Ubuntu 20.04.

> cat /etc/apt/sources.list.d/nvidia-container-toolkit.list
deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /
#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/$(ARCH) /

elezar commented 1 year ago

Thanks for the confirmation. Closing the issue.

TallPatrick commented 1 year ago

I'm seeing a similar issue when following these instructions. After running the setting script mentioned above, when I run "sudo apt update" or "sudo apt-get update" I get the error:

E: Invalid value set for option Signed-By regarding source https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64/ / (not a fingerprint)
E: The list of sources could not be read.

Any ideas what could be causing this?

elezar commented 1 year ago

@TallPatrick what is the output of:

grep nvidia.github.io /etc/apt/sources.list.d/*.list

?

TallPatrick commented 1 year ago

@elezar

It gets me 2 lines back: [image]

elezar commented 1 year ago

@TallPatrick that seems to be as expected. Can you confirm that the gpg key referenced in those lines was created correctly? Running:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

should recreate this key.

TallPatrick commented 1 year ago

@elezar No change. [image]

This is likely something borked on my setup. I also found a work-around for my use case (just installing the nvidia drivers inside the container), but I am still mostly curious what is causing this error...

elezar commented 1 year ago

@TallPatrick installing the drivers inside the container is not recommended. The same driver version is then required on the host and in the container, which breaks portability.

I've just noticed from the screenshot that the signed-by section contains a relative path and not an absolute path. Could you change usr/share/keyrings/nvidia-container-toolkit-keyring.gpg to /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg?
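
A rough sketch of how one might check for this: apt appears to read a Signed-By value that is not an absolute path as a key fingerprint, which would explain the "(not a fingerprint)" wording in the error. The helper below, check_signed_by, is a hypothetical name and not part of apt. It lists each signed-by value in a .list file and fails if any of them does not start with a slash. It assumes a grep that supports -o.

```shell
# check_signed_by LIST_FILE: hypothetical helper, not part of apt.
# Prints every signed-by value in LIST_FILE and returns non-zero
# if any of them is not an absolute path.
check_signed_by() {
    rc=0
    # Skip commented-out entries, then pull out each signed-by=... value.
    for keyring in $(grep -v '^[[:space:]]*#' "$1" | grep -o 'signed-by=[^] ]*' | cut -d= -f2); do
        case $keyring in
            /*) echo "absolute: $keyring" ;;
            *)  echo "relative: $keyring"; rc=1 ;;
        esac
    done
    return "$rc"
}
```

For example, check_signed_by /etc/apt/sources.list.d/nvidia-container-toolkit.list would report the entry as relative if the leading slash is missing.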

Note that as of the v1.13.0 release, the packages for the NVIDIA Container Toolkit are also available directly from the CUDA Downloads repositories. This means that the instructions here (using Ubuntu 20.04 as an example) can be followed to set up the repositories.

Note that it is NOT required to install the CUDA Toolkit on the host; an existing driver installation can be used.

In your case running:

sudo rm /etc/apt/sources.list.d/nvidia-container-toolkit.list
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo apt-get -y install nvidia-container-toolkit

should configure the repository (assuming that it's not already configured) and install the components of the NVIDIA Container Toolkit.

tsilvs commented 11 months ago

Would be better to update the docs at https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html with @elezar's answer.

prismspecs commented 11 months ago

@moonman239 have you added the NVIDIA Container Toolkit repositories to your /etc/apt/sources.list.d/ folder as per https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#setting-up-nvidia-container-toolkit?

For some reason the instructions there are written out of order, so the steps aren't clearly laid out. For others who run into this issue, first run:

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
            sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

Then update and install.

wanghm commented 10 months ago

"for some reason they have written all of the instructions out of order, so it isn't clearly laid out. For others who run into this issue"

I don't know what the reason is, but I hope they can improve their documentation. The user guide is very unclear, with various platforms mixed together, and the order of commands is also incorrect.

Gpetrak commented 10 months ago

I'm running Ubuntu 20.04 with NVIDIA driver 470.82.01 and CUDA 11.4. After I installed the NVIDIA-docker-container following the instructions above, I get the following error:

nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.7, please update your driver to a newer version, or use an earlier cuda container: unknown.

Do you know which version of NVIDIA-docker-container is compatible with driver 470.82.01?
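
The error text itself states the condition that failed: the container asks for cuda>=11.7, while the setup described above reports CUDA 11.4. As a minimal sketch of that comparison only (this is not how nvidia-container-cli implements the check), the hypothetical helper version_at_least below compares two version strings. It assumes a sort that supports -V, as GNU coreutils does.

```shell
# version_at_least HAVE NEED: hypothetical helper for illustration only.
# Succeeds when version HAVE is greater than or equal to version NEED.
version_at_least() {
    # sort -V orders version strings; if NEED comes out first
    # (or both are equal), HAVE is new enough.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Values taken from the comment above: CUDA 11.4 available, 11.7 required.
if version_at_least 11.4 11.7; then
    echo "requirement satisfied"
else
    echo "requirement not satisfied"
fi
```

With those values the condition is not met, which matches the two options the error message suggests: a newer driver, or a container built for an earlier CUDA version.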