NVIDIA / libnvidia-container

NVIDIA container runtime library
Apache License 2.0

[Error] Apt repos for ubuntu 20.04 and 22.04 are broken #202

Closed darintay closed 7 months ago

darintay commented 1 year ago

The apt repos for libnvidia-container appear to have broken in the last day or two.

My apt sources list contains:

deb https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/libnvidia-container/stable/ubuntu20.04/$(ARCH) /
deb https://nvidia.github.io/libnvidia-container/stable/ubuntu22.04/$(ARCH) /

(I've added 18.04, 20.04, and 22.04 to my list just for demonstration.)

The error from apt update looks like:

Get:1 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  InRelease [1484 B]
Ign:2 https://nvidia.github.io/libnvidia-container/stable/ubuntu20.04/amd64  InRelease                                   
Ign:3 https://nvidia.github.io/libnvidia-container/stable/ubuntu22.04/amd64  InRelease  
Get:4 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  InRelease [1481 B]
Get:5 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  InRelease [1474 B]      
Err:6 https://nvidia.github.io/libnvidia-container/stable/ubuntu20.04/amd64  Release                                                                
  404  Not Found [IP: 185.199.109.153 443]
Err:7 https://nvidia.github.io/libnvidia-container/stable/ubuntu22.04/amd64  Release                                                                                 
  404  Not Found [IP: 185.199.109.153 443]

...
E: The repository 'https://nvidia.github.io/libnvidia-container/stable/ubuntu20.04/amd64  Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
E: The repository 'https://nvidia.github.io/libnvidia-container/stable/ubuntu22.04/amd64  Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.

With curl:

$ curl https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64/InRelease | head
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1484  100  1484    0     0  54962      0 --:--:-- --:--:-- --:--:-- 54962
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Architectures: amd64
Codename: bionic
Components: main
Date: Fri, 27 Apr 2018 21:29:25 +0000
Description: NVIDIA container runtime library repository
Label: NVIDIA CORPORATION <cudatools@nvidia.com>
Origin: https://nvidia.github.io/libnvidia-container

$ curl https://nvidia.github.io/libnvidia-container/stable/ubuntu20.04/amd64/InRelease | head
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    81  100    81    0     0   3375      0 --:--:-- --:--:-- --:--:--  3375
# Unsupported distribution!
# Check https://nvidia.github.io/libnvidia-container

$ curl https://nvidia.github.io/libnvidia-container/stable/ubuntu22.04/amd64/InRelease | head
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    81  100    81    0     0   3240      0 --:--:-- --:--:-- --:--:--  3240
# Unsupported distribution!
# Check https://nvidia.github.io/libnvidia-container
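For anyone who wants to check a repository URL without running apt, here is a minimal sketch that classifies whatever the server returns for InRelease, based only on the two responses shown above. The function name `classify_inrelease` is made up for illustration, and the check looks at the first line only, so it does not verify the signature.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read an InRelease response on stdin and report
# whether it looks like a signed file or the "Unsupported distribution!" page.
classify_inrelease() {
  local first_line
  # Only the first line is inspected; "|| true" tolerates empty input.
  IFS= read -r first_line || true
  case $first_line in
    '-----BEGIN PGP SIGNED MESSAGE-----') echo signed ;;
    '# Unsupported distribution!'*) echo placeholder ;;
    *) echo unknown ;;
  esac
}

# Offline demonstration using the first lines of the two responses above.
printf '%s\n' '-----BEGIN PGP SIGNED MESSAGE-----' 'Hash: SHA512' | classify_inrelease
printf '%s\n' '# Unsupported distribution!' | classify_inrelease

# Against a live URL (leave out curl's --fail so an error body still reaches the pipe):
# curl -sS https://nvidia.github.io/libnvidia-container/stable/ubuntu20.04/amd64/InRelease | classify_inrelease
```
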
didyouexpectthat commented 1 year ago

I am getting this with debian11, too. :(

dev-onejun commented 1 year ago

I found the reasons from d63561e for ubuntu and 93f5638 for Debian.

We should ask @elezar why the files were changed.

Would there be a problem if stable/ubuntu22.04 and stable/ubuntu20.04 were symlinks to stable/ubuntu18.04?

JHSPerc commented 1 year ago

I am experiencing this issue as well. Any ETA on a fix?

dev-onejun commented 1 year ago

As a temporary fix, you can change the file nvidia-container-toolkit.list in /etc/apt/sources.list.d to:

deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /
#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/$(ARCH) /
elezar commented 1 year ago

@dev-onejun the files were changed to optimize our repository for serving via GitHub pages. We are at the limit of what we can serve there and this causes issues when we add new versions of the packages.

At the backend, the ubuntu20.04 and ubuntu22.04 folders (repositories) are symlinks to ubuntu18.04 (likewise, debian11 is a symlink to debian10). Unfortunately, GitHub Pages does not implement this as a redirect and instead copies the folders entirely, leading to a larger GitHub Pages deployment and often triggering a timeout when adding new packages to the repo.

The changes in https://github.com/NVIDIA/libnvidia-container/commit/d63561e66c6ee45ee8831c098e716e1620c851d8 (for Ubuntu) and https://github.com/NVIDIA/libnvidia-container/commit/93f5638c6bbc1b9266287dc33a4af157ba6ea4b3 (for Debian) create a symlink / copy to the .list file directly instead of to the root of the source distributions (ubuntu18.04, debian10), so as to reduce the size of the final Pages archive being published. This breaks use cases that refer to the repository directly through the URL.

Note that the mechanism for adding the repositories in our official documentation will result in a .list that contains the ubuntu18.04 (or debian10) URLs and should continue to work as expected.

I will revert the changes so as to fix the broken downstream dependencies, but I recommend using the base repo lists instead in cases where the repository lists are constructed manually.

saltydk commented 1 year ago

Symlink the folders to avoid the relative lookup? https://github.com/s4y/gh-pages-symlink-test if you want an example repo.

dev-onejun commented 1 year ago

As a temporary fix, you can change the file nvidia-container-toolkit.list in /etc/apt/sources.list.d to:

deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /
#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/$(ARCH) /

Oh, thank you for your kind response. Then this solution would be recommended for Ubuntu, and

deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/stable/debian10/$(ARCH) /

for debian.

elezar commented 1 year ago

@saltydk we use symlinks to folders extensively. The issue is not that they don't work but that when constructing the tar archive for the GitHub pages deployments the contents of the folders are duplicated.
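The duplication can be reproduced locally with a small sketch. It assumes the archive is built by following symlinks, imitated here with plain `tar -h`; that is a stand-in and not the actual GitHub Pages build step. The helper name `count_archived_copies` and the file `pkg.deb` are hypothetical, while the directory names come from this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archive DIR with optional extra tar flags and count
# how many archive entries match NAME.
count_archived_copies() {
  local dir=$1 name=$2
  shift 2
  tar -cf - "$@" -C "$dir" . | tar -tf - | grep -c -- "$name"
}

# Build a toy layout: one real repository folder plus a symlink to it.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/stable/ubuntu18.04"
echo "placeholder package" > "$demo/stable/ubuntu18.04/pkg.deb"
ln -s ubuntu18.04 "$demo/stable/ubuntu20.04"

# Without -h the symlink is stored as a link; with -h tar follows it and
# archives the target's contents again under the symlink's name.
echo "copies with symlink preserved:    $(count_archived_copies "$demo" pkg.deb)"
echo "copies with symlink dereferenced: $(count_archived_copies "$demo" pkg.deb -h)"
```

Every additional symlinked distribution adds another full copy of the packages in the dereferenced case, which is the growth described above.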

saltydk commented 1 year ago

Ah, missed that bit of your post.

elezar commented 1 year ago

@dev-onejun yes, if possible the "base" repos should be used. The most important of these are: ubuntu18.04, debian10, and centos8.

I will revert the breaking changes nonetheless so as to address https://github.com/kubernetes/kops/issues/15094 where a modification of the repo lists is not as straightforward.

saltydk commented 1 year ago

The reasoning seems to be explained here: https://github.com/community/community/discussions/9104. Maybe it is time to break it up into smaller sites.

saltydk commented 1 year ago

https://github.com/jekyll/jekyll-redirect-from maybe?

elezar commented 1 year ago

The changes have been reverted and the ubuntu* and debian* repositories should function as before. I will evaluate the KOPS issue (https://github.com/kubernetes/kops/issues/15094) and once we have a workaround or a better understanding of what is going on there, may reapply the changes.

@saltydk we are in the process of moving our packages to the CUDA Downloads repositories. However, this or splitting the site will result in similar issues to this one, where our understanding of the structure of the repositories and their use does not align with what is used in practice.

Also, thanks for the link. I will have a look at it.

dev-onejun commented 1 year ago

Reverting the commits seems to be working well. 👍

elezar commented 1 year ago

NOTE: To avoid issues such as this going forward, please use

deb https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /

and

deb https://nvidia.github.io/libnvidia-container/stable/debian10/$(ARCH) /

repositories instead of one of the derived ones.
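
For manually constructed lists, the switch to the base repositories can be sketched as a small filter. This is only an illustration: the function name `to_base_repo` is made up, and it rewrites just the derived names mentioned in this thread (ubuntu20.04, ubuntu22.04, debian11), leaving every other line untouched.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read an apt sources list on stdin and point derived
# libnvidia-container repositories at their base repository.
to_base_repo() {
  sed -E \
    -e 's#(/libnvidia-container/(stable|experimental)/)ubuntu(20|22)\.04/#\1ubuntu18.04/#' \
    -e 's#(/libnvidia-container/(stable|experimental)/)debian11/#\1debian10/#'
}

# Offline demonstration on two entries of the kind discussed above.
printf '%s\n' \
  'deb https://nvidia.github.io/libnvidia-container/stable/ubuntu22.04/$(ARCH) /' \
  'deb https://nvidia.github.io/libnvidia-container/stable/debian11/$(ARCH) /' \
  | to_base_repo
```

To preview the effect on an installed list without modifying it, something like `to_base_repo < /etc/apt/sources.list.d/nvidia-container-toolkit.list` should work, assuming the file exists at the path mentioned earlier in the thread.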

JHSPerc commented 1 year ago

Good to know - thanks, everyone!

cosmicvarion commented 1 year ago

Visiting the URLs:

still results in a page showing the message # Unsupported distribution! # Check https://nvidia.github.io/libnvidia-container.

Is there some workaround in the above thread I'm missing?

elezar commented 1 year ago

The arch-specific folders do not contain index files and as such the custom 404 message above is shown.

Also please use the 18.04 repos on all Ubuntu variants.

elezar commented 7 months ago

We have recently simplified our packaging, meaning that a single URL can be used regardless of the distribution.

Please see the instructions here https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-with-apt and reopen if there are still problems.