NVIDIA / cloud-native-stack

Run cloud native workloads on NVIDIA GPUs
Apache License 2.0

NOTICE: NVIDIA Cloud Native Core fails due to CUDA linux repository GPG key rotation #22

Closed erikbohnhorst closed 5 months ago

erikbohnhorst commented 2 years ago

Issue

A pre-existing Cloud Native Core installation stopped working after a reboot. Note: installations deployed on 04/29 or later are not affected.

What happened:

The NVIDIA team rotated the GPG keys for the CUDA Linux repositories on 4/28. More information on this can be found here. The CUDA repository is included in the NVIDIA driver images deployed through the GPU Operator, so apt-get update fails inside the driver container on Ubuntu 20.04. The failure occurs whenever a running driver container is restarted or the node reboots.

The following error message appears in the driver Pod (nvidia-driver-ctr container):

Updating the package cache...
W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  InRelease' is no longer signed.
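To confirm that a failing node is hitting this particular problem, it can help to pull the missing key ID out of the apt output. The following is a minimal sketch, not part of the original report: `missing_pubkeys` is a hypothetical helper name, and the sample line is copied from the log above.

```shell
# Hypothetical helper: read apt output on stdin and print each missing
# public key ID (the value after NO_PUBKEY), one per line, without duplicates.
missing_pubkeys() {
  grep -oE 'NO_PUBKEY [0-9A-F]+' | awk '{print $2}' | sort -u
}

# Sample input: the warning line from the driver Pod log shown above.
sample="W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC"

printf '%s\n' "$sample" | missing_pubkeys
# prints: A4B469963BF863CC
```

The last eight characters of that ID (3BF863CC) match the name of the key file, 3bf863cc.pub, used in the fix posted later in this thread.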

Fix:

Option 1 - Uninstall and re-install

  1. bash setup.sh uninstall
  2. bash setup.sh install

Option 2 - Repair

evberrypi commented 2 years ago

Fix (Debian, Ubuntu, WSL):

  1. Set the $distro and $arch variables as listed here
  2. Run:
    
    sudo apt-key del 7fa2af80
    sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/$distro/$arch/3bf863cc.pub
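The two steps above can be sketched as a small script that prints what would be done before anything is changed. This is an illustration under stated assumptions: `cuda_key_url` is a hypothetical helper, and the default values ubuntu2004 and x86_64 are taken from the repository path in the error message earlier in this thread, so they need adjusting for other systems.

```shell
# Hypothetical helper: build the URL of the new public key for a given
# distro ($1) and architecture ($2), following the URL pattern in step 2.
cuda_key_url() {
  printf 'https://developer.download.nvidia.com/compute/cuda/repos/%s/%s/3bf863cc.pub\n' "$1" "$2"
}

# Assumed defaults, matching the repository path in the error message;
# override them by exporting distro and arch before running the script.
distro="${distro:-ubuntu2004}"
arch="${arch:-x86_64}"

echo "Old key to delete: 7fa2af80"
echo "New key URL: $(cuda_key_url "$distro" "$arch")"

# To apply the fix, run the two commands from step 2 (not executed here):
#   sudo apt-key del 7fa2af80
#   sudo apt-key adv --fetch-keys "$(cuda_key_url "$distro" "$arch")"
```

Printing the URL first makes it easy to spot a wrong $distro or $arch value before the old key is removed.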



[Source](https://developer.nvidia.com/blog/updating-the-cuda-linux-gpg-repository-key/)