aws / deep-learning-containers

AWS Deep Learning Containers (DLCs) are a set of Docker images for training and serving models in TensorFlow, TensorFlow 2, PyTorch, and MXNet.
https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-images.html

[bug] apt update errors due to failing NVIDIA certificate verification #1848

Open austinmw opened 2 years ago

austinmw commented 2 years ago


Concise Description: Unable to run apt update in SageMaker DLCs because verification of NVIDIA's certificate fails

To reproduce: nvidia-docker run -it --rm 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:1.10.2-gpu-py38-cu113-ubuntu20.04-sagemaker "apt update"

DLC image/dockerfile: Multiple, for example: 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:1.10.2-gpu-py38-cu113-ubuntu20.04-sagemaker

Current behavior:

root@a7301cb95566:/# apt update
Ign:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  InRelease
Ign:2 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64  InRelease                                                   
Err:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  Release                                                                 
  Certificate verification failed: The certificate is NOT trusted. The certificate issuer is unknown.  Could not handshake: Error in the certificate verification. [IP: 152.195.19.142 443]
Err:4 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64  Release                                                     
  Certificate verification failed: The certificate is NOT trusted. The certificate issuer is unknown.  Could not handshake: Error in the certificate verification. [IP: 152.195.19.142 443]
Get:5 http://archive.ubuntu.com/ubuntu focal InRelease [265 kB]                                                                                           
Get:6 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]                                                     
Get:7 http://ppa.launchpad.net/openjdk-r/ppa/ubuntu focal InRelease [23.8 kB]             
Get:8 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]                              
Get:9 http://archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]
Get:10 http://archive.ubuntu.com/ubuntu focal/multiverse amd64 Packages [177 kB]                                   
Get:11 http://ppa.launchpad.net/openjdk-r/ppa/ubuntu focal/main amd64 Packages [16.5 kB]
Get:12 http://archive.ubuntu.com/ubuntu focal/restricted amd64 Packages [33.4 kB]
Get:13 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages [1275 kB]
Get:14 http://archive.ubuntu.com/ubuntu focal/universe amd64 Packages [11.3 MB]           
Get:15 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [1139 kB]
Get:16 http://archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [1216 kB]    
Get:17 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [30.3 kB]    
Get:18 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [2188 kB]
Get:19 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [1154 kB]
Get:20 http://security.ubuntu.com/ubuntu focal-security/multiverse amd64 Packages [25.8 kB]
Get:21 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [1773 kB]         
Get:22 http://archive.ubuntu.com/ubuntu focal-backports/universe amd64 Packages [26.0 kB]   
Get:23 http://archive.ubuntu.com/ubuntu focal-backports/main amd64 Packages [51.2 kB]          
Get:24 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [870 kB]    
Reading package lists... Done                              
W: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/InRelease: No system certificates available. Try installing ca-certificates.
W: https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64/InRelease: No system certificates available. Try installing ca-certificates.
W: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/Release: No system certificates available. Try installing ca-certificates.
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
W: https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64/Release: No system certificates available. Try installing ca-certificates.
E: The repository 'https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64  Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.

Expected behavior: apt update completes successfully

Additional context: This largely prevents using these images as a base for building other images
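
Both failing entries in the log point at developer.download.nvidia.com, so a useful first step is to see which apt source files pull in that host. The following is a minimal sketch, not something taken from the DLC images: the function name find_nvidia_sources is hypothetical, and it assumes the sources live in *.list files under /etc/apt/sources.list.d.

```shell
#!/bin/sh
# Print the apt source files that reference the NVIDIA download host.
# Usage: find_nvidia_sources [DIR]   (DIR defaults to /etc/apt/sources.list.d)
# The function name and default directory are assumptions for illustration.
find_nvidia_sources() {
    dir="${1:-/etc/apt/sources.list.d}"
    # -l prints only the names of matching files, -s hides errors for
    # missing files, -F treats the host name as a fixed string.
    grep -lsF 'developer.download.nvidia.com' "$dir"/*.list
    # grep exits non-zero when nothing matches; treat that as "no results".
    return 0
}
```

Running find_nvidia_sources inside the container lists the files to inspect; an empty result would suggest the NVIDIA repositories are configured somewhere else, such as /etc/apt/sources.list.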

amritap-ef commented 2 years ago

Having the same issue

tejaschumbalkar commented 2 years ago

Thank you for reporting the issue! Please let us know if you are still facing the issue.

stefan-matcovici commented 1 year ago

Having the same issue with images with torch<=1.9. Any updates on how to mitigate?

public-git-ui commented 1 year ago

You can try this in the Dockerfile, before apt-get update:

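# Workaround for CUDA Linux Repository Key Rotation
# https://forums.developer.nvidia.com/t/notice-cuda-linux-repository-key-rotation/212772
# Remove the NVIDIA apt sources; -f keeps the build going if a list is absent.
RUN rm -f /etc/apt/sources.list.d/cuda.list
RUN rm -f /etc/apt/sources.list.d/nvidia-ml.list
# Drop the old signing key, then fetch the current keys.
# The URLs below target ubuntu1804; the distro segment may need to match
# the base image (the image in this issue uses ubuntu2004).
RUN apt-key del 7fa2af80
RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub
RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/7fa2af80.pub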
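
The first two steps of the workaround above delete the NVIDIA source lists. A gentler variant, sketched here under assumptions, renames them so they can be restored later. The function name disable_nvidia_sources is hypothetical, the file names are taken from the workaround, and the sketch relies on apt only reading files in sources.list.d that end in .list (or .sources).

```shell
#!/bin/sh
# Rename the NVIDIA apt source lists so apt skips them, instead of deleting them.
# Usage: disable_nvidia_sources [DIR]   (DIR defaults to /etc/apt/sources.list.d)
# The function name is an assumption; the file names come from the workaround.
disable_nvidia_sources() {
    dir="${1:-/etc/apt/sources.list.d}"
    for name in cuda.list nvidia-ml.list; do
        # Skip absent files so the step does not fail on images that
        # ship only one of the two lists.
        if [ -f "$dir/$name" ]; then
            mv "$dir/$name" "$dir/$name.disabled"
        fi
    done
}
```

Inside a container this would be followed by apt-get update. Renaming a file back to its .list name re-enables that repository once the key or certificate problem is resolved.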