Mellanox / network-operator

Mellanox Network Operator
Apache License 2.0

Error pulling v24.7.0 container image using helm chart (Authorization required?) #1052

Closed. drikster80 closed this issue 2 months ago

drikster80 commented 2 months ago

Environment:

Attempting to deploy Network Operator v24.7.0 with the following commands:

helm show values nvidia/network-operator --version v24.7.0 > values.yaml

# Updated to disable nfd since it was deployed with GPU Operator (`nfd.enabled=false`)

helm install network-operator nvidia/network-operator -n nvidia-network-operator --create-namespace --version v24.7.0 -f ./values.yaml --wait

What happened:

ErrImagePull

Error shown in the containerd log:

Sep 01 18:01:24 gh200-1 containerd[1768487]: time="2024-09-01T18:01:24.687567471Z" level=info msg="PullImage \"nvcr.io/nvstaging/mellanox/network-operator:v24.7.0\""
Sep 01 18:01:25 gh200-1 containerd[1768487]: time="2024-09-01T18:01:25.333471500Z" level=error msg="PullImage \"nvcr.io/nvstaging/mellanox/network-operator:v24.7.0\" failed" error="failed to pull and unpack image \"nvcr.io/nvstaging/mellanox/network-operator:v24.7.0\": failed to resolve reference \"nvcr.io/nvstaging/mellanox/network-operator:v24.7.0\": unexpected status from HEAD request to https://nvcr.io/v2/nvstaging/mellanox/network-operator/manifests/v24.7.0: 401 Unauthorized"
Sep 01 18:01:25 gh200-1 containerd[1768487]: time="2024-09-01T18:01:25.333522412Z" level=info msg="stop pulling image nvcr.io/nvstaging/mellanox/network-operator:v24.7.0: active requests=0, bytes read=1068"

What you expected to happen:

The images should be pulled without credentials.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Should it be pulling from 'nvstaging'? (nvcr.io/nvstaging/mellanox/network-operator:v24.7.0)

Here is the result of pulling with docker from both the nvstaging and the nvidia paths:

docker pull nvcr.io/nvstaging/mellanox/network-operator:v24.7.0
Error response from daemon: unauthorized: <html>
<head><title>401 Authorization Required</title></head>
<body>
<center><h1>401 Authorization Required</h1></center>
<hr><center>nginx/1.22.1</center>
</body>
</html>
docker pull nvcr.io/nvidia/cloud-native/network-operator:v24.7.0
v24.7.0: Pulling from nvidia/cloud-native/network-operator
aa04440524ba: Pull complete
b8e78a3d48f5: Pull complete
836c10435595: Pull complete
4eb1e4c628d6: Pull complete
fc24d032acfc: Pull complete
f4bac45a32a6: Pull complete
ec8d59a67135: Pull complete
Digest: sha256:b8a07696d05fb8e5292991ba731758579313d6983ed3d75dd95e7356024d987c
Status: Downloaded newer image for nvcr.io/nvidia/cloud-native/network-operator:v24.7.0
nvcr.io/nvidia/cloud-native/network-operator:v24.7.0
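
One way to see which registry paths a chart would use before installing it is to render the chart locally and list the image references. The following is only a sketch: `list_images` is a hypothetical helper name, and it assumes the rendered manifests put each image on a plain `image:` line.

```shell
# List every distinct image reference found in manifests read from stdin.
# Handles both "image: ..." and "- image: ..." lines and strips double quotes.
list_images() {
  grep -E '^[[:space:]]*(-[[:space:]]+)?image:[[:space:]]' \
    | sed -E 's/^[[:space:]]*(-[[:space:]]+)?image:[[:space:]]*//; s/^"//; s/"[[:space:]]*$//' \
    | sort -u
}

# Hypothetical usage (needs helm and the nvidia chart repo already added):
#   helm template network-operator nvidia/network-operator \
#     --version v24.7.0 -f ./values.yaml | list_images | grep nvstaging
```

Any line printed by the final `grep` is an image that would be pulled from the `nvstaging` path, which returned 401 in the output above.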

drikster80 commented 2 months ago

Locally updating the following line in values.yaml fixed the issue...

repository: nvcr.io/nvidia/cloud-native
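
A rough sketch of making that one-line change from the shell, for anyone who prefers not to edit the file by hand. It assumes the original value was `nvcr.io/nvstaging/mellanox` (inferred from the failing image name, not confirmed in this thread). Both function names are hypothetical. The edit is limited to a single line number so that other `repository:` entries are left alone.

```shell
# Show every line (with its number) that still points at the staging path.
find_staging_lines() {
  grep -n 'nvcr\.io/nvstaging' "$1"
}

# Replace the staging prefix on ONE line, given by number.
# A backup of the original file is kept as <file>.bak.
set_public_repository() {
  values_file="$1"
  line_no="$2"
  sed -i.bak "${line_no}s#nvcr\.io/nvstaging/mellanox#nvcr.io/nvidia/cloud-native#" "$values_file"
}

# Hypothetical usage:
#   find_staging_lines ./values.yaml        # note the line number to change
#   set_public_repository ./values.yaml 42  # 42 is a placeholder line number
```

After the change, re-running the `helm install` command from the original report with the same `-f ./values.yaml` should pick up the new repository.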

rollandf commented 2 months ago

Thanks for reporting. Working on a fix.

rollandf commented 2 months ago

This is now fixed in the public repo as well.

helm fetch https://helm.ngc.nvidia.com/nvidia/charts/network-operator-24.7.0.tgz