NVIDIA / dcgm-exporter

NVIDIA GPU metrics exporter for Prometheus leveraging DCGM
Apache License 2.0

https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ is not signed #335

Closed jjziets closed 5 months ago

jjziets commented 5 months ago

What is the version?

3.3.6-3.4.2

What happened?

The repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /" is not signed, so I can't add it or install datacenter-gpu-manager on Ubuntu, and therefore can't build.

What did you expect to happen?

After running these steps, one should be able to proceed and make the project, but one can't because the repo is not signed:

    distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
    wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-keyring_1.1-1_all.deb
    sudo dpkg -i cuda-keyring_1.1-1_all.deb
    sudo apt-get update
    sudo apt-get install -y datacenter-gpu-manager

What is the GPU model?

A6000/A5000, A100. This is not a GPU-related issue.

What is the environment?

DCGM-Exporter running on bare metal

How did you deploy the dcgm-exporter and what is the configuration?

Built as a system service. I did not get that far.

How to reproduce the issue?

    sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /"
    sudo apt-get update

or

    distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
    wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-keyring_1.1-1_all.deb
    sudo dpkg -i cuda-keyring_1.1-1_all.deb
    sudo apt-get update
    sudo apt-get install -y datacenter-gpu-manager

    sudo apt-get install -y datacenter-gpu-manager
    Reading package lists... Done
    Building dependency tree... Done
    Reading state information... Done
    E: Unable to locate package datacenter-gpu-manager

    echo $distribution
    ubuntu2204
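For reference, the distribution string used in the repository URL can be derived as sketched below. The helper name `distribution_name` is hypothetical and not part of any NVIDIA tooling; it only concatenates the `ID` and `VERSION_ID` values from /etc/os-release and strips the literal dot, which is why the dot in the sed expression has to be escaped.

```shell
#!/bin/sh
# Hypothetical helper (not from the issue): build the repo directory name
# from the os-release ID and VERSION_ID values.
distribution_name() {
    # $1 = ID (e.g. ubuntu), $2 = VERSION_ID (e.g. 22.04)
    # The dot is escaped so sed removes only literal dots, not every character.
    printf '%s%s\n' "$1" "$2" | sed -e 's/\.//g'
}

# On a real system the values would come from /etc/os-release:
#   distribution=$(. /etc/os-release; distribution_name "$ID" "$VERSION_ID")
distribution_name ubuntu 22.04   # prints ubuntu2204
```

With an unescaped dot (`s/.//g`) the same pipeline would delete every character and leave `$distribution` empty, so the wget URL would point at the wrong path.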

Anything else we need to know?

No response

jjziets commented 5 months ago

My workaround is below, but it is less than ideal. How do I report this to NVIDIA?

    # Update and install necessary packages
    echo "Updating package list and installing necessary packages..."
    sudo apt update
    sudo apt install -y git wget lsb-release software-properties-common snapd

    # Download and install Data Center GPU Manager package
    echo "Downloading and installing Data Center GPU Manager..."
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/datacenter-gpu-manager_3.3.6_amd64.deb
    sudo dpkg -i datacenter-gpu-manager_3.3.6_amd64.deb

    # Fix missing dependencies
    sudo apt-get install -f

nvvfedorov commented 5 months ago

Thank you for reporting the issue. You can find the steps for configuring the CUDA repository on Ubuntu here: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#network-repo-installation-for-ubuntu

The reported issue is not a DCGM-exporter issue. If you have concerns about the datacenter-gpu-manager (DCGM) documentation, please report them here: https://github.com/NVIDIA/DCGM/issues
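As a rough illustration of why a bare `add-apt-repository "deb ..."` line is reported as unsigned, the sketch below builds an apt source entry that names a keyring through the `signed-by` option. The helper `cuda_source_line` is hypothetical, and the keyring path is an assumption about where the cuda-keyring package places its key; check the actual location on your system before using it.

```shell
#!/bin/sh
# Hypothetical helper (not from the issue): print an apt source line that
# ties the CUDA repository to a specific keyring via the signed-by option.
cuda_source_line() {
    distribution=$1   # e.g. ubuntu2204
    keyring=$2        # assumed path, e.g. /usr/share/keyrings/cuda-archive-keyring.gpg
    printf 'deb [signed-by=%s] https://developer.download.nvidia.com/compute/cuda/repos/%s/x86_64/ /\n' \
        "$keyring" "$distribution"
}

# Example: print the entry for Ubuntu 22.04 with the assumed keyring path.
cuda_source_line ubuntu2204 /usr/share/keyrings/cuda-archive-keyring.gpg
```

The printed line could be written to a file under /etc/apt/sources.list.d/ and followed by `sudo apt-get update`. Whether that resolves the error in this issue is not confirmed by the thread.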