NVIDIA / dcgm-exporter

NVIDIA GPU metrics exporter for Prometheus leveraging DCGM
Apache License 2.0

Why is SYS_ADMIN required? #407

Open Yvanll opened 1 month ago

Yvanll commented 1 month ago

Recently, I found that dcgm-exporter fails at startup with a runtime error: FATA[0000] Failed to watch metrics: Error watching fields: Host engine is running as non-root. Adding --cap-add=SYS_ADMIN to docker run, or SYS_ADMIN in the YAML, works around the error. Which specific operation requires this capability?
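
As a way to confirm whether the capability is actually effective inside the container, here is a minimal Python sketch that reads the CapEff mask from /proc/self/status and tests the CAP_SYS_ADMIN bit (bit 21 in linux/capability.h). The helper names has_capability and read_cap_eff are hypothetical, and the script does not show which DCGM operation needs the capability; it only reports whether the process holds it.

```python
# Sketch: report whether the current process holds CAP_SYS_ADMIN (Linux only).
# Helper names are illustrative; they are not part of dcgm-exporter or DCGM.

CAP_SYS_ADMIN = 21  # bit index of CAP_SYS_ADMIN in <linux/capability.h>


def has_capability(cap_eff_hex: str, cap_bit: int = CAP_SYS_ADMIN) -> bool:
    """Return True if bit `cap_bit` is set in a CapEff hex mask."""
    return bool(int(cap_eff_hex, 16) & (1 << cap_bit))


def read_cap_eff(status_path: str = "/proc/self/status") -> str:
    """Read the effective capability mask (CapEff) of the current process."""
    with open(status_path) as status:
        for line in status:
            if line.startswith("CapEff:"):
                return line.split()[1]
    raise RuntimeError("CapEff not found in " + status_path)


if __name__ == "__main__":
    print("CAP_SYS_ADMIN effective:", has_capability(read_cap_eff()))
```

Running this inside the container with and without --cap-add=SYS_ADMIN should show the flag changing, which helps rule out a YAML or docker run setting that was not applied.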

The error occurred with dcgm-exporter 3.6.0, run without SYS_ADMIN, in the following environments:

Device       Driver Version   CUDA Version
NVIDIA A30   535.54.03        12.2
NVIDIA L2    535.154.05       12.2

Yvanll commented 1 month ago

Related issue: #402