NVIDIA / dcgm-exporter

NVIDIA GPU metrics exporter for Prometheus leveraging DCGM
Apache License 2.0

Can the exporter export UCE errors? #401

Open zhucan opened 2 weeks ago

zhucan commented 2 weeks ago

What is the version?

3.3.5

What happened?

[Image attached in the original issue]

What did you expect to happen?

Can dcgm-exporter report the UCE error?

What is the GPU model?

No response

What is the environment?

No response

How did you deploy the dcgm-exporter and what is the configuration?

No response

How to reproduce the issue?

No response

Anything else we need to know?

No response

glowkey commented 2 weeks ago

DCGM-Exporter can monitor most DCGM fields from this page, including many ECC errors:

https://docs.nvidia.com/datacenter/dcgm/latest/dcgm-api/dcgm-api-field-ids.html#c.DCGM_FI_DEV_ECC_CURRENT

Please see the docs for ways to customize which fields are monitored.
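
For example, a custom collectors file along these lines might enable the double-bit (uncorrectable) ECC counters. This is only a sketch: it assumes "UCE" here refers to uncorrectable ECC errors, the field names come from the DCGM field list linked above, and the file name `custom-collectors.csv` is hypothetical.

```
# Format: DCGM field, Prometheus metric type, help message
# Lines starting with '#' are treated as comments.
DCGM_FI_DEV_ECC_DBE_VOL_TOTAL, counter, Total number of double-bit volatile ECC errors.
DCGM_FI_DEV_ECC_DBE_AGG_TOTAL, counter, Total number of double-bit persistent ECC errors.
```

The file would then be passed to the exporter with the `-f` option, e.g. `dcgm-exporter -f custom-collectors.csv`. Whether a given field returns data depends on the GPU model and the DCGM version.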