NVIDIA / dcgm-exporter

NVIDIA GPU metrics exporter for Prometheus leveraging DCGM
Apache License 2.0

Can't collect DCP metrics #365

Open jeffreyyjp opened 3 months ago

jeffreyyjp commented 3 months ago

What is the version?

3.3.6-3.4.2

What happened?

I deployed dcgm-exporter in our k8s cluster and can't get DCP metrics. I checked the log of one of the pods; it shows: (image)

What did you expect to happen?

Can use DCP metrics

What is the GPU model?

V100, 3090 (two images)

What is the environment?

k8s pod

How did you deploy the dcgm-exporter and what is the configuration?

Using Helm

How to reproduce the issue?

Just deploy with Helm

Anything else we need to know?

No response

nvvfedorov commented 3 months ago

@jeffreyyjp, to use DCP metrics, you need a Volta GPU or later.

jeffreyyjp commented 3 months ago

@nvvfedorov Are the V100 and the 3090 Volta GPUs or later?

nvvfedorov commented 3 months ago

@jeffreyyjp, the Tesla V100 should support DCP metrics, whereas the RTX 3090 does not, because it is a consumer-grade GPU: DCP metrics are available only on data center-grade GPUs.
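
The two conditions mentioned so far (Volta or later, and data center-grade) can be written down as a small check. This is only a sketch of the rule as stated in this thread, not the official DCGM support matrix; the `Gpu` class, the `supports_dcp` function, and the short architecture list are hypothetical names made up for illustration.

```python
from dataclasses import dataclass

# NVIDIA architecture generations in release order (a partial list,
# only meant to compare "Volta or later" for this illustration).
ARCH_ORDER = ["Pascal", "Volta", "Turing", "Ampere", "Hopper"]


@dataclass(frozen=True)
class Gpu:
    name: str
    architecture: str
    data_center: bool  # True for data center-grade parts, False for consumer cards


def supports_dcp(gpu: Gpu) -> bool:
    """Apply the rule from this thread: Volta or later AND data center-grade."""
    volta_or_later = ARCH_ORDER.index(gpu.architecture) >= ARCH_ORDER.index("Volta")
    return volta_or_later and gpu.data_center


v100 = Gpu("Tesla V100", "Volta", data_center=True)
rtx3090 = Gpu("GeForce RTX 3090", "Ampere", data_center=False)

for gpu in (v100, rtx3090):
    print(f"{gpu.name}: DCP expected = {supports_dcp(gpu)}")
```

Under this rule the V100 qualifies and the RTX 3090 does not, even though Ampere is newer than Volta, because the second condition fails.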

jeffreyyjp commented 3 months ago

@nvvfedorov So my V100 node can't get DCP metrics, and I don't know the reason.
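
One way to narrow this down is to check what the exporter on the V100 node actually serves. The sketch below scans Prometheus text-format output for DCP (profiling) metric names. It assumes that these names start with `DCGM_FI_PROF_`, that the exporter listens on its default port 9400 at `/metrics`, and that the profiling fields are enabled in the exporter's counters file. The helper names `dcp_metric_names` and `fetch_metrics` and the `localhost` URL are placeholders for illustration.

```python
import urllib.request

# Assumption: DCP (profiling) fields are exported with this name prefix.
DCP_PREFIX = "DCGM_FI_PROF_"


def dcp_metric_names(metrics_text: str) -> set:
    """Return the DCP metric names found in Prometheus text-format output."""
    names = set()
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first '{' (labels) or whitespace (value).
        parts = line.split("{", 1)[0].split()
        if parts and parts[0].startswith(DCP_PREFIX):
            names.add(parts[0])
    return names


def fetch_metrics(url: str = "http://localhost:9400/metrics", timeout: float = 5.0) -> str:
    """Fetch the raw metrics page; the URL is a placeholder for the pod's address."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, after port-forwarding the exporter pod that runs on the V100 node, `dcp_metric_names(fetch_metrics())` returning an empty set would confirm that no profiling metrics are being exported from that node, and the pod log for that node is the next place to look.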