Closed lxzjd closed 3 months ago
@lxzjd, can you tell us about your environment and share the output of `nvidia-smi`? Please also share the error log. Thank you in advance.
@nvvfedorov, This is my nvidia-smi output:
error log:
My metrics profile default-counters.csv:
I know the NVIDIA P40 certainly does not support DCP metrics, but I'm very confused by this error. I also tried an NVIDIA H800: there is no error in the log, but `curl http://localhost:9400/metrics` doesn't show any DCP-related metrics either. I just want to know which architectures DCGM's DCP metrics currently support.
@lxzjd, it appears that on the H800 machine, you are using the default configuration for the DCGM-exporter, which does not include DCP metrics. Please update the configuration to use the dcp-metrics-included.csv file instead.
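For anyone who wants to confirm which configuration is actually in effect, here is a minimal Python sketch that scans the exporter's Prometheus output for DCP (profiling) metrics. It assumes the exporter is listening on `http://localhost:9400/metrics` as in the comment above, and that DCP fields are the ones whose names start with `DCGM_FI_PROF_`. The helper names `dcp_metric_names`, `fetch_metrics`, and `report` are made up for this illustration and are not part of dcgm-exporter.

```python
import urllib.request

# Assumption: DCP (profiling) fields are exported with this name prefix.
DCP_PREFIX = "DCGM_FI_PROF_"


def dcp_metric_names(exposition_text):
    """Return sorted, de-duplicated DCP metric names found in Prometheus text output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or at whitespace (value).
        parts = line.split("{", 1)[0].split()
        if not parts:
            continue
        if parts[0].startswith(DCP_PREFIX):
            names.add(parts[0])
    return sorted(names)


def fetch_metrics(url="http://localhost:9400/metrics", timeout=5.0):
    """Download the raw metrics page from a running exporter (URL is an assumption)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def report(url="http://localhost:9400/metrics"):
    """Print whether any DCP metrics are being exported at the given URL."""
    found = dcp_metric_names(fetch_metrics(url))
    if found:
        print("DCP metrics exported:")
        for name in found:
            print("  " + name)
    else:
        print("No DCP metrics found; the exporter may be using a counters file without them.")
```

Calling `report()` on the machine running the exporter prints the DCP metric names it finds. An empty result on a GPU that supports DCP suggests that the active counters file does not list any profiling fields.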
@nvvfedorov, thank you very much, my problem is solved. The H800 does report DCP metrics; I had confused the default configuration used by the k8s startup with the default configuration file used by the docker startup.
The error message "DCP metrics are supported for Volta, Turing or Ampere GPUs architectures only" is too easy to misunderstand. Can you change it?
Ask your question
My environment is running dcgm-exporter, and it reports this error: `DCP metrics are supported for Volta, Turing or Ampere GPUs architectures only`. Does DCGM currently support only these architectures, and not Hopper?