Open WYmindsky opened 3 years ago
Hey @WYmindsky I'm experiencing the same behaviour. Did you find out why this occurs?
It's still there
Hi,
Could you provide the logs from dcgm-exporter itself? It looks like there are two dcgm-exporter instances: one that is aware of the k8s environment (it was able to connect to the pod API) and another that is not. The container_name, pod_namespace, and pod_name labels are gathered from the k8s infrastructure, so if those labels are missing, the connection from dcgm-exporter to k8s failed, and that should be reflected in the dcgm-exporter logs.
WBR, Nik
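In case it helps narrow this down, here is a minimal sketch for checking a scraped metrics dump for the pattern described above: the same GPU series appearing once with the k8s labels and once without. It only parses the Prometheus text format with the standard library. The sample metric name, the `gpu` and `uuid` key labels, and the sample values are assumptions for illustration, not taken from this issue; adjust them to whatever your exporter actually emits.

```python
import re
from collections import defaultdict

# One sample line in Prometheus text format: name{labels} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

# Labels that, per the comment above, come from the k8s side.
K8S_LABELS = ("container_name", "pod_namespace", "pod_name")


def parse_samples(text):
    """Return (metric_name, labels_dict) for every labelled sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip samples without a label set
        labels = dict(LABEL_RE.findall(match.group("labels")))
        samples.append((match.group("name"), labels))
    return samples


def find_split_series(text, key_labels=("gpu", "uuid")):
    """Find series reported both with and without k8s labels.

    key_labels are assumed to identify a GPU; change them if needed.
    """
    groups = defaultdict(list)
    for name, labels in parse_samples(text):
        key = (name,) + tuple(labels.get(k, "") for k in key_labels)
        has_k8s = any(labels.get(k) for k in K8S_LABELS)
        groups[key].append(has_k8s)
    return sorted(
        key for key, flags in groups.items() if True in flags and False in flags
    )


# Made-up sample input, only to show the shape of the check.
SAMPLE = '''
# HELP dcgm_gpu_temp GPU temperature (in C).
dcgm_gpu_temp{gpu="0",uuid="GPU-aaaa"} 41
dcgm_gpu_temp{gpu="0",uuid="GPU-aaaa",container_name="train",pod_name="job-1",pod_namespace="default"} 41
dcgm_gpu_temp{gpu="1",uuid="GPU-bbbb"} 38
'''

if __name__ == "__main__":
    for key in find_split_series(SAMPLE):
        print("reported with and without k8s labels:", key)
```

If this flags a series, it would be consistent with two sources exporting the same GPU, only one of which reached the k8s API. It does not say which instance is at fault; the exporter logs are still needed for that.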
yaml: pod-gpu-exporter-daemonset.yaml
docker image: pod-gpu-metrics-exporter:1.0.0-alpha
dcgm: dcgm-exporter:1.4.6
Duplicate metrics occurred when a job had been scheduled on this server for a long time.