NVIDIA / gpu-monitoring-tools

Tools for monitoring NVIDIA GPUs on Linux
Apache License 2.0

invalid metrics in 2.4.0rc2 #187

Open juliantaylor opened 3 years ago

juliantaylor commented 3 years ago

The template at https://github.com/NVIDIA/gpu-monitoring-tools/blob/master/pkg/pipeline.go#L182 closes the quote of the device label only inside an if block. When the condition is false, the quote is never closed and invalid metrics are produced, e.g.

DCGM_FI_DEV_FB_FREE{gpu="0",UUID=uid",device="nvidia0,container="",namespace="",pod=""} 15109
pkg/pipeline.go:{{ $val.Name }}{gpu="{{ $val.GPU }}",{{ $val.UUID }}="{{ $val.GPUUUID }}",device="{{ $val.GPUDevice }}{{if $val.MigProfile}}",GPU_I_PROFILE="{{ $val.MigProfile }}",GPU_I_ID="{{ $val.GPUInstanceID }}{{end}}{{if $val.Hostname }}",Hostname="{{ $val.Hostname }}"{{end}}
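
To illustrate the problem, here is a minimal sketch in Go that renders the template fragment above with and without a hostname. It is not the exporter's actual code or the actual fix in master: the metric struct, the render and balanced helpers, the trailing `} {{ .Value }}` part, and the "fixed" template are all assumptions made for this example, and `$val.` is shortened to `.`. The idea of the fix sketched here is simply that every label closes its own quote, so no conditional block is responsible for closing a previous label.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// metric is a hypothetical stand-in for the value the exporter passes to the template.
type metric struct {
	Name, GPU, UUID, GPUUUID, GPUDevice string
	MigProfile, GPUInstanceID, Hostname string
	Value                               string
}

// buggyTmpl mirrors the fragment quoted in the issue: the closing quote of
// device (or GPU_I_ID) is only emitted inside the Hostname block.
// The trailing `} {{ .Value }}` is an assumption added to complete the line.
const buggyTmpl = `{{ .Name }}{gpu="{{ .GPU }}",{{ .UUID }}="{{ .GPUUUID }}",device="{{ .GPUDevice }}{{if .MigProfile}}",GPU_I_PROFILE="{{ .MigProfile }}",GPU_I_ID="{{ .GPUInstanceID }}{{end}}{{if .Hostname }}",Hostname="{{ .Hostname }}"{{end}}} {{ .Value }}`

// fixedTmpl is one possible correction (an assumption, not the upstream patch):
// each label closes its own quote, and optional labels start with a comma.
const fixedTmpl = `{{ .Name }}{gpu="{{ .GPU }}",{{ .UUID }}="{{ .GPUUUID }}",device="{{ .GPUDevice }}"{{if .MigProfile}},GPU_I_PROFILE="{{ .MigProfile }}",GPU_I_ID="{{ .GPUInstanceID }}"{{end}}{{if .Hostname }},Hostname="{{ .Hostname }}"{{end}}} {{ .Value }}`

// render executes the given template text against m and returns the result.
func render(tmpl string, m metric) string {
	var sb strings.Builder
	t := template.Must(template.New("metric").Parse(tmpl))
	if err := t.Execute(&sb, m); err != nil {
		panic(err)
	}
	return sb.String()
}

// balanced is a crude validity check: a well-formed line has an even number of quotes.
func balanced(line string) bool {
	return strings.Count(line, `"`)%2 == 0
}

func main() {
	// No Hostname and no MigProfile: the case that triggers the bug.
	m := metric{Name: "DCGM_FI_DEV_FB_FREE", GPU: "0", UUID: "UUID",
		GPUUUID: "uid", GPUDevice: "nvidia0", Value: "15109"}

	for _, tmpl := range []string{buggyTmpl, fixedTmpl} {
		line := render(tmpl, m)
		fmt.Printf("%s  (quotes balanced: %v)\n", line, balanced(line))
	}
}
```

With an empty Hostname, the first template leaves the device value unterminated while the second produces a well-formed line. Setting Hostname makes both templates produce balanced output, which is why the bug only shows up in some configurations.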

dbeer commented 3 years ago

This is fixed in master now. We will be making a new RC or an official release soon.