Open juliantaylor opened 3 years ago
https://github.com/NVIDIA/gpu-monitoring-tools/blob/master/pkg/pipeline.go#L182 only closes the quoted value of the device label inside an if block. When the condition is false, the closing quote is never emitted and invalid metrics are produced, e.g.:
DCGM_FI_DEV_FB_FREE{gpu="0",UUID="uid",device="nvidia0,container="",namespace="",pod=""} 15109
pkg/pipeline.go:{{ $val.Name }}{gpu="{{ $val.GPU }}",{{ $val.UUID }}="{{ $val.GPUUUID }}",device="{{ $val.GPUDevice }}{{if $val.MigProfile}}",GPU_I_PROFILE="{{ $val.MigProfile }}",GPU_I_ID="{{ $val.GPUInstanceID }}{{end}}{{if $val.Hostname }}",Hostname="{{ $val.Hostname }}"{{end}}
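To make the failure mode concrete, here is a minimal sketch that reproduces it with Go's text/template. The `Metric` struct, the shortened templates, and the `render` helper are hypothetical and only mirror the shape of the line quoted above; they are not the exporter's actual code, and `fixedFormat` is one possible correction, not necessarily the change that landed in master.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Metric is a hypothetical stand-in for the exporter's metric type;
// only the fields needed for this illustration are included.
type Metric struct {
	Name       string
	GPU        string
	GPUDevice  string
	MigProfile string
	Value      string
}

// brokenFormat mirrors the reported bug: the closing quote of the device
// label is emitted only inside the {{if}} block.
const brokenFormat = `{{ .Name }}{gpu="{{ .GPU }}",device="{{ .GPUDevice }}{{if .MigProfile}}",GPU_I_PROFILE="{{ .MigProfile }}"{{end}}} {{ .Value }}`

// fixedFormat (an assumed fix) closes the device label unconditionally and
// moves the separating comma inside the {{if}} block.
const fixedFormat = `{{ .Name }}{gpu="{{ .GPU }}",device="{{ .GPUDevice }}"{{if .MigProfile}},GPU_I_PROFILE="{{ .MigProfile }}"{{end}}} {{ .Value }}`

// render executes the given template text against a single metric.
func render(format string, m Metric) string {
	t := template.Must(template.New("metric").Parse(format))
	var sb strings.Builder
	if err := t.Execute(&sb, m); err != nil {
		panic(err)
	}
	return sb.String()
}

func main() {
	// MigProfile is left empty, so the {{if}} branch is skipped.
	m := Metric{Name: "DCGM_FI_DEV_FB_FREE", GPU: "0", GPUDevice: "nvidia0", Value: "15109"}
	fmt.Println(render(brokenFormat, m)) // device value is left unterminated
	fmt.Println(render(fixedFormat, m))  // device value is properly quoted
}
```

With an empty `MigProfile`, the first line printed is `DCGM_FI_DEV_FB_FREE{gpu="0",device="nvidia0} 15109` and the second is `DCGM_FI_DEV_FB_FREE{gpu="0",device="nvidia0"} 15109`. When `MigProfile` is set, both templates produce the same valid output, which is presumably why the bug only shows up on non-MIG devices.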
This is fixed in master now. We will be making a new RC or an official release soon.