NVIDIA / go-dcgm

Golang bindings for Nvidia Datacenter GPU Manager (DCGM)
Apache License 2.0

getSupportedMetricGroups function takes uint `grpid` and the value is not used #32

Closed. LujieDuan closed this issue 1 year ago

LujieDuan commented 1 year ago

Hey,

We are looking to use getSupportedMetricGroups(grpid uint) to get the supported profiling fields of a GPU group.

On the CLI, we can query the supported fields for a given GPU group, for example: `dcgmi profile -g 470 -l`.

Our expected usage in the code is:

  1. Create a GPU group and get its handle.
  2. Pass the handle to getSupportedMetricGroups and get the supported metric field groups for that GPU group.

However, this function takes a uint `grpid`, and it is not clear to us how to obtain that value in code. Looking at the implementation, `grpid` is also never used. Please advise.

glowkey commented 1 year ago

That appears to be a bug in getSupportedMetricGroups that the dcgm-exporter unfortunately relies on (see this link). We'll look into changing getSupportedMetricGroups to accept a group handle and fix the instance in dcgm-exporter.

glowkey commented 1 year ago

On closer examination, the getSupportedMetricGroups() function is supposed to take a GPU ID. The parameter has been renamed to reflect this and is now actually used within the function.
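
For anyone who lands here with the original use case (supported profiling fields for a whole set of GPUs), one possible approach is to query each GPU ID separately and keep only the fields that every GPU reports. The following is a minimal sketch of that idea, not the library's implementation. It does not call go-dcgm at all: `MetricGroup` is a local stand-in type, and `lookupFunc`, `commonFieldIDs`, and `fakeLookup` are hypothetical names with made-up sample data. In real code the lookup would wrap the library's per-GPU query.

```go
package main

import (
	"fmt"
	"sort"
)

// MetricGroup is a local stand-in for a profiling metric group
// (hypothetical; defined here so the sketch has no external dependencies).
type MetricGroup struct {
	Major    uint
	Minor    uint
	FieldIds []uint
}

// lookupFunc is a hypothetical per-GPU query: given a GPU ID, it returns
// the metric groups that GPU supports.
type lookupFunc func(gpuID uint) ([]MetricGroup, error)

// commonFieldIDs returns, in ascending order, the field IDs supported by
// every GPU in gpuIDs. It assumes gpuIDs contains no duplicates.
func commonFieldIDs(gpuIDs []uint, lookup lookupFunc) ([]uint, error) {
	counts := make(map[uint]int) // field ID -> number of GPUs reporting it
	for _, id := range gpuIDs {
		groups, err := lookup(id)
		if err != nil {
			return nil, fmt.Errorf("gpu %d: %w", id, err)
		}
		seen := make(map[uint]bool) // count each field once per GPU
		for _, g := range groups {
			for _, f := range g.FieldIds {
				if !seen[f] {
					seen[f] = true
					counts[f]++
				}
			}
		}
	}

	var common []uint
	for f, n := range counts {
		if n == len(gpuIDs) {
			common = append(common, f)
		}
	}
	sort.Slice(common, func(i, j int) bool { return common[i] < common[j] })
	return common, nil
}

// fakeLookup serves made-up sample data for two GPUs (IDs 0 and 1).
func fakeLookup(gpuID uint) ([]MetricGroup, error) {
	data := map[uint][]MetricGroup{
		0: {
			{Major: 0, Minor: 0, FieldIds: []uint{1001, 1002}},
			{Major: 1, Minor: 0, FieldIds: []uint{1005}},
		},
		1: {
			{Major: 0, Minor: 0, FieldIds: []uint{1001, 1002}},
		},
	}
	groups, ok := data[gpuID]
	if !ok {
		return nil, fmt.Errorf("unknown GPU ID %d", gpuID)
	}
	return groups, nil
}

func main() {
	fields, err := commonFieldIDs([]uint{0, 1}, fakeLookup)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(fields)
}
```

Taking the intersection is only one choice; if you would rather know what any GPU in the set supports, collect the union instead by dropping the `n == len(gpuIDs)` check.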