NVIDIA / DCGM

NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs
Apache License 2.0

Why is there no corresponding .cpp file for dcgm_agent.h? #99

Open irvingans opened 1 year ago

irvingans commented 1 year ago

https://github.com/NVIDIA/DCGM/blob/7e1012302679e4bb7496483b32dcffb56e528c92/dcgmlib/dcgm_agent.h#L1228

Hi, I am looking for the implementation of dcgmGetPidInfo(), but I cannot find a .cpp file corresponding to dcgm_agent.h. Why is this?

nikkon-dev commented 1 year ago

dcgm_agent.h is the public API header and is not directly tied to the implementation. For the implementation, look for tsapiEngineGetPidInfo in the DcgmApi.cpp file.

irvingans commented 1 year ago

Hi @nikkon-dev, in DcgmApi.cpp, for tsapiEngineGetPidInfo(dcgmHandle_t pDcgmHandle, dcgmGpuGrp_t groupId, dcgmPidInfo_t *pidInfo), pidInfo is something the user passes in rather than something obtained from another source, right?

nikkon-dev commented 1 year ago

Yes, the user passes it to the function after setting two fields: version (dcgmPidInfo_version2 for now) and pid, the PID of the desired process, created after stats gathering was enabled.
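
To illustrate the in/out struct pattern described above, here is a minimal, self-contained sketch. It does not call DCGM: `PidInfo`, `make_version`, `fake_get_pid_info`, the status codes, and the `recorded` table are all hypothetical stand-ins, and the size-plus-version packing is an assumption made for illustration. The real dcgmPidInfo_t has many more fields. The point is only that the caller fills in version and pid, and the callee validates them and writes the results into the same struct.

```python
import ctypes


def make_version(struct_type, ver):
    """Pack the struct size and a version number into one integer (assumed scheme)."""
    return ctypes.sizeof(struct_type) | (ver << 24)


class PidInfo(ctypes.Structure):
    """Hypothetical, heavily simplified stand-in for dcgmPidInfo_t."""
    _fields_ = [
        ("version", ctypes.c_uint),  # input: set by the caller
        ("pid", ctypes.c_uint),      # input: set by the caller
        ("numGpus", ctypes.c_int),   # output: filled in by the callee
    ]


PID_INFO_VERSION2 = make_version(PidInfo, 2)

# Hypothetical status codes, not DCGM's real return values.
ST_OK, ST_VER_MISMATCH, ST_NO_DATA = 0, -1, -2


def fake_get_pid_info(recorded, pid_info):
    """Stand-in for the real call: validate the inputs, then fill outputs in place."""
    if pid_info.version != PID_INFO_VERSION2:
        return ST_VER_MISMATCH
    if pid_info.pid not in recorded:
        # For example, the process started before stats gathering was enabled.
        return ST_NO_DATA
    pid_info.numGpus = recorded[pid_info.pid]
    return ST_OK


# pid -> GPU count observed while stats gathering was enabled (made-up data)
recorded = {4242: 1}

info = PidInfo()
info.version = PID_INFO_VERSION2  # the caller sets the version...
info.pid = 4242                   # ...and the PID of interest
status = fake_get_pid_info(recorded, info)
```

In this sketch, a struct whose version field is left at zero is rejected, and a PID that was never recorded yields a "no data" status, which mirrors the requirement that the process be created after stats gathering was enabled.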

optyang commented 10 months ago

> Yes, that's what the user passes to the function setting two fields: version (dcgmPidInfo_version2 for now) and pid, which is the PID of a desired process created after the stats gathering was enabled.

Hi @nikkon-dev, how do I enable stats gathering in the Python bindings? Thanks!