NVIDIA / DCGM

NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs
Apache License 2.0

Previous profiling results are still stored in dcgmGroup.samples.GetAllSinceLastCall #140

Open · optyang opened this issue 9 months ago

optyang commented 9 months ago

Hi, I am using the Python bindings of DCGM.

When starting a new profiling experiment, I initialize the profiler by calling `profiling_results = dcgmGroup.samples.GetAllSinceLastCall(None, dcgmFieldGroup)`.

However, the first time I call `GetAllSinceLastCall(profiling_results, dcgmFieldGroup)` after initializing with `None`, it also returns the old data from previous profiling experiments, even though I call `profiling_results.EmptyValues()` after every `GetAllSinceLastCall` in those experiments. As a result, this first call is very slow. From the second call onward, only the data collected between two consecutive calls is returned, as expected.
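To make the symptom concrete, here is a small, self-contained Python model of it. It does not touch DCGM: `SampleCache`, `ValueCollection` and `get_all_since_last_call` are hypothetical stand-ins, not the bindings' API. The model assumes (a guess that fits the symptoms, not something confirmed from the DCGM source) that samples stay in a bounded cache and that each result collection keeps its own "since" cursor, which starts at timestamp 0 and is not reset by emptying the stored values.

```python
from typing import List, Optional, Tuple

Sample = Tuple[int, float]  # (timestamp, value)


class SampleCache:
    """Hypothetical stand-in for a bounded per-field sample cache (not DCGM code)."""

    def __init__(self, max_keep_samples: int) -> None:
        self.max_keep_samples = max_keep_samples
        self.samples: List[Sample] = []

    def record(self, timestamp: int, value: float) -> None:
        self.samples.append((timestamp, value))
        # Drop the oldest samples once the retention bound is exceeded.
        excess = len(self.samples) - self.max_keep_samples
        if excess > 0:
            del self.samples[:excess]

    def since(self, cursor: int) -> List[Sample]:
        # Every retained sample newer than the cursor.
        return [s for s in self.samples if s[0] > cursor]


class ValueCollection:
    """Hypothetical result collection with its own 'since' cursor (assumption)."""

    def __init__(self) -> None:
        self.cursor = 0  # a fresh collection starts at timestamp 0
        self.values: List[Sample] = []

    def empty_values(self) -> None:
        # Clears the stored values but, in this model, leaves the cursor alone.
        self.values.clear()


def get_all_since_last_call(
    cache: SampleCache, coll: Optional[ValueCollection]
) -> ValueCollection:
    """Append all samples newer than the collection's cursor, then advance it."""
    if coll is None:
        coll = ValueCollection()  # passing None creates a fresh cursor at 0
    new = cache.since(coll.cursor)
    coll.values.extend(new)
    if new:
        coll.cursor = new[-1][0]
    return coll


if __name__ == "__main__":
    cache = SampleCache(max_keep_samples=1000)

    # Experiment 1: 100 samples, fetched and emptied.
    for t in range(1, 101):
        cache.record(t, float(t))
    old = get_all_since_last_call(cache, None)
    old.empty_values()

    # Experiment 2 records 10 more samples.
    for t in range(101, 111):
        cache.record(t, float(t))

    fresh = get_all_since_last_call(cache, None)
    print(len(fresh.values))  # 110: includes experiment 1's samples

    reused = get_all_since_last_call(cache, old)
    print(len(reused.values))  # 10: only the new samples
```

Under this model, the size of the slow first fetch is bounded by how many samples the cache retains, and reusing one collection object avoids re-reading old samples. Whether the real bindings behave this way would need to be checked against DCGM itself.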

Could you explain this behavior and suggest how to avoid the slow first call to `GetAllSinceLastCall`? Thanks a lot.