NVIDIA / DCGM

NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs
Apache License 2.0

the issue of watching and querying metrics #186

Open BetaZYN opened 1 month ago

BetaZYN commented 1 month ago

I reviewed the code for the dcgmi tool and found that the dmon feature calls dcgmWatchFields to start monitoring and dcgmUpdateAllFields to force a refresh before it queries metrics with the dcgmGetLatestValues_v function.

However, as far as I can tell, it never calls UnwatchFields after the query completes. When should UnwatchFields be called?

In my use case, I need to query certain metrics periodically. If I call dcgmWatchFields before every query, does that add any performance or memory overhead? If it does, what is the recommended way to handle this scenario?
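
To make the question concrete, the two call patterns being compared can be sketched as below. This is only an illustration of call order: `FakeDcgm` and its methods are hypothetical stand-ins that record which call was made, not the DCGM API or its Python bindings, and the sketch says nothing about what the real library costs in either case.

```python
# Hypothetical stand-in that only records call order; it is NOT the DCGM API.
class FakeDcgm:
    def __init__(self):
        self.calls = []

    def watch_fields(self):       # stands in for dcgmWatchFields
        self.calls.append("watch")

    def update_all_fields(self):  # stands in for dcgmUpdateAllFields
        self.calls.append("update")

    def get_latest_values(self):  # stands in for the latest-values query
        self.calls.append("get")

    def unwatch_fields(self):     # stands in for UnwatchFields
        self.calls.append("unwatch")


def watch_every_query(dcgm, polls):
    """Pattern A: repeat the dmon-style sequence before every query."""
    for _ in range(polls):
        dcgm.watch_fields()
        dcgm.update_all_fields()
        dcgm.get_latest_values()


def watch_once(dcgm, polls):
    """Pattern B: watch once, poll repeatedly, unwatch when finished."""
    dcgm.watch_fields()
    try:
        for _ in range(polls):
            dcgm.update_all_fields()
            dcgm.get_latest_values()
    finally:
        # Runs even if a poll raises, so the watch is always released.
        dcgm.unwatch_fields()


a, b = FakeDcgm(), FakeDcgm()
watch_every_query(a, polls=3)
watch_once(b, polls=3)
print(a.calls.count("watch"), a.calls.count("unwatch"))  # 3 0
print(b.calls.count("watch"), b.calls.count("unwatch"))  # 1 1
```

Pattern A is what results from copying the dmon sequence into each periodic query; Pattern B is the alternative being asked about, where the watch is set up once and released at shutdown.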

I look forward to your reply.