NVIDIA / DCGM

NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs.
Apache License 2.0

Getting Utilization metrics #182

Open apaz-cli opened 2 months ago

apaz-cli commented 2 months ago

I'm writing a performance monitor for my desktop. Basically, it just queries nvidia-smi every second.
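A polling loop of the kind described might look like the minimal sketch below. It is an illustration, not the poster's actual code. It assumes `nvidia-smi` is on the `PATH` and uses its `--query-gpu=utilization.gpu` CSV output. The helper names `parse_utilization` and `poll` are hypothetical.

```python
import subprocess
import time
from typing import List, Optional

# Ask nvidia-smi for one bare number per GPU, one GPU per line.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_utilization(output: str) -> List[Optional[int]]:
    """Parse one utilization percentage per non-empty line.

    Values that are not plain integers (for example "[N/A]") become None.
    """
    values: List[Optional[int]] = []
    for line in output.splitlines():
        field = line.strip()
        if not field:
            continue
        values.append(int(field) if field.isdigit() else None)
    return values


def poll(interval_s: float = 1.0) -> None:
    """Print the per-GPU utilization once per interval until interrupted."""
    while True:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
        print(parse_utilization(result.stdout))
        time.sleep(interval_s)


if __name__ == "__main__":
    poll()
```

Starting a new process every second is simple but comparatively heavy. Bindings to NVML or DCGM would avoid the process launch, at the cost of an extra dependency.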

As far as I can tell, nvidia-smi reports utilization as the percentage of SMs that were used at some point within the last second. That is useful, but it is not what you would typically expect "utilization" to mean. I'd like to measure busy time across SMs as a percentage of total time.
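The difference between the two definitions can be made concrete with a toy calculation. The sketch below uses made-up per-SM busy times over a one-second window. It does not call any NVIDIA API, and both function names are hypothetical.

```python
from typing import Sequence


def fraction_of_sms_used(busy_seconds: Sequence[float]) -> float:
    """Percentage of SMs that were busy at any point in the window."""
    used = sum(1 for busy in busy_seconds if busy > 0)
    return 100.0 * used / len(busy_seconds)


def mean_busy_time(busy_seconds: Sequence[float], window_s: float) -> float:
    """Busy time summed over all SMs as a percentage of total SM time."""
    return 100.0 * sum(busy_seconds) / (len(busy_seconds) * window_s)


# Four hypothetical SMs: two were busy for 0.25 s each, two were idle.
busy = [0.25, 0.25, 0.0, 0.0]

print(fraction_of_sms_used(busy))   # 50.0
print(mean_busy_time(busy, 1.0))    # 12.5
```

With these numbers the first definition reports 50% while the second reports 12.5%, which is the gap the question is about.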

What is the correct function call to do this? I read the docs, but unfortunately they've just managed to confuse me further. Is the functionality Hopper-only?