NordicHPC / sonar

Tool to profile usage of HPC resources by regularly probing processes.
GNU General Public License v3.0

Other GPU utilization measures #181

Open lars-t-hansen opened 3 months ago

lars-t-hansen commented 3 months ago

It's possible to have 100% GPU utilization as measured by nvidia-smi and still get very little work done, because the available parallelism is not exploited; keeping a single SM (streaming multiprocessor) busy is enough to report 100%. There is some discussion of that here: https://news.ycombinator.com/item?id=41312335. It might be interesting to investigate whether there are measurements we could extract to highlight this.

(Obviously this is not unique to GPUs, but it's a lot more critical on GPUs given the available parallelism. On CPUs, a somewhat-but-not-really comparable situation is when the program uses serial code to process data and lets the AVX-512 units sit unused; we're not able to see this.)
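
To make the gap concrete, here is a minimal sketch of why a time-based utilization figure can read 100% while most of the GPU is idle. It is a toy model, not how sonar or nvidia-smi is implemented: the function names, the per-sample "busy SM" counts, and the SM count of 108 are all assumptions chosen for illustration. The only thing taken as given is that the nvidia-smi figure counts the fraction of time during which at least one kernel was executing.

```python
from typing import Sequence


def time_utilization(busy_sms: Sequence[int]) -> float:
    """Fraction of samples in which at least one SM was busy.

    This mimics a time-based measure: a sample counts as fully
    "utilized" no matter how many SMs were actually active.
    """
    if not busy_sms:
        return 0.0
    return sum(1 for n in busy_sms if n > 0) / len(busy_sms)


def sm_activity(busy_sms: Sequence[int], total_sms: int) -> float:
    """Average fraction of SMs that were busy across all samples."""
    if not busy_sms or total_sms <= 0:
        return 0.0
    return sum(busy_sms) / (len(busy_sms) * total_sms)


# Hypothetical device size and hypothetical samples: one SM busy
# in every sampling interval, all other SMs idle.
TOTAL_SMS = 108
samples = [1] * 10

print(f"time-based utilization: {time_utilization(samples):.1%}")
print(f"average SM activity:    {sm_activity(samples, TOTAL_SMS):.1%}")
```

In this toy case the time-based figure is 100% while average SM activity is under 1%, which is the kind of difference a second measurement would need to expose.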