Open malixian opened 3 years ago
Hi @malixian , per-SM profiling metrics are not possible with DCGM.
Hi, could you please explain the meaning of sm(%) in the output of `nvidia-smi pmon`? Is it the average occupancy across all SMs?
Yes. It's the average across all SMs. There are 3 dimensions of utilization:
- gr_activity (1001) - Is any kernel running on any SM? Using 1 block with 1 thread = 100%.
- sm_activity (1002) - Is any kernel running on each SM? Using numSMs blocks with 1 thread = 100%. Averaged across SMs.
- sm_occupancy (1003) - How many warps ran vs. the theoretical max? numSMs blocks with 64 threads = 100%. Averaged across SMs.
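The distinction between these three metrics can be sketched with some toy arithmetic. This is not DCGM code, just an illustrative model: the SM count, the per-SM warp limit, and the assumption of one resident block per SM are all made-up parameters, and real hardware limits vary by architecture.

```python
# Toy model of gr_activity / sm_activity / sm_occupancy for a single
# kernel that stays resident for the whole sample window.
# All hardware numbers below are assumptions for illustration only.
NUM_SMS = 80            # assumed SM count
MAX_WARPS_PER_SM = 64   # assumed theoretical max resident warps per SM
WARP_SIZE = 32

def metrics(num_blocks, threads_per_block):
    # gr_activity: any kernel on any SM -> 100% as soon as one block runs
    gr_activity = 100.0 if num_blocks > 0 else 0.0

    # sm_activity: fraction of SMs with at least one resident block
    # (simplification: at most one block per SM)
    busy_sms = min(num_blocks, NUM_SMS)
    sm_activity = 100.0 * busy_sms / NUM_SMS

    # sm_occupancy: resident warps vs. theoretical max, averaged over SMs
    warps_per_block = -(-threads_per_block // WARP_SIZE)  # ceil division
    resident_warps = busy_sms * warps_per_block
    sm_occupancy = 100.0 * resident_warps / (NUM_SMS * MAX_WARPS_PER_SM)

    return gr_activity, sm_activity, sm_occupancy

# 1 block, 1 thread: gr_activity saturates, the other two stay tiny
print(metrics(1, 1))
# numSMs blocks, 1 thread each: sm_activity saturates, occupancy stays low
print(metrics(NUM_SMS, 1))
# numSMs blocks, enough warps to fill each SM: all three saturate
print(metrics(NUM_SMS, MAX_WARPS_PER_SM * WARP_SIZE))
```

The point is that sm_activity only asks "is each SM doing *anything*", while sm_occupancy asks "how full is each SM", so a kernel can show 100% sm_activity with very low occupancy.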
Many thanks for your reply. However, when I train DL jobs such as VGG16 with different batch sizes, sm(%) decreases as the batch size increases. (When I set batch size = 2, sm(%) reaches 97%, but when I set batch size = 4096, sm(%) is about 60%.) Do you have any ideas on this?
Sorry, I'm not familiar with tuning DL jobs. I can help with GPU monitoring questions, though.
Like SM0: 90%, SM1: 90%, ..., SM16: 0%, SM17: 0%, ...