NaderAlAwar closed this issue 8 months ago.
-n, --normal-unit    Specify the normalization unit (DEFAULT: per_wave): per_wave, per_cycle, per_second, per_kernel
The "per_kernel" unit might be what you want.
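As a rough mental model, each normalization unit divides the same raw counter total by a different denominator. The sketch below assumes that per_wave divides by the number of waves, per_cycle by GPU cycles, per_second by elapsed time, and per_kernel by the number of kernel dispatches. The `KernelRun` type and `normalize` function are hypothetical illustrations, not omniperf's implementation.

```python
from dataclasses import dataclass


@dataclass
class KernelRun:
    """Hypothetical summary of one profiled kernel (field names are illustrative)."""
    total_bytes_read: float  # raw total summed over all dispatches
    num_waves: int           # waves launched across all dispatches
    gpu_cycles: int          # elapsed GPU cycles
    duration_s: float        # elapsed time in seconds
    num_dispatches: int      # number of kernel launches


def normalize(run: KernelRun, unit: str) -> float:
    """Divide the raw total by the denominator assumed for the chosen unit."""
    denominators = {
        "per_wave": run.num_waves,
        "per_cycle": run.gpu_cycles,
        "per_second": run.duration_s,
        "per_kernel": run.num_dispatches,
    }
    if unit not in denominators:
        raise ValueError(f"unknown normalization unit: {unit}")
    return run.total_bytes_read / denominators[unit]


run = KernelRun(
    total_bytes_read=1_048_576,
    num_waves=4096,
    gpu_cycles=2_000_000,
    duration_s=0.001,
    num_dispatches=1,
)
print(normalize(run, "per_wave"))    # 256.0
print(normalize(run, "per_kernel"))  # 1048576.0
```

Under this model, a per_kernel value for a single dispatch is the total for that kernel, and a per_wave value multiplied by the wave count recovers the same total.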
That definitely helps, thanks! Am I correct in looking at Read BW under L2 - Fabric Transactions if I want the total bytes read in a kernel?
Hi @NaderAlAwar -- I've written up some more detail here: https://github.com/AMDResearch/omniperf/blob/2.x/src/docs/performance_model.md#l2-fabric-transactions. Currently it's only in markdown form, as the website still needs to be updated.
Essentially, this tells you how much data the L2 read from the data fabric; it does not directly report bytes read from HBM/DRAM. The example in that document goes into much more detail.
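To illustrate why this is a count of L2-to-fabric traffic rather than DRAM bytes, here is a minimal sketch that turns read-request counts into bytes by weighting each request by its size. The 32/64/128-byte buckets and the `fabric_read_bytes` function are assumptions for illustration; check the linked performance-model document for which request sizes your GPU's counters actually distinguish.

```python
def fabric_read_bytes(req_32b: int, req_64b: int, req_128b: int = 0) -> int:
    """Estimate bytes the L2 requested from the data fabric.

    Each argument is a hypothetical count of read requests of that size.
    The result measures L2 <-> fabric traffic, not bytes read from HBM/DRAM.
    """
    return 32 * req_32b + 64 * req_64b + 128 * req_128b


# 10 * 32 + 100 * 64 + 5 * 128 = 320 + 6400 + 640
print(fabric_read_bytes(req_32b=10, req_64b=100, req_128b=5))  # 7360
```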
Closing due to inactivity. Please re-open if you have outstanding questions.
Is there a metric that shows how many bytes in total were read from global memory? I know NVIDIA's Nsight Compute has a dram__bytes_read.sum metric, which reports the total bytes loaded from global memory. Is there an equivalent in omniperf? I found the Read BW metric, which reports bytes per wave, but that doesn't seem to report what I want. Should I be looking at a specific metric from rocprof instead?