ROCm / rocprofiler-compute

Advanced Profiling and Analytics for AMD Hardware
https://rocm.docs.amd.com/projects/omniperf/en/latest/
MIT License

Total number of bytes read from global memory metric #253

Closed NaderAlAwar closed 8 months ago

NaderAlAwar commented 8 months ago

Is there a metric that shows how many bytes in total were read from global memory? I know NVIDIA's Nsight Compute has a dram__bytes_read.sum metric, which reports the total bytes loaded from global memory. Is there an equivalent in omniperf? I found the Read BW metric, which reports bytes per wave, but that doesn't seem to report what I want. Should I be looking at a specific metric from rocprof instead?

feizheng10 commented 8 months ago

-n, --normal-unit    Specify the normalization unit (DEFAULT: per_wave):
                         per_wave
                         per_cycle
                         per_second
                         per_kernel

The "per_kernel" option might be what you want.
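To illustrate why the normalization unit matters, here is a minimal sketch of the arithmetic. It assumes that a per_wave value is the raw per-kernel total divided by the number of wavefronts launched, so multiplying back by the wave count recovers the total. The function name and the numbers are hypothetical and are not part of omniperf.

```python
def per_kernel_total(per_wave_value: float, num_waves: int) -> float:
    """Undo per-wave normalization (assumed: per_wave = total / num_waves)."""
    if num_waves <= 0:
        raise ValueError("num_waves must be positive")
    return per_wave_value * num_waves


# Hypothetical numbers: 256 bytes read per wave, 4096 waves in the kernel.
total_bytes = per_kernel_total(256.0, 4096)
print(total_bytes)  # 1048576.0
```

In practice, passing per_kernel as the normalization unit should make this conversion unnecessary, since the tool then reports the per-kernel value directly.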

NaderAlAwar commented 8 months ago

That definitely helps, thanks! Am I correct in looking at Read BW under L2 - Fabric Transactions if I want total bytes read in a kernel?

skyreflectedinmirrors commented 8 months ago

Hi @NaderAlAwar -- I've written up some more detail here: https://github.com/AMDResearch/omniperf/blob/2.x/src/docs/performance_model.md#l2-fabric-transactions. Currently it's only in markdown form, as the website still needs to be updated.

Essentially: this tells you how much data was read by the L2 from the data-fabric, but does not directly report bytes read from HBM/DRAM. This example goes into much more detail.
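As a rough sketch of what "data read by the L2 from the data-fabric" means in bytes: the example below assumes every L2-to-fabric read request is either 32 bytes or 64 bytes, and that the profiler exposes a total read-request count and a 32-byte read-request count. Those request sizes and both parameter names are assumptions for illustration; the linked performance model document has the actual definition. The result counts all fabric reads, not only those served by HBM/DRAM.

```python
def fabric_read_bytes(total_read_reqs: int, read_reqs_32b: int) -> int:
    """Bytes read by L2 over the fabric, assuming 32 B and 64 B requests only."""
    if not 0 <= read_reqs_32b <= total_read_reqs:
        raise ValueError("read_reqs_32b must be between 0 and total_read_reqs")
    reqs_64b = total_read_reqs - read_reqs_32b  # remaining requests assumed 64 B
    return read_reqs_32b * 32 + reqs_64b * 64


# Hypothetical counts: 1000 read requests, 200 of them 32 B wide.
print(fabric_read_bytes(1000, 200))  # 57600
```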

coleramos425 commented 8 months ago

Closing due to inactivity. Please re-open if you have outstanding questions.