ROCm / rocprofiler-compute

Advanced Profiling and Analytics for AMD Hardware
https://rocm.docs.amd.com/projects/omniperf/en/latest/
MIT License

Update L1 bandwidth metric calculations #36

Closed skyreflectedinmirrors closed 1 year ago

skyreflectedinmirrors commented 1 year ago

For the L1 bandwidth calcs in the SoL section, we should be using:

64 * tcp_total_cache_accesses_sum / Duration
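As a rough sketch of what this expression computes, assuming Duration is the kernel duration in nanoseconds and each L1 access moves one 64-byte cache line (so bytes per nanosecond reads directly as GB/s). The function name and the sample values are hypothetical, not part of the omniperf code base:

```python
def l1_bandwidth_gbps(tcp_total_cache_accesses_sum: int, duration_ns: float) -> float:
    """Raw L1 bandwidth in GB/s (bytes per nanosecond).

    Assumes 64 bytes per access and a duration given in nanoseconds.
    """
    if duration_ns <= 0:
        raise ValueError("duration_ns must be positive")
    return 64 * tcp_total_cache_accesses_sum / duration_ns


# Hypothetical counter values, for illustration only.
example_bw = l1_bandwidth_gbps(1_000_000, 64_000)
print(f"{example_bw:.1f} GB/s")
```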

This will need to be changed in both the yaml configs (for CLI / standalone), e.g.:

https://github.com/AMDResearch/omniperf/blob/main/src/omniperf_cli/configs/gfx90a/1600_L1_cache.yaml#L28

and the grafana dashboard (cacheBW_pct in the vL1 data section)

coleramos425 commented 1 year ago

Updated metrics in Grafana and config files.

64 * tcp_total_cache_accesses_sum / Duration yields a raw bandwidth in GB/s. I tweaked the equation slightly to report a Pct-of-Peak measure instead, i.e.:

((100 * AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 64) / (EndNs - BeginNs)))) / ((($sclk / 1000) * 64) * $numCU))
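A minimal Python sketch of the same calculation may make the units easier to follow. It assumes $sclk is the shader clock in MHz, $numCU is the number of compute units, and the peak is 64 bytes per cycle per CU; the function name, the tuple layout, and the sample numbers are hypothetical:

```python
from statistics import mean


def l1_bandwidth_pct_of_peak(dispatches, sclk_mhz: float, num_cu: int) -> float:
    """Percent-of-peak L1 bandwidth, averaged over kernel dispatches.

    dispatches: iterable of (tcp_total_cache_accesses_sum, begin_ns, end_ns).
    """
    # Achieved bandwidth per dispatch in GB/s (bytes per nanosecond).
    achieved_gbps = mean(
        (accesses * 64) / (end_ns - begin_ns)
        for accesses, begin_ns, end_ns in dispatches
    )
    # Peak in GB/s: clock in GHz * 64 bytes per cycle per CU * number of CUs.
    peak_gbps = (sclk_mhz / 1000) * 64 * num_cu
    return 100 * achieved_gbps / peak_gbps


# Hypothetical values, not taken from any real device.
sample = [(1_000_000, 0, 64_000), (2_000_000, 0, 64_000)]
pct = l1_bandwidth_pct_of_peak(sample, sclk_mhz=1000, num_cu=100)
print(f"{pct:.2f}% of peak")
```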

skyreflectedinmirrors commented 1 year ago

Any particular reason why we'd want it as a PoP? If you look at e.g., the L2<->EA bandwidths, they're reported as raw BWs:

AVG((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum) * 64)) / (EndNs - BeginNs)))
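For a single dispatch, that expression can be sketched in Python as follows, reading the formula as 32 bytes for each 32B read request and 64 bytes for every other read request. The function name and the sample values are hypothetical:

```python
def l2_ea_read_bandwidth_gbps(
    tcc_ea_rdreq_sum: int,
    tcc_ea_rdreq_32b_sum: int,
    begin_ns: int,
    end_ns: int,
) -> float:
    """Raw L2<->EA read bandwidth in GB/s (bytes per nanosecond)."""
    # 32B requests move 32 bytes; the remaining requests are counted as 64 bytes.
    bytes_read = (tcc_ea_rdreq_32b_sum * 32) + (
        (tcc_ea_rdreq_sum - tcc_ea_rdreq_32b_sum) * 64
    )
    return bytes_read / (end_ns - begin_ns)


# Hypothetical counter values, for illustration only.
ea_bw = l2_ea_read_bandwidth_gbps(1000, 400, begin_ns=0, end_ns=1000)
print(f"{ea_bw:.1f} GB/s")
```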

IMO, the PoP is more consistent in the top-level speed-of-light chart (which I just checked, actually used the new definition :P)

But I wouldn't worry about it too much at the moment; we're going to touch on consistency in our meeting later this week.

coleramos425 commented 1 year ago

We've decided to leave the Pct-of-Peak metric in our SoL section (since it's a more contextualized measurement) and add a raw BW metric to the L1D Cache Accesses section.

Closing issue.