Currently, on MI300, GRBM counters are summed over XCDs.
As a result, utilizations and other metrics that treat these counters as clock counts are calculated incorrectly, because the reported value is roughly (# XCDs) * (actual clocks).
We correct this issue by:
implementing a new function that maps the compute partition + arch to the number of XCDs, based on the whitepaper
defining GRBM_GUI_ACTIVE_PER_XCD = average of GRBM_GUI_ACTIVE over all XCDs, i.e., the summed value divided by the number of XCDs (and similar variables for other related counters)
For consistency, we use these per-XCD variables in all the yaml files regardless of the arch; any non-MI300 arch should have only 1 XCD, so the per-XCD and raw values are identical there.
When rocprof corrects this behavior, we can simply omit the division.
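
A rough sketch of the idea, for reviewers (this is not the code in this PR). The names `XCDS_PER_PARTITION`, `num_xcds`, and `per_xcd` are hypothetical, and the XCD counts in the table are illustrative assumptions that should be checked against the whitepaper:

```python
# Hypothetical lookup: (arch, compute partition mode) -> XCDs per partition.
# The counts below are assumptions for illustration only.
XCDS_PER_PARTITION = {
    ("mi300x", "SPX"): 8,
    ("mi300x", "DPX"): 4,
    ("mi300x", "QPX"): 2,
    ("mi300x", "CPX"): 1,
    ("mi300a", "SPX"): 6,
    ("mi300a", "TPX"): 2,
    ("mi300a", "CPX"): 1,
}


def num_xcds(arch: str, compute_partition: str) -> int:
    """Return the XCD count for an arch/partition pair.

    Anything not in the table (e.g. a non-MI300 arch) falls back to 1,
    so the per-XCD value equals the raw counter there.
    """
    key = (arch.lower(), compute_partition.upper())
    return XCDS_PER_PARTITION.get(key, 1)


def per_xcd(summed_counter: float, arch: str, compute_partition: str) -> float:
    """Average a counter that was summed over XCDs back to a per-XCD value."""
    return summed_counter / num_xcds(arch, compute_partition)


# A GRBM_GUI_ACTIVE value summed over XCDs, averaged back to one XCD.
grbm_gui_active_per_xcd = per_xcd(8000.0, "mi300x", "SPX")
```

With this fallback, a single-XCD arch divides by 1, which is why the same per-XCD variables can be used in every yaml file.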
Solves https://github.com/AMDResearch/omniperf/issues/248