ROCm / rocprofiler

ROC profiler library. Profiling with perf-counters and derived metrics.
https://rocm.docs.amd.com/projects/rocprofiler/en/latest/

Implement fine grain access to shader counters #46

Closed kikimych closed 10 months ago

kikimych commented 3 years ago

This PR implements fine-grained access to shader counters.

In the current implementation, shader counters are accumulated across all Shader Engines (SEs), while indexed access is implemented only for memory channels, like “TCC_HIT[]”. This PR implements the same for SQ counters, like “SQ_WAVES[]”. SQ counters are dumped by the hardware only per SE.

It also implements two-dimensional access to TCP counters (L1 counters). The TCP is part of the pipeline, and in the current profiler implementation TCP counters are accumulated across SEs and indexed within an SE. For example, with 64 CUs organized in 4 SEs, the TCP is indexed as 16 instances, which is 16 TCPs per SE. With this change a TCP counter is indexed like TCP_CYCLES[][SE index].

The default behavior is preserved: if the square brackets are missing, the counter is assumed to be accumulated over all available block instances.
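
The three access levels described above (no brackets, one index, two indices) can be illustrated with a small model. This is only a sketch and not rocprofiler code: the `samples` table, the `read_counter` helper, and the counter values are hypothetical, and the index order (TCP instance first, SE second) is an assumption based on the “TCP_CYCLES[][SE index]” form.

```python
# Hypothetical model, not rocprofiler code: 4 SEs with 16 TCP instances each,
# matching the 64 CU / 4 SE example in the description.
NUM_SE = 4
TCP_PER_SE = 16

# samples[se][tcp] holds a made-up counter value for one TCP instance.
samples = [[se * 100 + tcp for tcp in range(TCP_PER_SE)] for se in range(NUM_SE)]


def read_counter(samples, tcp=None, se=None):
    """Sum the values selected by the optional indices.

    A missing index means "accumulate over all instances" in that dimension.
    """
    se_range = range(len(samples)) if se is None else [se]
    total = 0
    for s in se_range:
        tcp_range = range(len(samples[s])) if tcp is None else [tcp]
        total += sum(samples[s][t] for t in tcp_range)
    return total


# No brackets: accumulated over all available block instances (default).
total_all = read_counter(samples)

# One index: TCP instance 3, accumulated over all SEs (existing behavior).
per_tcp = read_counter(samples, tcp=3)

# Two indices: TCP instance 3 inside SE 2 (the two-dimensional access).
single = read_counter(samples, tcp=3, se=2)
```

In this model, summing the one-index reads over all 16 TCP instances gives the same value as the unindexed read, which is the sense in which the default behavior is kept.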