ROCm / rocprofiler

ROC profiler library. Profiling with perf-counters and derived metrics.
https://rocm.docs.amd.com/projects/rocprofiler/en/latest/

A consistent terminology for describing the basic performance counters (e.g. TCC_HIT) in the ROCm documentation #116

Open vladaindjic opened 1 year ago


Hi all,

While using rocprofv2 to collect performance counters on MI200, I noticed some inconsistencies in the official documentation. According to the documentation, the MI200 architecture supports tcc_hit (lowercase), but that name does not appear in the rocprofv2 --list-counters output. Instead, that command lists two counters: TCC_HIT (uppercase) and TCC_HIT_sum. The ${ROCM_PATH}/lib/rocprofiler/metrics.xml file also uses the uppercase spelling. I think consistent naming would help new users who plan to use the counter collection feature of rocprof and/or the rocprof API. At the very least, the documentation should mention that the correct list of names is the one in metrics.xml.
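To illustrate the mismatch, here is a minimal Python sketch that maps a lowercase name from the documentation onto the spellings a tool actually reports. The helper `find_counter_variants` and the `available` list are hypothetical; in practice the list would be copied by hand from the rocprofv2 --list-counters output or from metrics.xml, whose exact format is not assumed here.

```python
import re


def find_counter_variants(doc_name, available_names):
    """Return the available names that match doc_name ignoring case.

    A name also matches if it carries one extra suffix (e.g. "_sum")
    or an instance index in brackets (e.g. "[23]").
    """
    pattern = re.compile(
        r"^" + re.escape(doc_name) + r"(_[A-Za-z0-9]+|\[\d+\])?$",
        re.IGNORECASE,
    )
    return [name for name in available_names if pattern.match(name)]


# Hypothetical sample of names, as they might be copied from a counter listing.
available = ["TCC_HIT", "TCC_HIT_sum", "TCC_MISS", "TCC_MISS_sum", "GRBM_COUNT"]

print(find_counter_variants("tcc_hit", available))
# prints ['TCC_HIT', 'TCC_HIT_sum']
```

A lookup like this is only a workaround; it does not replace documentation that uses the same spelling as the tool.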

Furthermore, the usage of the TCC_HIT basic counter is unclear. When I try to use plain TCC_HIT, rocprofv2 reports that the counter is unsupported. After some time analysing the rocprofv2 --list-counters output and trying different options, I found that this counter requires a TCC instance number in square brackets immediately after the counter name (e.g. TCC_HIT[23]). I may have missed something in the documentation, but I think it would be good to provide a few examples of how to use this type of basic counter (e.g. TCC_HIT, TCC_MISS, etc.).
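As an example of the kind of snippet the documentation could include, the following Python sketch builds a counter line with explicit instance indices. It assumes the counters are passed to the profiler through a text input file containing a `pmc:` line; the helper `pmc_line` is hypothetical, and the number of TCC instances and the number of counters that can be collected in one pass depend on the hardware and are not verified here.

```python
def pmc_line(counters, instances):
    """Build a 'pmc:' line in which every counter is indexed by instance."""
    names = [f"{counter}[{index}]" for counter in counters for index in instances]
    return "pmc: " + " ".join(names)


# Two counters, TCC instances 0 and 1 (the instance range is an assumption).
line = pmc_line(["TCC_HIT", "TCC_MISS"], range(2))
print(line)
# prints pmc: TCC_HIT[0] TCC_HIT[1] TCC_MISS[0] TCC_MISS[1]
```

The resulting line could be saved to a file and, assuming the usual input-file option, passed to the profiler as `rocprofv2 -i input.txt <app>`.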

Best regards, Vladimir