ROCm / rocprofiler

ROC profiler library. Profiling with perf-counters and derived metrics.
https://rocm.docs.amd.com/projects/rocprofiler/en/latest/
MIT License

Explanation of the expression for metric L2CacheHit #48

Closed hgtsoi closed 1 month ago

hgtsoi commented 3 years ago

In lib/metrics.xml, the derived metric L2CacheHit has the following definition:

# L2CacheHit      The percentage of fetch, write, atomic, and other instructions that hit the data in L2 cache. Value range: 0% (no hit) to 100% (optimal).
  <metric
    name="L2CacheHit"
    descr="The percentage of fetch, write, atomic, and other instructions that hit the data in L2 cache. Value range: 0% (no hit) to 100% (optimal)."
    expr=100*sum(TCC_HIT,16)/(sum(TCC_HIT,16)+sum(TCC_MISS,16))
  ></metric>

Could anyone explain the meaning of the magic number "16" in the expr attribute above? Does the profiler collect this metric's data for only a single SE? If so, can we specify which SE_NUM is collected?

kikimych commented 2 years ago

The L2 cache is not related to any compute engine. It's located on the memory controller. The expression sum(TCC_HIT,16) is equal to TCC_HIT[0] + TCC_HIT[1] + TCC_HIT[2] + ... + TCC_HIT[15].

harkgill-amd commented 1 month ago

Apologies for the lack of response. As @kikimych mentioned, sum(TCC_HIT,16) is the number of cache hits summed over all 16 TCC instances. L2CacheHit simply divides the total number of hits by the sum of hits and misses, then multiplies by 100 to express the hit rate as a percentage.
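For anyone who wants to check the arithmetic by hand, here is a minimal Python sketch of what the expression computes. It is not rocprofiler code: the function name l2_cache_hit, the per-instance counter values, and the choice to return 0 when there are no accesses are all assumptions made for illustration.

```python
def l2_cache_hit(tcc_hit, tcc_miss):
    """Mirror of 100*sum(TCC_HIT,16)/(sum(TCC_HIT,16)+sum(TCC_MISS,16)).

    tcc_hit and tcc_miss each hold one counter value per TCC instance.
    """
    hits = sum(tcc_hit)      # TCC_HIT[0] + TCC_HIT[1] + ... + TCC_HIT[15]
    misses = sum(tcc_miss)   # TCC_MISS[0] + TCC_MISS[1] + ... + TCC_MISS[15]
    total = hits + misses
    if total == 0:
        # Assumption for this sketch only: report 0% when nothing was accessed.
        return 0.0
    return 100.0 * hits / total


# Made-up counter values for 16 TCC instances (not real profiler output).
tcc_hit = [75] * 16
tcc_miss = [25] * 16

print(l2_cache_hit(tcc_hit, tcc_miss))  # 75.0
```

The point of the sketch is that the "16" is only the number of per-instance counters being added together. It does not select a shader engine.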

I will close out this issue as there hasn't been any response to the previous answer. If you feel that this explanation did not fully address your question, feel free to comment and I will re-open the issue. Thanks!