ROCm / rocprofiler

ROC profiler library. Profiling with perf-counters and derived metrics.
https://rocm.docs.amd.com/projects/rocprofiler/en/latest/

TCC_HIT_sum for MI210 is only half of MI100 when L2CacheHit is ~100% #95

Open lingjiew93 opened 1 year ago

lingjiew93 commented 1 year ago

Hi,

I found that TCC_READ_sum is only about half of the real data size obtained from the MI100 result. The same is true of TCC_HIT_sum. Both GPUs have a 64-byte cache line size, so these numbers don't make sense. Could someone help check this?

lingjiew93 commented 1 year ago

To clarify: there is no TCC_READ_sum counter on MI100. The problem occurs when L2CacheHit is ~100%: in that case, TCC_HIT_sum on MI210 is half the value reported on MI100.

te42kyfo commented 1 year ago

I observed the same. On MI100, (TCC_HIT_sum + TCC_MISS_sum) * 32 matched the expected L2 cache data volume. On MI210, the same expression yields exactly half of what is expected.
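A minimal Python sketch of the arithmetic in the comment above. It assumes, as stated there, that each TCC hit or miss event corresponds to 32 bytes; this factor comes from the thread, not from AMD documentation. The function names and counter values are hypothetical, and the values are made up only to reproduce the reported ratios (about 1.0 on MI100, exactly 0.5 on MI210).

```python
# Sketch: estimate L2 (TCC) data volume from rocprof counter totals.
# Assumption: one TCC hit/miss event == 32 bytes (factor used in this thread).
BYTES_PER_TCC_REQUEST = 32


def l2_volume_bytes(tcc_hit_sum: int, tcc_miss_sum: int,
                    bytes_per_request: int = BYTES_PER_TCC_REQUEST) -> int:
    """Estimated bytes through L2: (TCC_HIT_sum + TCC_MISS_sum) * bytes_per_request."""
    return (tcc_hit_sum + tcc_miss_sum) * bytes_per_request


def ratio_to_expected(measured_bytes: int, expected_bytes: int) -> float:
    """Measured / expected volume; ~1.0 was reported on MI100, ~0.5 on MI210."""
    if expected_bytes <= 0:
        raise ValueError("expected_bytes must be positive")
    return measured_bytes / expected_bytes


if __name__ == "__main__":
    # Hypothetical kernel expected to move 1 MiB through L2.
    expected = 1 << 20
    # Made-up counter totals chosen to mimic the behaviour described above.
    mi100_like = l2_volume_bytes(tcc_hit_sum=32768, tcc_miss_sum=0)
    mi210_like = l2_volume_bytes(tcc_hit_sum=16384, tcc_miss_sum=0)
    print(ratio_to_expected(mi100_like, expected))  # 1.0
    print(ratio_to_expected(mi210_like, expected))  # 0.5
```

Keeping `bytes_per_request` as a parameter makes it easy to try alternative per-request sizes when comparing architectures; this sketch does not establish which size is correct for gfx90a.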

There are additional counters on gfx90a. For example, TCP_TCC_READ_REQ_sum * 32 corresponds to the expected load data volume between the L1 and L2 caches.
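A hedged Python sketch of how the gfx90a counter mentioned above could be used as a cross-check. It compares TCP_TCC_READ_REQ_sum * 32 (the factor from the comment) against the nominal load footprint of a simple streaming kernel that reads each element once with no L1 reuse. The kernel size and counter value are hypothetical, and the helper names are illustrative, not part of rocprofiler.

```python
# Sketch: compare L1->L2 load traffic derived from TCP_TCC_READ_REQ_sum (gfx90a)
# with a kernel's nominal load footprint.
# Assumption: one TCP->TCC read request == 32 bytes (factor from this thread).
BYTES_PER_TCP_TCC_READ = 32


def l1_to_l2_load_bytes(tcp_tcc_read_req_sum: int,
                        bytes_per_request: int = BYTES_PER_TCP_TCC_READ) -> int:
    """Estimated load bytes requested from L2 by the L1 (TCP) caches."""
    return tcp_tcc_read_req_sum * bytes_per_request


def nominal_load_bytes(num_elements: int, element_size: int) -> int:
    """Bytes a streaming kernel loads if every element is read exactly once."""
    return num_elements * element_size


def relative_error(measured: int, expected: int) -> float:
    """|measured - expected| / expected."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return abs(measured - expected) / expected


if __name__ == "__main__":
    # Hypothetical kernel streaming 2**24 float32 values (64 MiB).
    expected = nominal_load_bytes(num_elements=2**24, element_size=4)
    # Made-up counter value for illustration.
    measured = l1_to_l2_load_bytes(tcp_tcc_read_req_sum=2**21)
    print(measured, expected, relative_error(measured, expected))
    # prints: 67108864 67108864 0.0
```

A relative error near 0.5 from the TCC-side estimate alongside a near-zero error from this counter would be consistent with the halving reported in this issue, but the sketch alone does not explain its cause.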