RRZE-HPC / likwid

Performance monitoring and benchmarking suite
https://hpc.fau.de/research/tools/likwid/
GNU General Public License v3.0

[FeatureRequest] Why does no architecture have L1 or L1CACHE performance groups? #538

Closed. ibogosavljevic closed this issue 10 months ago.

ibogosavljevic commented 1 year ago

Is your feature request related to a problem? Please describe.
I am measuring performance and want to measure L1 cache miss rates and the L1 data volume, but these metrics are unavailable on every architecture. Why?

TomTheBear commented 1 year ago

I can totally understand that Core <-> L1 traffic is of interest, but there are simply no events from which to derive it. While most caches operate on whole cache lines, the L1 cache is commonly byte-addressable. Although most architectures provide events for load/store micro-ops (sometimes counting speculative ones), the width of each access is unknown, so the micro-op count cannot be converted into a data volume.
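To illustrate why an unknown access width rules out a volume metric, here is a minimal sketch. It assumes that on x86 a single load/store micro-op moves anywhere from 1 byte (a scalar byte access) up to 64 bytes (an AVX-512 vector access); the function name is hypothetical and not part of LIKWID.

```python
# Illustrative only: a load/store micro-op count gives a range, not a volume.
# Assumption: one x86 load/store micro-op moves between 1 and 64 bytes.
MIN_ACCESS_BYTES = 1   # e.g. a single-byte scalar load
MAX_ACCESS_BYTES = 64  # e.g. a full AVX-512 vector load

def l1_volume_bounds(load_store_uops: int) -> tuple[int, int]:
    """Return (lower, upper) bounds in bytes for Core <-> L1 traffic."""
    return (load_store_uops * MIN_ACCESS_BYTES,
            load_store_uops * MAX_ACCESS_BYTES)

if __name__ == "__main__":
    lo, hi = l1_volume_bounds(1_000_000_000)
    print(f"{lo / 1e9:.0f} GB to {hi / 1e9:.0f} GB")  # 1 GB to 64 GB
```

The same micro-op count is compatible with volumes that differ by a factor of 64, and speculative counting would widen the uncertainty further.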

You can derive the L1 load miss rate on many Intel platforms with the MEM_LOAD_RETIRED_L1_HIT, MEM_LOAD_RETIRED_L1_MISS and MEM_LOAD_RETIRED_L1_ALL events (Remark: these MEM_LOAD_RETIRED_* events are mentioned in the errata documents of some micro-architectures).
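A minimal sketch of how these counts could be combined, assuming you run a custom event set with likwid-perfctr (syntax `EVENT:COUNTER`, comma-separated) and read the raw counts afterwards. The helper names are hypothetical, and the event names are taken from the comment above; their exact spelling and availability depend on the micro-architecture.

```python
# Hypothetical helpers; event names and counter assignment are assumptions
# and must match what your CPU actually exposes.
def likwid_custom_eventset(events: list[str]) -> str:
    """Build a custom event string, assigning events to PMC0, PMC1, ..."""
    return ",".join(f"{ev}:PMC{i}" for i, ev in enumerate(events))

def l1_load_miss_ratio(l1_hit: int, l1_miss: int) -> float:
    """Fraction of retired loads that missed in L1 (0.0 if no loads)."""
    total = l1_hit + l1_miss
    return l1_miss / total if total else 0.0

if __name__ == "__main__":
    print(likwid_custom_eventset(
        ["MEM_LOAD_RETIRED_L1_HIT", "MEM_LOAD_RETIRED_L1_MISS"]))
    print(l1_load_miss_ratio(900, 100))  # 0.1
```

The resulting string could be passed as `likwid-perfctr -C 0 -g <string> ./app`; `likwid-perfctr -e` lists the events and counters available on your machine. Where MEM_LOAD_RETIRED_L1_ALL exists, it could serve as the denominator instead of the sum of hits and misses.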

Moreover, the L2 group essentially uses an L1 event that counts the cache lines loaded from L2 into L1. Indirectly, this also reflects the load and RFO misses in L1, since such misses cause lines to be fetched into L1.
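Because this event counts whole cache lines, turning it into a data volume is a simple multiplication. A minimal sketch, assuming 64-byte cache lines (the line size on current Intel and AMD x86 CPUs) and a line count read from that L1 event; the function names are hypothetical.

```python
# Assumption: 64-byte cache lines; lines_loaded is the count of lines
# brought from L2 into L1, as reported by the event the L2 group uses.
CACHE_LINE_BYTES = 64

def l2_to_l1_volume_bytes(lines_loaded: int) -> int:
    """Data volume moved from L2 into L1, in bytes."""
    return lines_loaded * CACHE_LINE_BYTES

def l2_to_l1_bandwidth_gbs(lines_loaded: int, runtime_s: float) -> float:
    """Average L2 -> L1 bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return l2_to_l1_volume_bytes(lines_loaded) / runtime_s / 1e9

if __name__ == "__main__":
    print(l2_to_l1_volume_bytes(1_000_000))  # 64000000
    print(l2_to_l1_bandwidth_gbs(1_000_000, 0.01))
```

This works for L2 <-> L1 because every transfer is a full line; the same trick fails for Core <-> L1, where accesses have variable width.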

ibogosavljevic commented 1 year ago

So, essentially, there are some counters, but they are not reliable.

TomTheBear commented 1 year ago

Well, yes and no. Whether they are reliable depends on the micro-architecture. You have to check the errata documents (published as "Specification Updates" for Intel processors).

And while there are events, they cannot be used to derive the Core <-> L1 data volume.

TomTheBear commented 10 months ago

Any more questions? If not, please close the issue.

ibogosavljevic commented 10 months ago

Thanks