Closed by ibogosavljevic 10 months ago
I can totally understand that Core <-> L1 traffic is of interest, but there are simply no events to derive it from. While most caches work on a cache line basis, the L1 cache is commonly byte-addressable. Although there are load/store micro-op events for most architectures (sometimes speculative), the width of the individual accesses is unknown.
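To illustrate why a micro-op count alone is not enough, here is a minimal sketch (not LIKWID code) that only brackets the possible Core <-> L1 volume. The function name and the assumed access widths of 1 to 64 bytes are illustrative assumptions, not something the hardware reports.

```python
from typing import Tuple


def core_l1_volume_bounds(mem_uops: int,
                          min_width_bytes: int = 1,
                          max_width_bytes: int = 64) -> Tuple[int, int]:
    """Return (lower, upper) bounds in bytes for Core <-> L1 traffic.

    mem_uops is a count of load/store micro-ops. The width of each access
    is unknown, so only a range can be given. The default widths
    (1 byte up to a 64-byte vector access) are assumptions.
    """
    if mem_uops < 0:
        raise ValueError("mem_uops must be non-negative")
    if not 0 < min_width_bytes <= max_width_bytes:
        raise ValueError("widths must satisfy 0 < min <= max")
    return mem_uops * min_width_bytes, mem_uops * max_width_bytes


# One million memory micro-ops could have moved anything in this range.
low, high = core_l1_volume_bounds(1_000_000)
print(f"between {low} and {high} bytes")
```

With these assumed widths the upper bound is 64 times the lower bound, which is why such a number is not useful as a data volume metric.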
You can derive the L1 load cache miss rates on many Intel platforms with the MEM_LOAD_RETIRED_L1_HIT, MEM_LOAD_RETIRED_L1_MISS and MEM_LOAD_RETIRED_L1_ALL events. (Remark: these MEM_LOAD_RETIRED_* events are mentioned in the errata documents of some micro-architectures.)
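As a rough sketch of the arithmetic (not LIKWID code), a load miss ratio could be computed from raw counts of the hit and miss events like this. The function name is hypothetical, and it assumes the two counts were already measured and that hits plus misses cover all retired loads of interest.

```python
def l1_load_miss_ratio(l1_hits: int, l1_misses: int) -> float:
    """Fraction of retired loads that missed L1.

    l1_hits and l1_misses are assumed to be raw counts of the
    MEM_LOAD_RETIRED_L1_HIT and MEM_LOAD_RETIRED_L1_MISS events.
    Returns 0.0 when no loads were counted.
    """
    if l1_hits < 0 or l1_misses < 0:
        raise ValueError("event counts must be non-negative")
    total = l1_hits + l1_misses
    if total == 0:
        return 0.0
    return l1_misses / total


# Example with made-up counts: 900 hits and 100 misses.
print(l1_load_miss_ratio(900, 100))
```

Whether the result can be trusted still depends on the errata for the micro-architecture in question.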
Moreover, the L2 group basically uses an L1 event to measure the cache lines loaded from L2 into L1. To some extent, it also reflects the load/RFO misses in L1.
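Because that traffic is counted in whole cache lines, turning the count into a data volume is a single multiplication. A minimal sketch, assuming a 64-byte cache line (check your architecture) and hypothetical function names:

```python
CACHE_LINE_BYTES = 64  # assumption: common line size, not universal


def l2_to_l1_load_volume_bytes(cache_lines_loaded: int,
                               line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Bytes moved from L2 into L1, given a count of loaded cache lines."""
    if cache_lines_loaded < 0:
        raise ValueError("cache_lines_loaded must be non-negative")
    return cache_lines_loaded * line_bytes


def bandwidth_mbytes_per_s(volume_bytes: int, runtime_s: float) -> float:
    """Average bandwidth in MByte/s over the measured runtime."""
    if runtime_s <= 0:
        raise ValueError("runtime_s must be positive")
    return volume_bytes / runtime_s / 1.0e6


# Example with made-up numbers: 1,000,000 lines loaded in 0.5 s.
volume = l2_to_l1_load_volume_bytes(1_000_000)
print(volume, bandwidth_mbytes_per_s(volume, 0.5))
```

This works for L2 <-> L1 precisely because the transfer unit is fixed, which is not the case for Core <-> L1 accesses.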
So, essentially, there are some counters, but they are not reliable.
Well, yes and no. Depending on the micro-architecture, they are reliable or not. You have to check the errata documents (in the "Specification updates" for Intel).
And there are events but they are not usable to derive the Core <-> L1 data volume.
Any more questions? If not, please close the issue.
Thanks
Is your feature request related to a problem? Please describe. I am measuring performance and I want to measure L1 cache miss rates and L1 data volume, but these are unavailable on all architectures. Why?