RRZE-HPC / likwid

Performance monitoring and benchmarking suite
https://hpc.fau.de/research/tools/likwid/
GNU General Public License v3.0

How to add a new performance counter like L3CacheOccupancy based on Intel Xeon 4 CPU? #628

Closed: chaos-diabolicy closed this issue 1 month ago

chaos-diabolicy commented 1 month ago

Why do you need support for this specific architecture?
I am using LIKWID as a performance monitor on a platform that consists of Xeon 4 CPUs; it is a really convenient and precise tool. Building on LIKWID, I am trying to monitor additional performance counters such as L3CacheOccupancy.

Which architecture model, family and further information? CPU or accelerator?
Architecture: x86_64, CPU family: 6, model: 143, model name: Intel Xeon Gold 6433N

Is the documentation of the hardware counters publicly available?
I believe the Intel CPU manual is publicly available, and I have read part of it, but I could not find out from it how to read the L3CacheOccupancy counter. For example, in the function access_x86_rdpmc_read, I can read the MSR_PMC1 counter with the "rdpmc" instruction, using (MSR_PMC1 - MSR_PMC0) as the register address. However, after loading LIKWID's enable_rdpmc.ko with insmod, I got a segfault when I tried "rdpmc" with register address 0xC8E to read L3CacheOccupancy. How do you know that "rdpmc" must be given (MSR_PMC1 - MSR_PMC0) to read the MSR_PMC1 counter, and how can I find the register address that "rdpmc" needs in order to read L3CacheOccupancy? Thanks a lot, and I look forward to your reply!

Are there already any usable tools (commercial or open-source)?
PCM, maintained by Intel, is another tool, but it cannot report accurate counter values.

TomTheBear commented 1 month ago

The Intel Xeon Gold 6433N is a SapphireRapids chip where the L3 is part of the Uncore. At least I cannot find a related in-core event. Uncore counters cannot be accessed with rdpmc, only in-core counters.
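To make the rdpmc point concrete, here is a minimal sketch of how the operand of rdpmc relates to the MSR address of an in-core counter. rdpmc takes a counter index, not an MSR address, which is why (MSR_PMC1 - MSR_PMC0) works and 0xC8E does not: an invalid index raises a general-protection fault, which user space sees as a segfault. The addresses 0xC1 and 0xC2 are the architectural IA32_PMC0 and IA32_PMC1 addresses from the Intel SDM. The helper name rdpmc_index_for_msr and the counter count of 8 are assumptions made for illustration only; they are not taken from the LIKWID sources.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural addresses of the first two general-purpose counters
 * (IA32_PMC0 / IA32_PMC1 in the Intel SDM). */
#define MSR_PMC0 0xC1u
#define MSR_PMC1 0xC2u

/* ASSUMPTION: number of general-purpose counters per hardware thread.
 * Real code should query CPUID leaf 0xA instead of hard-coding this. */
#define NUM_GP_COUNTERS 8u

/* The rdpmc index of PMC1 is its offset from PMC0, i.e. 1. */
static_assert(MSR_PMC1 - MSR_PMC0 == 1u, "PMC1 is counter index 1");

/* Hypothetical helper: translate the MSR address of a general-purpose
 * in-core counter into the index that rdpmc expects in ECX.
 * Returns -1 for any MSR outside the PMC range (for example an Uncore
 * register), because such a register has no rdpmc index at all. */
static int rdpmc_index_for_msr(uint32_t msr)
{
    if (msr < MSR_PMC0 || msr >= MSR_PMC0 + NUM_GP_COUNTERS)
        return -1;
    return (int)(msr - MSR_PMC0);
}
```

With these definitions, rdpmc_index_for_msr(MSR_PMC1) yields 1, while rdpmc_index_for_msr(0xC8E) yields -1: that register has to be read through the MSR interface rather than with rdpmc.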

Since you mentioned (o)PCM (a great tool), I assume you got the name L3CacheOccupancy from there. The source is publicly available, so you could check which events it uses to derive L3CacheOccupancy on SapphireRapids. If PCM supports it, it is doable in LIKWID as well (if it is not already supported), since both work under the same restrictions.

Please search for the documentation you mentioned and paste the link with a reference to the relevant section.

For reference (on Intel):

To see all OCCUPANCY events and the corresponding counters that LIKWID supports on SPR, run: likwid-perfctr -E OCCUPANCY

chaos-diabolicy commented 1 month ago

I appreciate your reply! It seems that rdpmc indeed cannot access certain registers, as you mentioned. Thanks also for the suggestion to make use of other tools; I would like to try that.

TomTheBear commented 1 month ago

It really depends on which events PCM uses to get this L3CacheOccupancy. If, for example, the OFFCORE_RESPONSE events are used, they are counted in the PMC* registers and can consequently be read with rdpmc. For details, one has to check the PCM source code.

chaos-diabolicy commented 1 month ago

Thanks for your advice, it really helps!