RRZE-HPC / likwid

Performance monitoring and benchmarking suite
https://hpc.fau.de/research/tools/likwid/
GNU General Public License v3.0

How to measure the interaction between DDIO and LLC slice #550

Closed: cxxuser closed this issue 8 months ago

cxxuser commented 1 year ago

In its capacity as a caching agent, the CHA manages the interface between the core, the IIO devices and the last level cache (LLC).

Intel DDIO injects network data directly into the LLC. The LLC is divided into multiple slices, and each LLC slice is paired with a CHA. As your documentation states, the CHA manages the interface between the IIO devices and the last level cache. Which events can count DDIO writes to the LLC? Can LLC_LOOKUP do this, or are there other counters or events that can?

TomTheBear commented 1 year ago

Since this question is very specific to an Intel technology, Intel is probably the better recipient for it.

I would look into the CHA/CBOX TOR_INSERTS* events (TOR = Table of Requests). They distinguish between IA requests (from the hardware threads) and IO requests (from IO devices). For more detailed counts, you probably have to match on the CHA opcode. Depending on the architecture, opcode matching might not work with LIKWID; hardly anybody uses it.

Neither the official event list nor the metric list for Intel Icelake SP (as an example) contains the substring DDIO.

cxxuser commented 1 year ago

Thanks for your reply. How should I contact Intel? I asked in the Intel community, but did not get a useful reply.

cdahnken commented 1 year ago

Intel here, we will take a look.

cxxuser commented 1 year ago

Thanks a lot. My processors are Skylake and Ice Lake processors. Taking the Skylake processor as an example, document 336274 (https://kib.kiev.ua/x86docs/Intel/PerfMon/336274-001.pdf) describes the uncore performance monitors. Section 2.2.4 covers Caching and Home Agent (CHA) performance monitoring, and Table 2-69 describes the LLC_LOOKUP event. However, based on that description, it seems that LLC_LOOKUP only counts the interaction between L2 and LLC, not the interaction between IIO and LLC. I don't fully understand Intel's manual, so I want to confirm whether my understanding is correct and find out which counter or event serves my purpose. I really appreciate your help!

hservatg commented 1 year ago

Hello.

The UNC_CHA_TOR_INSERTS.IO* events count requests from IO entering the CHA, broken down by opcode. HIT and MISS variants report whether the requested data hit or missed in the caches. For example, for Ice Lake Xeons:

ITOM is a full-cacheline write issued by IO entering the CHA, ITOMCACHENEAR is a partial-cacheline write, and PCIRDCUR is a read.
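Since the HIT and MISS variants are separate events counted per CHA, turning them into one number means summing each over all CHA boxes. A minimal Python sketch, assuming you already have one HIT and one MISS reading per CHA box for the same opcode (e.g. ITOM); the function name is illustrative and not part of LIKWID:

```python
from typing import Sequence


def io_hit_fraction(hits_per_cha: Sequence[int], misses_per_cha: Sequence[int]) -> float:
    """Fraction of IO requests of one opcode that hit in the caches.

    Each sequence holds one counter value per CHA box: the HIT event and the
    matching MISS event for the same opcode (e.g. ITOM). Hypothetical helper.
    """
    if len(hits_per_cha) != len(misses_per_cha):
        raise ValueError("need one HIT and one MISS value per CHA box")
    hits = sum(hits_per_cha)
    total = hits + sum(misses_per_cha)
    # No IO requests of this opcode at all: report 0.0 instead of dividing by zero.
    return hits / total if total else 0.0
```

Summing across boxes before dividing weights each CHA by its traffic, which is what you want when requests are spread unevenly across LLC slices.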

You can find further information in the 3rd Gen Intel® Xeon® Processor Scalable Family, Codename Ice Lake, Uncore Performance Monitoring - Reference Manual and in the PerfMon Events reference.

Hope that helps.

TomTheBear commented 1 year ago

As a remark: LIKWID uses different but related event names, and I had not added many TOR_INSERTS.IO events for ICX:

UNC_CHA_TOR_INSERTS.IO_HIT_ITOM is defined as EventSel=35H UMask=04H UMaskExt=CC43FDH Counter=0,1,2,3. To use it with LIKWID, you have to either use GENERIC_EVENT or add the event to the input event list before compilation: GENERIC_EVENT:CBOXyCz:CONFIG=0x35:UMASK=0x4:MATCH0=0xCC43FD, where z is one of the counters 0, 1, 2, 3 and y is the ID of the CHA box you want to measure. To measure all CHAs, you have to add this event once for each CHA box.
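Converting Intel's notation into the GENERIC_EVENT string is mechanical: Intel's tables write hex values with a trailing H, while the LIKWID options in this thread take 0x-prefixed values. A small Python sketch, assuming the option layout quoted above (CONFIG, UMASK, and MATCH0 carrying the UmaskExt) is what your LIKWID build expects; both helper names are hypothetical:

```python
def intel_hex(value: str) -> int:
    """Parse Intel's hex notation such as '35H' or 'CC43FDH' (trailing H = hex)."""
    value = value.strip().upper()
    if value.endswith("H"):
        value = value[:-1]
    return int(value, 16)


def generic_event(box: int, counter: int, event_sel: str, umask: str, umask_ext: str) -> str:
    """Build GENERIC_EVENT:CBOXyCz:CONFIG=..:UMASK=..:MATCH0=.. for one CHA counter.

    Assumption: MATCH0 carries the UmaskExt value, as in the example above.
    """
    if counter not in (0, 1, 2, 3):
        raise ValueError("this event is documented for counters 0..3 only")
    return (
        f"GENERIC_EVENT:CBOX{box}C{counter}"
        f":CONFIG=0x{intel_hex(event_sel):X}"
        f":UMASK=0x{intel_hex(umask):X}"
        f":MATCH0=0x{intel_hex(umask_ext):X}"
    )
```

For box 0 and counter 0, generic_event(0, 0, "35H", "04H", "CC43FDH") yields the string given in the comment above, with the trailing H removed.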

The event name for LIKWID would be TOR_INSERTS_IO_HIT_ITOM and so forth, because the CHA is already encoded in the counter names CBOXyCz.
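Because each CHA box offers four counters (C0 to C3) for these events, up to four named events can share one box. A minimal Python sketch that places events on consecutive counters of one box and joins them into a comma-separated LIKWID event set; TOR_INSERTS_IO_MISS_ITOM in the usage note is an assumed name following the "and so forth" pattern, so check `likwid-perfctr -e` for the names your build actually provides:

```python
from typing import Sequence


def box_eventset(box: int, events: Sequence[str], num_counters: int = 4) -> str:
    """Place up to num_counters events on counters C0, C1, ... of one CHA box.

    Hypothetical helper; returns e.g.
    'TOR_INSERTS_IO_HIT_ITOM:CBOX3C0,TOR_INSERTS_IO_MISS_ITOM:CBOX3C1'.
    """
    if len(events) > num_counters:
        raise ValueError(f"CBOX{box} has only {num_counters} counters")
    return ",".join(f"{name}:CBOX{box}C{i}" for i, name in enumerate(events))
```

For example, box_eventset(3, ["TOR_INSERTS_IO_HIT_ITOM", "TOR_INSERTS_IO_MISS_ITOM"]) pairs a HIT and MISS event on CHA 3, which is the input the hit-fraction calculation needs.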

TomTheBear commented 1 year ago

I added all TOR_INSERTS.IO* events I could find to the master branch.

cxxuser commented 1 year ago

My testbed uses a Skylake processor (https://kib.kiev.ua/x86docs/Intel/PerfMon/336274-001.pdf). I started a client that sends packets to the server, and I used the same method to monitor the LLC_LOOKUP event (event code 0x34) and TOR_INSERTS.IO (event code 0x35) on the server.

However, the TOR_INSERTS.IO counters always read 0. Why is this happening? Is measuring TOR_INSERTS events different from measuring other events (e.g., LLC_LOOKUP)?

TomTheBear commented 1 year ago

TOR_INSERTS and LLC_LOOKUP are the complicated events for the Intel CHAs because both require setting additional event filters. I was assuming Intel Icelake SP; I never checked the Skylake SP code or event lists.

cxxuser commented 1 year ago

Is it necessary to set additional event filters (the CHA filter registers, not the UmaskExt filter) to monitor TOR_INSERTS on Intel Icelake? I did not set additional event filters for either LLC_LOOKUP or TOR_INSERTS, yet only TOR_INSERTS always reads 0. My understanding is that an event filter only narrows down what an event counts. Therefore, not setting a filter should make the measured value larger, not 0, right?

TomTheBear commented 1 year ago

The additional event filter is "UmaskExt" (Umask Extended). Normal events just have an event ID and a umask.

LIKWID sets the filters based on its configuration. If nothing is specified for LLC_LOOKUP, it sets a default. This is not the case for TOR_INSERTS, because there are events with and without UmaskExt. Although they are called filters, they are really extended configuration. For TOR_INSERTS (0x35), there are basically two main umasks (0x1 for IA and 0x4 for IO). Which IA or IO event exactly is selected is specified through the UmaskExt.
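Why "no filter" can mean "count nothing" rather than "count everything" is easiest to see with enable bits. The Python sketch below is a toy model only, assuming the extended configuration behaves like a set of switches that must be turned on for a request class to be counted; the bit names and positions are invented for illustration and are not Intel's actual layout:

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical request-class bits, for illustration only; the real bit
# assignments are in Intel's uncore manuals and differ between architectures.
HIT = 1 << 0
MISS = 1 << 1
ITOM = 1 << 2
PCIRDCUR = 1 << 3


@dataclass(frozen=True)
class Request:
    hit: bool
    opcode: int  # one of the opcode bits above


def count_matching(requests: Iterable[Request], select: int) -> int:
    """Count requests whose hit/miss class AND opcode are both enabled in select.

    The bits act as enables, not exclusions: a request is counted only if its
    classes are switched on, so select == 0 counts nothing at all.
    """
    count = 0
    for r in requests:
        state = HIT if r.hit else MISS
        if select & state and select & r.opcode:
            count += 1
    return count
```

In this model an all-zero selection yields 0 for any traffic, which matches the symptom reported above, whereas an event with a sensible default (as LIKWID sets for LLC_LOOKUP) produces nonzero counts.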

Again: you are trying to use a feature that nobody (except maybe me) has ever used with LIKWID or requested. It likely does not work out of the box, and a patch might be required. I checked it for Icelake SP but not for Skylake SP. If you want to understand it yourself, look at the code (skx_cbox_setup). There is special handling for LLC_LOOKUP, but I cannot see any for TOR_INSERTS.

TomTheBear commented 10 months ago

The TOR_INSERTS_IO events on SKX should work like this: TOR_INSERTS_IO:CBOX<box_id>C<counter_id>:OPCODE=<opcode>

For more information, see the SKX uncore monitoring guide. The opcodes are listed in Section 3.1.1.

cxxuser commented 9 months ago

I don't fully understand how to measure the TOR_INSERTS_IO events on SKX. Do you mean that I should directly run likwid-perfctr TOR_INSERTS_IO:CBOX<box_id>C<counter_id>:OPCODE=<opcode>? Is CBOX equivalent to CHA? In addition, if I want to measure IO events, do I not need to add the umask parameter?

TomTheBear commented 9 months ago

Do you mean that I should directly run likwid-perfctr TOR_INSERTS_IO:CBOX<box_id>C<counter_id>:OPCODE=<opcode>?

Yes, that's the common way LIKWID works. You can also create a performance group for all (max.) 28 CBOXes.

Is CBOX equivalent to CHA?

Yes, see https://github.com/RRZE-HPC/likwid/wiki/SkylakeSP#last-level-cache-counters

In addition, if I want to measure IO events, do I not need to add the umask parameter?

No, the config and umask parts are defined by TOR_INSERTS_IO.

cxxuser commented 9 months ago

I have tried likwid-perfctr -g TOR_INSERTS_IO:CBOX 0 C 0,1,2,3:OPCODE=0x10000033, but the command fails:

CPU name:       Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
CPU type:       Intel Skylake SP processor
CPU clock:      2.09 GHz
WARN: Counter CBOX not defined for current architecture
WARN: Counter FIXC0 already used in event set, skipping
ERROR: No event in given event string can be configured.
       Either the events or counters do not exist for the
       current architecture. If event options are set, they might
       be invalid.

I think the opcode value is right. Could you show me the correct command? I would be very grateful.
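The warning "Counter CBOX not defined" suggests that LIKWID received just CBOX as the counter name: the spaces in "CBOX 0 C 0,1,2,3" split the argument in the shell, and a counter name appears to take a single box ID and a single counter index rather than a list. A minimal Python pre-flight check, assuming the EVENT:CBOXyCz[:OPTION=VALUE] layout used in this thread (the helper is hypothetical, not part of LIKWID):

```python
import re
from typing import Tuple

# Counter names as written in this thread: CBOX<box_id>C<counter_id>, no spaces.
COUNTER_RE = re.compile(r"CBOX(\d+)C([0-3])")


def check_event(spec: str) -> Tuple[str, int, int]:
    """Validate one 'EVENT:CBOXyCz[:OPTION=VALUE...]' entry; return (event, box, counter)."""
    if any(ch.isspace() for ch in spec):
        raise ValueError("event string contains whitespace; the shell will split it")
    parts = spec.split(":")
    if len(parts) < 2:
        raise ValueError(f"missing counter in {spec!r}")
    match = COUNTER_RE.fullmatch(parts[1])
    if match is None:
        raise ValueError(f"bad counter {parts[1]!r}, expected CBOX<box>C<0-3>")
    return parts[0], int(match.group(1)), int(match.group(2))
```

The failing string from above is rejected at the whitespace check, while "TOR_INSERTS_IO:CBOX0C0:OPCODE=0x10000033" passes and returns ("TOR_INSERTS_IO", 0, 0).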

TomTheBear commented 9 months ago

Maybe you should get familiar with LIKWID before trying more complicated things. Check out the wiki.

likwid-perfctr -g TOR_INSERTS_IO:CBOX0C0:OPCODE=0x10000033 ...

You have to add the event 28 times, once for each CBOX. I did not check the opcode value; I'm confident you found the right one in the provided documentation.
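Typing the event 28 times by hand is error-prone, so the event string can be generated instead. A minimal Python sketch, assuming 28 CHA boxes (the maximum mentioned above; check `likwid-perfctr -e` for the CBOX counters your CPU actually exposes) and reusing the opcode from the command above without verifying it; the function name is hypothetical:

```python
def skx_tor_io_eventset(num_boxes: int, opcode: int, counter: int = 0) -> str:
    """One TOR_INSERTS_IO event per CHA box, all on the same counter index.

    Produces 'TOR_INSERTS_IO:CBOX0C0:OPCODE=0x...,TOR_INSERTS_IO:CBOX1C0:...'.
    """
    if not 0 <= counter <= 3:
        raise ValueError("counter index must be 0..3")
    return ",".join(
        f"TOR_INSERTS_IO:CBOX{box}C{counter}:OPCODE=0x{opcode:X}"
        for box in range(num_boxes)
    )
```

The returned string goes after -g; the remaining likwid-perfctr arguments stay as in your own invocation. Summing the 28 per-box results gives the socket-wide count.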

TomTheBear commented 8 months ago

If your issue is resolved, please close the issue.