RRZE-HPC / likwid

Performance monitoring and benchmarking suite
https://hpc.fau.de/research/tools/likwid/
GNU General Public License v3.0

[BUG] likwid-perfctr makes an error when used with only one event #591

Closed georges-da-costa closed 8 months ago

georges-da-costa commented 8 months ago

I am trying to use perfmon_addEventSet to monitor a single event. I could not get it to work, although monitoring two or more events works, so I tried the same thing with likwid-perfctr.

Similarly, likwid-perfctr fails when given only one event, while it works with the same event combined with another one. I would expect it to be able to monitor a single event.

To Reproduce

Neither sudo likwid-perfctr -g L2_TRANS_DEMAND_DATA_RD:PMC0 sleep 1 nor sudo likwid-perfctr -g L2_TRANS_RFO:PMC1 sleep 1 works, while sudo likwid-perfctr -g L2_TRANS_DEMAND_DATA_RD:PMC0,L2_TRANS_RFO:PMC1 sleep 1 works well. The error is similar in both failing cases:

$ sudo likwid-perfctr -g L2_TRANS_DEMAND_DATA_RD:PMC0 sleep 1
--------------------------------------------------------------------------------
CPU name:   11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
CPU type:   Intel Tigerlake processor
CPU clock:  1.38 GHz
Cannot gather values from unit register 0x606, deactivating RAPL support
ERROR: No event in given event string can be configured.
       Either the events or counters do not exist for the
       current architecture. If event options are set, they might
       be invalid.

The likwid version is the one packaged in Debian sid:

Package: likwid
Version: 5.2.2+dfsg1-2

If more information is needed, feel free to ask. Thanks

Full output with -V 3

$ sudo likwid-perfctr -V 3 -g L2_TRANS_DEMAND_DATA_RD:PMC0 sleep 1
DEBUG - [hwloc_init_cpuInfo:373] HWLOC CpuInfo Family 6 Model 140 Stepping 1 Vendor 0x0 Part 0x0 isIntel 1 numHWThreads 8 activeHWThreads 8
DEBUG - [hwloc_init_nodeTopology:582] HWLOC Thread Pool PU 0 Thread 0 Core 0 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:582] HWLOC Thread Pool PU 4 Thread 1 Core 0 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:582] HWLOC Thread Pool PU 1 Thread 0 Core 1 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:582] HWLOC Thread Pool PU 5 Thread 1 Core 1 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:582] HWLOC Thread Pool PU 2 Thread 0 Core 2 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:582] HWLOC Thread Pool PU 6 Thread 1 Core 2 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:582] HWLOC Thread Pool PU 3 Thread 0 Core 3 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:582] HWLOC Thread Pool PU 7 Thread 1 Core 3 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_cacheTopology:803] HWLOC Cache Pool ID 0 Level 1 Size 49152 Threads 2
DEBUG - [hwloc_init_cacheTopology:803] HWLOC Cache Pool ID 1 Level 2 Size 1310720 Threads 2
DEBUG - [hwloc_init_cacheTopology:803] HWLOC Cache Pool ID 2 Level 3 Size 8388608 Threads 8
--------------------------------------------------------------------------------
CPU name:   11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
CPU type:   Intel Tigerlake processor
CPU clock:  1.38 GHz
CPU family: 6
CPU model:  140
CPU short:  TGL
CPU stepping:   1
CPU features:   FP ACPI MMX SSE SSE2 HTT TM RDTSCP MONITOR VMX EIST TM2 SSSE FMA SSE4.1 SSE4.2 AES AVX RDRAND AVX2 AVX512 RDSEED SSE3
CPU arch:   x86_64
--------------------------------------------------------------------------------
PERFMON version:            5
PERFMON number of counters:     8
PERFMON width of counters:      48
PERFMON number of fixed counters:   4
--------------------------------------------------------------------------------
DEBUG - [affinity_init:539] Affinity: Socket domains 1
DEBUG - [affinity_init:541] Affinity: CPU die domains 1
DEBUG - [affinity_init:546] Affinity: CPU cores per LLC 4
DEBUG - [affinity_init:549] Affinity: Cache domains 1
DEBUG - [affinity_init:553] Affinity: NUMA domains 1
DEBUG - [affinity_init:554] Affinity: All domains 5
DEBUG - [affinity_addNodeDomain:370] Affinity domain N: 8 HW threads on 4 cores
DEBUG - [affinity_addSocketDomain:401] Affinity domain S0: 8 HW threads on 4 cores
DEBUG - [affinity_addDieDomain:438] Affinity domain D0: 8 HW threads on 4 cores
DEBUG - [affinity_addCacheDomain:474] Affinity domain C0: 8 HW threads on 4 cores
DEBUG - [affinity_addMemoryDomain:504] Affinity domain M0: 8 HW threads on 4 cores
DEBUG - [create_lookups:290] T 0 T2C 0 T2S 0 T2D 0 T2LLC 0 T2M 0
DEBUG - [create_lookups:290] T 1 T2C 1 T2S 0 T2D 0 T2LLC 0 T2M 0
DEBUG - [create_lookups:290] T 2 T2C 2 T2S 0 T2D 0 T2LLC 0 T2M 0
DEBUG - [create_lookups:290] T 3 T2C 3 T2S 0 T2D 0 T2LLC 0 T2M 0
DEBUG - [create_lookups:290] T 4 T2C 0 T2S 0 T2D 0 T2LLC 0 T2M 0
DEBUG - [create_lookups:290] T 5 T2C 1 T2S 0 T2D 0 T2LLC 0 T2M 0
DEBUG - [create_lookups:290] T 6 T2C 2 T2S 0 T2D 0 T2LLC 0 T2M 0
DEBUG - [create_lookups:290] T 7 T2C 3 T2S 0 T2D 0 T2LLC 0 T2M 0
DEBUG - [HPMinit:98] Adjusting functions for x86 architecture in daemon mode
DEBUG - [access_x86_rdpmc_init:156] Test for RDPMC for PMC counters returned 1
DEBUG - [access_x86_rdpmc_init:163] Test for RDPMC for FIXED instruction counter returned 1
DEBUG - [access_x86_rdpmc_init:171] Test for RDPMC for FIXED core cycles counter returned 1
DEBUG - [access_x86_rdpmc_init:179] Test for RDPMC for FIXED reference cycle counter returned 1
DEBUG - [access_x86_rdpmc_init:187] Test for RDPMC for FIXED slots counter returned 1
DEBUG - [access_client_startDaemon:157] Starting daemon /usr/sbin/likwid-accessD
DEBUG - [access_client_startDaemon:205] Still waiting for socket /tmp/likwid-470284 for CPU 7...
DEBUG - [access_client_startDaemon:205] Still waiting for socket /tmp/likwid-470284 for CPU 7...
DEBUG - [access_client_startDaemon:205] Still waiting for socket /tmp/likwid-470284 for CPU 7...
DEBUG - [access_client_startDaemon:205] Still waiting for socket /tmp/likwid-470284 for CPU 7...
DEBUG - [access_client_startDaemon:217] Successfully opened socket /tmp/likwid-470284 to daemon for CPU 7
DEBUG - [HPMaddThread:143] Adding CPU 7 to access module
DEBUG - [access_client_check:537] Device check for dev 5 on CPU 7 with accessDaemon failed: failed to open device file

DEBUG - [access_client_check:537] Device check for dev 5 on CPU 7 with accessDaemon failed: failed to open device file

DEBUG - [access_client_check:537] Device check for dev 5 on CPU 7 with accessDaemon failed: failed to open device file

DEBUG - [access_client_check:537] Device check for dev 5 on CPU 7 with accessDaemon failed: failed to open device file

DEBUG - [access_client_check:537] Device check for dev 5 on CPU 7 with accessDaemon failed: failed to open device file

DEBUG - [HPMaddThread:143] Adding CPU 0 to access module
DEBUG - [access_client_read:360] Got error 'access to this register is not allowed' from access daemon reading reg 0x606 at CPU 0
Cannot gather values from unit register 0x606, deactivating RAPL support
DEBUG - [HPMaddThread:143] Adding CPU 1 to access module
DEBUG - [HPMaddThread:143] Adding CPU 2 to access module
DEBUG - [HPMaddThread:143] Adding CPU 3 to access module
DEBUG - [HPMaddThread:143] Adding CPU 4 to access module
DEBUG - [HPMaddThread:143] Adding CPU 5 to access module
DEBUG - [HPMaddThread:143] Adding CPU 6 to access module
Executing: sleep 1
DEBUG - [perfmon_addEventSet:2187] Currently 1 groups of 2 active
DEBUG - [perfgroup_customGroup:631] Creating custom group for event string L2_TRANS_DEMAND_DATA_RD:PMC0
DEBUG - [perfmon_addEventSet:2366] Added event L2_TRANS_DEMAND_DATA_RD for counter PMC0 to group 0
DEBUG - [perfmon_addEventSet:2366] Added event INSTR_RETIRED_ANY for counter FIXC0 to group 0
DEBUG - [perfmon_addEventSet:2366] Added event CPU_CLK_UNHALTED_CORE for counter FIXC1 to group 0
DEBUG - [perfmon_addEventSet:2366] Added event CPU_CLK_UNHALTED_REF for counter FIXC2 to group 0
ERROR: No event in given event string can be configured.
       Either the events or counters do not exist for the
       current architecture. If event options are set, they might
       be invalid.
DEBUG - [HPMfinalize:170] Removing CPU 0 from access module
DEBUG - [HPMfinalize:170] Removing CPU 1 from access module
DEBUG - [HPMfinalize:170] Removing CPU 2 from access module
DEBUG - [HPMfinalize:170] Removing CPU 3 from access module
DEBUG - [HPMfinalize:170] Removing CPU 4 from access module
DEBUG - [HPMfinalize:170] Removing CPU 5 from access module
DEBUG - [HPMfinalize:170] Removing CPU 6 from access module
DEBUG - [HPMfinalize:170] Removing CPU 7 from access module

TomTheBear commented 8 months ago

Thanks for reporting. After testing it successfully on an Intel Cascadelake SP, I looked at the code. There is a break missing: https://github.com/RRZE-HPC/likwid/blob/master/src/perfmon.c#L1140-L1147 . Please add one, recompile and try again. I cannot test it, I have no Tigerlake system anymore.
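As a generic illustration of what a missing break does in a C switch, here is a minimal sketch. It does not reproduce the linked lines of perfmon.c; the names counter_kind, tally, tally_without_break and tally_with_break are hypothetical and exist only for this example.

```c
#include <stddef.h>

/* Hypothetical counter categories, used only for this illustration. */
typedef enum { KIND_FIXED, KIND_PMC } counter_kind;

typedef struct {
    int fixed;
    int pmc;
} tally;

/* Buggy variant: the KIND_FIXED case has no break, so execution
 * continues into KIND_PMC and every fixed entry is also counted as PMC. */
tally tally_without_break(const counter_kind *kinds, size_t n)
{
    tally t = { 0, 0 };
    for (size_t i = 0; i < n; i++) {
        switch (kinds[i]) {
        case KIND_FIXED:
            t.fixed++;
            /* BUG: the break is missing here */
        case KIND_PMC:
            t.pmc++;
            break;
        }
    }
    return t;
}

/* Corrected variant: each case ends with its own break. */
tally tally_with_break(const counter_kind *kinds, size_t n)
{
    tally t = { 0, 0 };
    for (size_t i = 0; i < n; i++) {
        switch (kinds[i]) {
        case KIND_FIXED:
            t.fixed++;
            break;
        case KIND_PMC:
            t.pmc++;
            break;
        }
    }
    return t;
}
```

For an input of two fixed entries and one PMC entry, the buggy variant reports three PMC entries while the corrected one reports one. Compilers can flag this pattern, for example GCC with -Wimplicit-fallthrough.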

Note: You don't need sudo with the access daemon.

georges-da-costa commented 8 months ago

Thanks, the break does indeed seem to be missing, but adding it is not sufficient. I also had to change https://github.com/RRZE-HPC/likwid/blob/49ec7af2e7739b2d59f1ace6433ba6643120e3a8/src/perfmon.c#L2528C1-L2528C60

from

if (((valid_events > fixed_counters) || isPerfGroup) &&

to

if (((valid_events >= fixed_counters) || isPerfGroup) &&

and now it works. I also tested it with multiple events and it still works. I can contribute a patch, but I am not sure whether this change causes a problem somewhere else. What is your opinion?

TomTheBear commented 8 months ago

Can you please provide the output with -V 3 again with only the break added? I think there is something else not working properly. With a single event, valid_events should be fixed_counters + 1, so the original check should work.

georges-da-costa commented 8 months ago

I just tried with one and two events (actually the same event in two different registers) with only the break added, and only the two-event run works. I added the following code just before line 2528:

printf("DEBUG#591 : valid_events:%d, fixed_counters:%d\n", valid_events, fixed_counters);

For the one event version it prints: DEBUG#591 : valid_events:4, fixed_counters:4

For the two events version it prints: DEBUG#591 : valid_events:5, fixed_counters:4

The complete output for both runs is attached:

one_event.txt

two_events.txt
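The two comparisons discussed in this thread can be checked against the printed values with a small sketch. It models only the first half of the quoted condition (the part before the &&), and the function names original_check and relaxed_check are hypothetical.

```c
#include <stdbool.h>

/* First half of the condition as it appears in perfmon.c at the quoted line. */
bool original_check(int valid_events, int fixed_counters, bool is_perf_group)
{
    return (valid_events > fixed_counters) || is_perf_group;
}

/* Same condition with the comparison relaxed from > to >=. */
bool relaxed_check(int valid_events, int fixed_counters, bool is_perf_group)
{
    return (valid_events >= fixed_counters) || is_perf_group;
}
```

With the one-event values (valid_events 4, fixed_counters 4) the original check is false and the relaxed check is true. With the two-event values (5 and 4) both are true. One possible drawback of the relaxed form, stated here as an assumption rather than a confirmed behavior: if all fixed counters were added automatically and no user event were valid, valid_events would equal fixed_counters and the relaxed check would still pass.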

TomTheBear commented 8 months ago

But with a single event, it should be valid_events=5. The issue is caused by the fourth available fixed counter not being added for Tigerlake: https://github.com/RRZE-HPC/likwid/blob/master/src/perfgroup.c#L778 . Add (cpuid_info.model == TIGERLAKE1 || cpuid_info.model == TIGERLAKE2) to the condition there, and the valid_events check should work in its original version.
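The arithmetic behind this diagnosis can be sketched as a simplified model. It is not the code from perfgroup.c. It assumes that a custom group gets three fixed-counter events added automatically (as in the -V 3 output above) and a fourth one only when the CPU model is listed in the condition. The numeric values of TIGERLAKE1 and TIGERLAKE2 are assumed (140 matches the "CPU model" line in the log), and the helper names are hypothetical.

```c
#include <stdbool.h>

/* Assumed model numbers; 0x8C (140) matches the log above. */
enum { TIGERLAKE1 = 0x8C, TIGERLAKE2 = 0x8D };

/* Simplified model: number of fixed-counter events that a custom
 * group gets automatically. tigerlake_listed says whether the
 * Tigerlake models are part of the condition for the fourth counter. */
int auto_fixed_events(int model, bool tigerlake_listed)
{
    bool is_tigerlake = (model == TIGERLAKE1 || model == TIGERLAKE2);
    return (is_tigerlake && tigerlake_listed) ? 4 : 3;
}

/* valid_events as seen by the check: user events plus automatic ones. */
int count_valid_events(int user_events, int model, bool tigerlake_listed)
{
    return user_events + auto_fixed_events(model, tigerlake_listed);
}

/* The original comparison, unchanged. */
bool passes_original_check(int valid_events, int fixed_counters)
{
    return valid_events > fixed_counters;
}
```

Under these assumptions, one user event on Tigerlake gives valid_events = 4 without the model in the condition, which fails against fixed_counters = 4, and valid_events = 5 with it, which passes. Two user events give 5 even without the fix, which matches the observation that only the two-event run worked.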