RRZE-HPC / likwid

Performance monitoring and benchmarking suite
https://hpc.fau.de/research/tools/likwid/
GNU General Public License v3.0
1.66k stars 227 forks source link

[BUG] Segmentation fault while reading bandwidth in Ice Lake after reboot #469

Open uenomad opened 2 years ago

uenomad commented 2 years ago

Describe the bug Segmentation fault while reading bandwidth in Ice Lake node 8352Y after reboot.

To Reproduce Is compiled with ACCESSMODE = accessdaemon or direct. If I switch to perf_event, it will run perfectly. Moreover, going back to accessdaemon or direct will be fixed until rebooting the node. Is it possible I'm missing some initialization?

Additional context GDB output:

0x000015554ae847f3 in __rdpmc (cpu_id=0, counter=1073741827, value=0x1555547bab90) at ./src/access_x86_rdpmc.c:75
75  ./src/access_x86_rdpmc.c: No such file or directory.
(gdb) bt
#0  0x000015554ae847f3 in __rdpmc (cpu_id=0, counter=1073741827, value=0x1555547bab90) at ./src/access_x86_rdpmc.c:75
#1  0x000015554ae8549a in access_x86_rdpmc_read (cpu_id=0, reg=780, data=0x1555547bab90) at ./src/access_x86_rdpmc.c:322
#2  0x000015554af37331 in access_x86_msr_read (cpu_id=0, reg=780, data=0x1555547bab90) at ./src/access_x86_msr.c:204
#3  0x000015554ae81a71 in access_x86_read (dev=MSR_DEV, cpu_id=0, reg=780, data=0x1555547babd0) at ./src/access_x86.c:83
#4  0x000015554af4b9f2 in HPMread (cpu_id=0, dev=MSR_DEV, reg=780, data=0x1555547bac28) at ./src/access.c:211
#5  0x000015554af0b5c1 in perfmon_readCountersThread_icelake (thread_id=0, eventSet=0x15554c0d9480) at ./src/includes/perfmon_icelake.h:1735
#6  0x000015554af1606b in __perfmon_readCounters (groupId=0, threadId=0) at ./src/perfmon.c:2662
#7  0x000015554af165b7 in perfmon_readCounters () at ./src/perfmon.c:2703
#8  0x000000000040e9d0 in likwid_events_read (id=0x63d020 <events>, ctrs=0x15554c0fba70) at likwid.c:231

We load likwid library using dlopen, and the "likwid." is the struct containing the functions. Our initialization code:

    setenv("LIKWID_FORCE", "1", 1);
    appenv("PATH", LIKWID_PATH "/sbin");
    //likwid.perfmon_setVerbosity(1);
    if (likwid.topology_init() < 0) {
        return -1;
    }
    info = likwid.get_cpuInfo();
    topo = likwid.get_cpuTopology();
    if((cpus = (int *) calloc(topo->numHWThreads, sizeof(int))) == NULL) {
        return -1;
    }
    for (i = 0; i < topo->numHWThreads; i++) {
        cpus[i] = topo->threadPool[i].apicId;
    }
    if (likwid.perfmon_init(topo->numHWThreads, cpus) < 0) {
        return -1;
    }

Opening events:

static char *evs_names = \
                           "CAS_COUNT_ALL:MBOX0C0," \
                           "CAS_COUNT_ALL:MBOX1C0," \
                           "CAS_COUNT_ALL:MBOX2C0," \
                           "CAS_COUNT_ALL:MBOX3C0," \
                           "CAS_COUNT_ALL:MBOX4C0," \
                           "CAS_COUNT_ALL:MBOX5C0," \
                           "CAS_COUNT_ALL:MBOX6C0," \
                           "CAS_COUNT_ALL:MBOX7C0";
    if ((id->gid = likwid.perfmon_addEventSet(evs_names)) < 0) {
        return -1;
    }
    if (likwid.perfmon_setupCounters(id->gid) < 0) {
        return -1;
    }
    if (likwid.perfmon_startCounters() < 0) {
        return -1;
    }

And the reading:

    if (likwid.perfmon_readCounters() < 0) {
        return -1;
    }
    for (e = j = 0; e < id->evs_count; e++) {
        for (i = 0; i < topo->numHWThreads; i++, j++) {
            ctrs[j] = likwid.perfmon_getLastResult(id->gid, e, i);
        }
    }
TomTheBear commented 2 years ago

There seem to be a cascade of problems here. When you add a raw event string instead of a performance group, the LIKWID library adds the fixed counters at the end of the event string depending on the architecture. Intel Icelake introduces a new fixed counter TOPDOWN_SLOTS:FIXC3 (register 0x30C = access_x86_rdpmc_read (..., reg=780, ...)). Since this is a MSR which can be read throught the RDPMC instruction, it uses it as a faster access method for the read. This instruction seems to fails on your system. It might be that the first perf usage activates the rdpmc instruction (it's just a MSR bit flip but it has to be done in kernel-space). Normally, LIKWID checks the usability of the instruction and otherwise falls back to the rdmsr way. This does not seem to work for you.

When I got my hands on an Icelake SP system the first time, I experienced that the RDPMC instruction failed for the FIXC3 counter, so exactly your problem. I try to investigate that.

Can you please supply some further information:

$ grep -E "model|family|stepping" /proc/cpuinfo | sort -u
$ likwid-perfctr -C 0 -g TOPDOWN_SLOTS:FIXC3 -V 3 hostname
$ perf ....
$ likwid-perfctr -C 0 -g TOPDOWN_SLOTS:FIXC3 -V 3 hostname

Use your "fixing perf command" here or install LIKWID in two different folders with accessdaemon and perf_event mode. I try to spot the difference in the LIKWID execution. If it works for likwid-perfctr, we have to check your code but at the moment, I don't see a problem with your approach.

uenomad commented 2 years ago

I can't reboot the node now. I hope is enough. If not, I will come back when the node is rebooted.

# grep -E "model|family|stepping" /proc/cpuinfo | sort -u
cpu family  : 6
model       : 106
model name  : Intel(R) Xeon(R) Platinum 8352Y CPU @ 2.20GHz
stepping    : 6

From Access Daemon installation:

# ./likwid-perfctr -C 0 -g TOPDOWN_SLOTS:FIXC3 -V 3 hostname
--------------------------------------------------------------------------------
CPU name:   Intel(R) Xeon(R) Platinum 8352Y CPU @ 2.20GHz
CPU type:   Intel Icelake SP processor
CPU clock:  2.19 GHz
CPU family: 6
CPU model:  106
CPU short:  ICX
CPU stepping:   6
CPU features:   FP ACPI MMX SSE SSE2 HTT TM RDTSCP VMX EIST TM2 SSSE FMA SSE4.1 SSE4.2 AES AVX RDRAND HLE AVX2 AVX512 RDSEED SSE3 
CPU arch:   x86_64
--------------------------------------------------------------------------------
PERFMON version:            5
PERFMON number of counters:     8
PERFMON width of counters:      48
PERFMON number of fixed counters:   4
--------------------------------------------------------------------------------
[likwid-pin] Main PID -> hwthread 0 - OK
DEBUG - [HPMinit:98] Adjusting functions for x86 architecture in daemon mode
DEBUG - [access_x86_rdpmc_init:152] Test for RDPMC for PMC counters returned 1
DEBUG - [access_x86_rdpmc_init:158] Test for RDPMC for FIXED instruction counter returned -1
DEBUG - [access_x86_rdpmc_init:164] Test for RDPMC for FIXED core cycles counter returned -1
DEBUG - [access_x86_rdpmc_init:170] Test for RDPMC for FIXED reference cycle counter returned -1
DEBUG - [access_x86_rdpmc_init:175] Test for RDPMC for FIXED slots counter returned -1
DEBUG - [access_client_startDaemon:141] Starting daemon /hpc/opt/likwid-multi/likwid-ad/sbin/likwid-accessD
DEBUG - [access_client_startDaemon:189] Still waiting for socket /tmp/likwid-30207 for CPU 0...
DEBUG - [access_client_startDaemon:189] Still waiting for socket /tmp/likwid-30207 for CPU 0...
DEBUG - [access_client_startDaemon:189] Still waiting for socket /tmp/likwid-30207 for CPU 0...
DEBUG - [access_client_startDaemon:189] Still waiting for socket /tmp/likwid-30207 for CPU 0...
DEBUG - [access_client_startDaemon:189] Still waiting for socket /tmp/likwid-30207 for CPU 0...
DEBUG - [access_client_startDaemon:189] Still waiting for socket /tmp/likwid-30207 for CPU 0...
DEBUG - [access_client_startDaemon:189] Still waiting for socket /tmp/likwid-30207 for CPU 0...
DEBUG - [access_client_startDaemon:189] Still waiting for socket /tmp/likwid-30207 for CPU 0...
DEBUG - [access_client_startDaemon:201] Successfully opened socket /tmp/likwid-30207 to daemon for CPU 0
DEBUG - [HPMaddThread:143] Adding CPU 0 to access module
DEBUG - [access_client_read:345] Got error 'failed to read/write register' from access daemon reading reg 0x63A at CPU 0
DEBUG - [power_init:414] Deactivating policy register for RAPL domain PP0
DEBUG - [access_client_read:345] Got error 'failed to read/write register' from access daemon reading reg 0x641 at CPU 0
DEBUG - [power_init:367] RAPL domain PP1 not supported
DEBUG - [access_client_read:345] Got error 'failed to read/write register' from access daemon reading reg 0x65C at CPU 0
DEBUG - [power_init:379] Deactivating limit register for RAPL domain PLATFORM
Executing: hostname
DEBUG - [perfmon_addEventSet:2181] Currently 1 groups of 2 active
DEBUG - [perfgroup_customGroup:630] Creating custom group for event string TOPDOWN_SLOTS:FIXC3
DEBUG - [perfmon_addEventSet:2362] Added event TOPDOWN_SLOTS for counter FIXC3 to group 0
DEBUG - [perfmon_addEventSet:2362] Added event INSTR_RETIRED_ANY for counter FIXC0 to group 0
DEBUG - [perfmon_addEventSet:2362] Added event CPU_CLK_UNHALTED_CORE for counter FIXC1 to group 0
DEBUG - [perfmon_addEventSet:2362] Added event CPU_CLK_UNHALTED_REF for counter FIXC2 to group 0
DEBUG - [perfmon_setupCounterThread_icelake:717] FREEZE_PMC_AND_FIXED [0] Register 0x38F , Flags: 0x0 
DEBUG - [perfmon_setupCounterThread_icelake:922] SETUP_FIXED [0] Register 0x38D , Flags: 0x2222 
--------------------------------------------------------------------------------
DEBUG - [perfmon_startCountersThread_icelake:961] CLEAR_FIXED [0] Device 0 Register 0x30C , Flags: 0x0 
DEBUG - [perfmon_startCountersThread_icelake:961] CLEAR_FIXED [0] Device 0 Register 0x309 , Flags: 0x0 
DEBUG - [perfmon_startCountersThread_icelake:961] CLEAR_FIXED [0] Device 0 Register 0x30A , Flags: 0x0 
DEBUG - [perfmon_startCountersThread_icelake:961] CLEAR_FIXED [0] Device 0 Register 0x30B , Flags: 0x0 
DEBUG - [perfmon_startCountersThread_icelake:1232] CLEAR_PMC_AND_FIXED_OVERFLOW [0] Register 0x390 , Flags: 0xC000000F00000000 
DEBUG - [perfmon_startCountersThread_icelake:1234] UNFREEZE_PMC_AND_FIXED [0] Register 0x38F , Flags: 0xF00000000 
ice2760
DEBUG - [perfmon_stopCountersThread_icelake:1370] FREEZE_PMC_AND_FIXED [0] Register 0x38F , Flags: 0x0 
DEBUG - [access_x86_rdpmc_read:321] Read FIXED slots counter with RDPMC instruction with index 0x40000003
DEBUG - [perfmon_stopCountersThread_icelake:1410] STOP_FIXED [0] Register 0x30C , Flags: 0x189F98 
DEBUG - [perfmon_stopCountersThread_icelake:1410] STOP_FIXED [0] Register 0x309 , Flags: 0x3B27B 
DEBUG - [perfmon_stopCountersThread_icelake:1410] STOP_FIXED [0] Register 0x30A , Flags: 0x4ECBC 
DEBUG - [perfmon_stopCountersThread_icelake:1410] STOP_FIXED [0] Register 0x30B , Flags: 0x3DF60 
--------------------------------------------------------------------------------
Group 1: Custom
+-----------------------+---------+--------------+
|         Event         | Counter |  HWThread 0  |
+-----------------------+---------+--------------+
|  Runtime (RDTSC) [s]  |   TSC   | 5.093438e-04 |
|     TOPDOWN_SLOTS     |  FIXC3  |      1613720 |
|   INSTR_RETIRED_ANY   |  FIXC0  |       242299 |
| CPU_CLK_UNHALTED_CORE |  FIXC1  |       322748 |
|  CPU_CLK_UNHALTED_REF |  FIXC2  |       253792 |
+-----------------------+---------+--------------+

DEBUG - [perfmon_finalizeCountersThread_icelake:2116] SHOW_CTL [0] Device 0 Register 0x38D , Flags: 0x2222 
DEBUG - [perfmon_finalizeCountersThread_icelake:2118] CLEAR_CTL [0] Device 0 Register 0x38D , Flags: 0x0 
DEBUG - [perfmon_finalizeCountersThread_icelake:2124] CLEAR_CTR [0] Device 0 Register 0x30C , Flags: 0x0 
DEBUG - [perfmon_finalizeCountersThread_icelake:2116] SHOW_CTL [0] Device 0 Register 0x38D , Flags: 0x0 
DEBUG - [perfmon_finalizeCountersThread_icelake:2118] CLEAR_CTL [0] Device 0 Register 0x38D , Flags: 0x0 
DEBUG - [perfmon_finalizeCountersThread_icelake:2124] CLEAR_CTR [0] Device 0 Register 0x309 , Flags: 0x0 
DEBUG - [perfmon_finalizeCountersThread_icelake:2116] SHOW_CTL [0] Device 0 Register 0x38D , Flags: 0x0 
DEBUG - [perfmon_finalizeCountersThread_icelake:2118] CLEAR_CTL [0] Device 0 Register 0x38D , Flags: 0x0 
DEBUG - [perfmon_finalizeCountersThread_icelake:2124] CLEAR_CTR [0] Device 0 Register 0x30A , Flags: 0x0 
DEBUG - [perfmon_finalizeCountersThread_icelake:2116] SHOW_CTL [0] Device 0 Register 0x38D , Flags: 0x0 
DEBUG - [perfmon_finalizeCountersThread_icelake:2118] CLEAR_CTL [0] Device 0 Register 0x38D , Flags: 0x0 
DEBUG - [perfmon_finalizeCountersThread_icelake:2124] CLEAR_CTR [0] Device 0 Register 0x30B , Flags: 0x0 
DEBUG - [perfmon_finalizeCountersThread_icelake:2131] CLEAR_GLOBAL_OVF [0] Register 0x390 , Flags: 0xC000000F00000000 
DEBUG - [perfmon_finalizeCountersThread_icelake:2133] CLEAR_GLOBAL_CTRL [0] Register 0x38F , Flags: 0x0 
DEBUG - [HPMfinalize:170] Removing CPU 0 from access module

From Perf Event installation:

# ./likwid-perfctr -C 0 -g TOPDOWN_SLOTS:FIXC3 -V 3 hostname
--------------------------------------------------------------------------------
CPU name:   Intel(R) Xeon(R) Platinum 8352Y CPU @ 2.20GHz
CPU type:   Intel Icelake SP processor
CPU clock:  2.19 GHz
CPU family: 6
CPU model:  106
CPU short:  ICX
CPU stepping:   6
CPU features:   FP ACPI MMX SSE SSE2 HTT TM RDTSCP VMX EIST TM2 SSSE FMA SSE4.1 SSE4.2 AES AVX RDRAND HLE AVX2 AVX512 RDSEED SSE3 
CPU arch:   x86_64
--------------------------------------------------------------------------------
PERFMON version:            5
PERFMON number of counters:     8
PERFMON width of counters:      48
PERFMON number of fixed counters:   4
--------------------------------------------------------------------------------
[likwid-pin] Main PID -> hwthread 0 - OK
Executing: hostname
DEBUG - [perfmon_addEventSet:2181] Currently 1 groups of 2 active
DEBUG - [perfgroup_customGroup:630] Creating custom group for event string TOPDOWN_SLOTS:FIXC3
DEBUG - [perfmon_addEventSet:2362] Added event TOPDOWN_SLOTS for counter FIXC3 to group 0
DEBUG - [perfmon_addEventSet:2362] Added event INSTR_RETIRED_ANY for counter FIXC0 to group 0
DEBUG - [perfmon_addEventSet:2362] Added event CPU_CLK_UNHALTED_CORE for counter FIXC1 to group 0
DEBUG - [perfmon_addEventSet:2362] Added event CPU_CLK_UNHALTED_REF for counter FIXC2 to group 0
DEBUG - [perfmon_setupCountersThread_perfevent:691] SETUP_FIXED [0] Register 0x0 , Flags: 0x1 
DEBUG - [perfmon_setupCountersThread_perfevent:853] perf_event_open: cpu_id=0 pid=30342 flags=0
DEBUG - [perfmon_setupCountersThread_perfevent:691] SETUP_FIXED [0] Register 0x1 , Flags: 0x0 
DEBUG - [perfmon_setupCountersThread_perfevent:853] perf_event_open: cpu_id=0 pid=30342 flags=0
DEBUG - [perfmon_setupCountersThread_perfevent:691] SETUP_FIXED [0] Register 0x2 , Flags: 0x9 
DEBUG - [perfmon_setupCountersThread_perfevent:853] perf_event_open: cpu_id=0 pid=30342 flags=0
--------------------------------------------------------------------------------
DEBUG - [perfmon_startCountersThread_perfevent:909] RESET_COUNTER [0] Register 0x0 , Flags: 0x0 
DEBUG - [perfmon_startCountersThread_perfevent:922] START_COUNTER [0] Register 0x0 , Flags: 0x0 
DEBUG - [perfmon_startCountersThread_perfevent:909] RESET_COUNTER [0] Register 0x0 , Flags: 0x0 
DEBUG - [perfmon_startCountersThread_perfevent:922] START_COUNTER [0] Register 0x0 , Flags: 0x0 
DEBUG - [perfmon_startCountersThread_perfevent:909] RESET_COUNTER [0] Register 0x0 , Flags: 0x0 
DEBUG - [perfmon_startCountersThread_perfevent:922] START_COUNTER [0] Register 0x0 , Flags: 0x0 
ice2760
DEBUG - [perfmon_stopCountersThread_perfevent:945] FREEZE_COUNTER [0] Register 0x3 , Flags: 0x0 
DEBUG - [perfmon_stopCountersThread_perfevent:949] READ_COUNTER [0] Register 0x3 , Flags: 0x39370 
DEBUG - [perfmon_stopCountersThread_perfevent:974] RESET_COUNTER [0] Register 0x3 , Flags: 0x0 
DEBUG - [perfmon_stopCountersThread_perfevent:945] FREEZE_COUNTER [0] Register 0x4 , Flags: 0x0 
DEBUG - [perfmon_stopCountersThread_perfevent:949] READ_COUNTER [0] Register 0x4 , Flags: 0x49344 
DEBUG - [perfmon_stopCountersThread_perfevent:974] RESET_COUNTER [0] Register 0x4 , Flags: 0x0 
DEBUG - [perfmon_stopCountersThread_perfevent:945] FREEZE_COUNTER [0] Register 0x5 , Flags: 0x0 
DEBUG - [perfmon_stopCountersThread_perfevent:949] READ_COUNTER [0] Register 0x5 , Flags: 0x397E0 
DEBUG - [perfmon_stopCountersThread_perfevent:974] RESET_COUNTER [0] Register 0x5 , Flags: 0x0 
--------------------------------------------------------------------------------
Group 1: Custom
+-----------------------+---------+--------------+
|         Event         | Counter |  HWThread 0  |
+-----------------------+---------+--------------+
|  Runtime (RDTSC) [s]  |   TSC   | 7.135542e-04 |
|     TOPDOWN_SLOTS     |  FIXC3  |            0 |
|   INSTR_RETIRED_ANY   |  FIXC0  |       234352 |
| CPU_CLK_UNHALTED_CORE |  FIXC1  |       299844 |
|  CPU_CLK_UNHALTED_REF |  FIXC2  |       235488 |
+-----------------------+---------+--------------+
TomTheBear commented 2 years ago

Here I see two problems in den different backends:

So, I don't know why it works if you use perf_event once. Either is generally enables RDPMC and does not deactivate it again (I would expect the kernel to deactivate it) or it is a different effect.

TomTheBear commented 2 years ago

I was able to reproduce it on one of our Icelake nodes.

TomTheBear commented 2 years ago

TOPDOWN_SLOTS for perf_event: https://github.com/RRZE-HPC/likwid/commit/4fff94952f872d40c215a1c9dcc00381f6a658cc

TomTheBear commented 2 years ago

Fixes for RDPMC: https://github.com/RRZE-HPC/likwid/commit/18cc5c31e04d8acafb940cd3c0e5158e73bbe851 and https://github.com/RRZE-HPC/likwid/commit/1b6daa218067fb58c15a18dc3b7e804dc163bb34

TomTheBear commented 2 years ago

This does not solve the problems with failing RDPMC instructions after reboot. All possible checks are passed but we get an illegal instruction. The real check would be the status of the CR4.PCE bit, but the CR4 register is only readable in kernel-space.