bsc-performance-tools / extrae

Instrumentation framework to generate execution traces of the most used parallel runtimes.
https://tools.bsc.es/extrae
GNU Lesser General Public License v2.1
62 stars 40 forks source link

libomptrace segfaults when activating papi counters #125

Open lucjaulmes opened 1 day ago

lucjaulmes commented 1 day ago

Running Ubuntu 23.10 kernel 6.5.0-44 on Intel Xeon Gold 6230R (cascade lake), I’ve compiled extrae 4.2.3 and linked against apt-provided libpapi 7.0 as well as self-compiled libpapi 7.1, using gcc 13.2.0 and libgomp.

I’ve tested that:

Note that it doesn’t fail on single thread executions but fails as soon as 2 threads appear. The segfault appears to happen inside ioctl() which is called by PAPI_add_event internals, and the offending address is stack pointer ($rsp) - 8.

Would appreciate any help you can give me in debugging/avoiding this crash.


Here’s Extrae’s configure summary:

Package configuration for Extrae 4.2.3
-----------------------
Installation prefix: /home/ljaulmes/.local
Cross compilation: no
CC: gcc
CXX: g++
Binary type: 64 bits

MPI instrumentation: no
GASPI instrumentation: no
OpenMP instrumentation: yes, through LD_PRELOAD
    GNU OpenMP: yes
    IBM OpenMP: no
    Intel OpenMP: yes
    OMPT: yes
OpenSHMEM instrumentation: no
pThread instrumentation: yes
    Support for pthread_barrier_wait: yes
    Support for pthread_cond_* calls: yes
CUDA instrumentation: no
OpenCL instrumentation: no
OPENACC instrumentation: no
Java instrumentation: unsupported

Performance counters: yes
    Performance API: PAPI
    PAPI home: /usr
    Sampling support: yes

PEBS sampling: yes

libbfd available: yes (/usr/lib/x86_64-linux-gnu)
libiberty available: yes (/usr/lib/x86_64-linux-gnu)
zlib available: yes (/usr/local)
libxml2 available: yes (/usr)
BOOST available: no
callstack access: through libunwind (/usr)

Dynamic instrumentation: no

Optional features:
------------------
On-line analysis: no
Clock routine: POSIX / clock_gettime, but don't need to link against posix clock library explicitly
Heterogeneous support: no
Parallel merge: not available as MPI is not given
``̀`

Here’s a simple example:
``̀`c
#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel
    {
        int thread_id = omp_get_thread_num();
        printf("Hello from process: %d\n", thread_id);
    }

    return 0;
}

config file:

<?xml version='1.0'?>

<trace enabled="yes"
 home="$EXTRAE_HOME$"
 initial-mode="detail"
 type="paraver"
>

  <openmp enabled="yes" ompt="no">
  </openmp>

  <counters enabled="yes">
    <cpu enabled="yes" starting-set-distribution="1">
                <set enabled="yes" domain="all">
                        PAPI_TOT_CYC
                </set>
    </cpu>
  </counters>

</trace>

What I run:

gcc -fopenmp    test.c   -o test
env OMP_NUM_THREADS=${THREADS} EXTRAE_HOME=${EXTRAE_HOME} LD_PRELOAD=${EXTRAE_HOME}/lib/libomptrace.so EXTRAE_CONFIG_FILE=${PWD}/extrae.xml ./test

Output:

Extrae: WARNING! omp_get_thread_num_real is a NULL pointer. Did the initialization of this module trigger? Retrying initialization...
Welcome to Extrae 4.2.3
Extrae: Detected GOMP version is 4.5
Extrae: Detected and hooked OpenMP runtime: [GNU GOMP]
Extrae: OMP_NUM_THREADS set to 2
Extrae: Parsing the configuration file (/home/ljaulmes/tests/openmp/extrae.xml) begins
Extrae: Tracing package is located on /home/ljaulmes/.local
Extrae: Generating intermediate files for Paraver traces.
Extrae: PAPI domain set to ALL for HWC set 1
Extrae: HWC set 1 contains following counters < PAPI_TOT_CYC (0x8000003b) > - never changes
Extrae: Dynamic memory instrumentation is disabled.
Extrae: Basic I/O memory instrumentation is disabled.
Extrae: System calls instrumentation is disabled.
Extrae: Parsing the configuration file (/home/ljaulmes/tests/openmp/extrae.xml) has ended
Extrae: Intermediate traces will be stored in /home/ljaulmes/tests/openmp
Extrae: Tracing mode is set to: Detail.
Extrae: Successfully initiated with 1 tasks and 2 threads

Hello from process: 0
Segmentation fault (core dumped)

This is the gdb info:

Thread 2 "test" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe52f96c0 (LWP 3729330)]
0x00007ffff7f16895 in check_scheduability (ctl=ctl@entry=0x7fffe0000d80, idx=idx@entry=0, ctx=<optimized out>) at components/perf_event/perf_event.c:526
526    retval = ioctl( group_leader_fd, PERF_EVENT_IOC_RESET, 0);
(gdb) bt
#0  0x00007ffff7f16895 in check_scheduability (ctl=ctl@entry=0x7fffe0000d80, idx=idx@entry=0, ctx=<optimized out>) at components/perf_event/perf_event.c:526
#1  0x00007ffff7f16de3 in open_pe_events (ctl=0x7fffe0000d80, ctx=<optimized out>) at components/perf_event/perf_event.c:814
#2  _pe_update_control_state (ctl=0x7fffe0000d80, native=<optimized out>, count=<optimized out>, ctx=<optimized out>) at components/perf_event/perf_event.c:1660
#3  0x00007ffff7f02d73 in add_native_events (ESI=ESI@entry=0x7fffe0000c80, nevt=nevt@entry=0x7ffff7f53448 <_papi_hwi_presets+11848>, size=<optimized out>, 
    out=0x7fffe00151e0) at papi_internal.c:1297
#4  0x00007ffff7f04d41 in _papi_hwi_add_event (ESI=0x7fffe0000c80, EventCode=<optimized out>, EventCode@entry=-2147483589) at papi_internal.c:1415
#5  0x00007ffff7efd49d in PAPI_add_event (EventSet=<optimized out>, EventCode=-2147483589) at papi.c:2243
#6  0x00007ffff766b572 in HWCBE_PAPI_Init_Thread (time=5035646768046324, threadid=threadid@entry=1, forked=forked@entry=0)
    at ../../../../src/tracer/hwc/papi_hwc.c:612
#7  0x00007ffff7669394 in HWC_Read (tid=tid@entry=1, time=<optimized out>, store_buffer=store_buffer@entry=0x7fffe4af8b48)
    at ../../../../src/tracer/hwc/common_hwc.c:574
#8  0x00007ffff767c625 in Probe_OpenMP_UF_Entry (uf=uf@entry=93824992235958) at ../../../../../src/tracer/wrappers/OMP/omp-probe.c:88
#9  0x00007ffff7681a07 in Extrae_OpenMP_UF_Entry (uf=0x5555555551b6 <main._omp_fn>) at ../../../../../src/tracer/wrappers/OMP/omp-events.c:63
#10 0x00007ffff768238a in callme_par (par_helper_ptr=par_helper_ptr@entry=0x7fffffffe6b0) at ../../../../../src/tracer/wrappers/OMP/gnu-libgomp.c:448
#11 0x00007ffff7f8248e in gomp_thread_start (xdata=<optimized out>) at ../../../src/libgomp/team.c:129
#12 0x00007ffff7297b5a in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:444
#13 0x00007ffff73285fc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
(gdb) info registers
rax            0x0                 0
rbx            0xa                 10
rcx            0x7ffff7325cfd      140737340660989
rdx            0x0                 0
rsi            0x2403              9219
rdi            0xa                 10
rbp            0x7fffe0000de8      0x7fffe0000de8
rsp            0x7fffe4af6f70      0x7fffe4af6f70
r8             0x0                 0
r9             0x0                 0
r10            0xffffffff          4294967295
r11            0x246               582
r12            0x0                 0
r13            0x7fffe0000d80      140736951487872
r14            0x0                 0
r15            0x1                 1
rip            0x7ffff7f16895      0x7ffff7f16895 <check_scheduability+85>
eflags         0x10246             [ PF ZF IF RF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
k0             0x1c00              7168
k1             0x0                 0
k2             0xefffffff          4026531839
k3             0x0                 0
k4             0x0                 0
k5             0x0                 0
k6             0xb80170            12058992
k7             0x0                 0
fs_base        0x7fffe52f96c0      140737038489280
gs_base        0x0                 0
(gdb) p $_siginfo
$1 = {si_signo = 11, si_errno = 0, si_code = 2, _sifields = {_pad = {-458264728, 32767, 0 <repeats 26 times>}, _kill = {si_pid = -458264728, si_uid = 32767}, 
    _timer = {si_tid = -458264728, si_overrun = 32767, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _rt = {si_pid = -458264728, si_uid = 32767, si_sigval = {
        sival_int = 0, sival_ptr = 0x0}}, _sigchld = {si_pid = -458264728, si_uid = 32767, si_status = 0, si_utime = 0, si_stime = 0}, _sigfault = {
      si_addr = 0x7fffe4af6f68, _addr_lsb = 0, _addr_bnd = {_lower = 0x0, _upper = 0x0}}, _sigpoll = {si_band = 140737030090600, si_fd = 0}, _sigsys = {
      _call_addr = 0x7fffe4af6f68, _syscall = 0, _arch = 0}}}
(gdb) p $_siginfo.si_signo
$5 = 11
(gdb) p $_siginfo.si_errno
$6 = 0
(gdb) p $_siginfo._sifields._sigfault 
$7 = {si_addr = 0x7fffe4af6f68, _addr_lsb = 0, _addr_bnd = {_lower = 0x0, _upper = 0x0}}

Let me know any other info you need.

lucjaulmes commented 10 hours ago

I tried Extrae 3.8.3 which did not have a crash, so went ahead and ran a git bisect. First bad commit appears to be 0df3e97819f3f008aba4da4bd8495a720ab035b4. Reverting that commit on top of v4.2.3 fixes the issue.