Running Ubuntu 23.10 (kernel 6.5.0-44) on an Intel Xeon Gold 6230R (Cascade Lake), I’ve compiled Extrae 4.2.3 and linked it against both the apt-provided libpapi 7.0 and a self-compiled libpapi 7.1, using gcc 13.2.0 and libgomp.
I’ve tested that:
- this happens with every OpenMP program I tested, even trivial ones
- libseqtrace does not seem to have the same issue and creates a trace with counter values
- PAPI works fine on its own, without instrumentation
Note that it doesn’t fail on single-threaded executions but fails as soon as 2 threads appear. The segfault appears to happen inside `ioctl()`, which is called by `PAPI_add_event` internals, and the offending address is the stack pointer (`$rsp`) - 8.
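One way to narrow this down might be to exercise the same kernel path from a second thread without Extrae or PAPI in the picture. The sketch below assumes that PAPI's perf_event component ends up calling `perf_event_open` and then `ioctl()` on the returned descriptor; I have not verified that this matches what `PAPI_add_event` does in 7.0/7.1. The names `probe_thread` and `probe_perf_ioctl_threads` are made up for illustration.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical status codes and limit, not taken from PAPI or Extrae. */
enum { PROBE_NOT_RUN = 0, PROBE_UNAVAILABLE = 1, PROBE_OK = 2 };
#define PROBE_MAX_THREADS 64

/* Thread body: open a cycle counter for this thread, drive it via ioctl(). */
static void *probe_thread(void *arg)
{
    int *status = arg;
    struct perf_event_attr attr;
    uint64_t cycles = 0;

    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_HARDWARE;          /* roughly what PAPI_TOT_CYC counts */
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;

    /* glibc has no perf_event_open() wrapper; pid=0, cpu=-1: this thread, any CPU. */
    int fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
    if (fd < 0) {
        *status = PROBE_UNAVAILABLE;         /* no permission, no PMU, VM, ... */
        return NULL;
    }

    int ok = ioctl(fd, PERF_EVENT_IOC_RESET, 0) == 0
          && ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) == 0
          && ioctl(fd, PERF_EVENT_IOC_DISABLE, 0) == 0
          && read(fd, &cycles, sizeof cycles) == (ssize_t)sizeof cycles;
    close(fd);
    *status = ok ? PROBE_OK : PROBE_UNAVAILABLE;
    return NULL;
}

/* Runs the probe in nthreads threads; returns how many ran to completion,
 * or -1 if nthreads is out of range. */
int probe_perf_ioctl_threads(int nthreads)
{
    pthread_t tids[PROBE_MAX_THREADS];
    int status[PROBE_MAX_THREADS] = {0};
    int created[PROBE_MAX_THREADS] = {0};
    int finished = 0;

    if (nthreads < 1 || nthreads > PROBE_MAX_THREADS)
        return -1;

    for (int i = 0; i < nthreads; i++)
        created[i] = pthread_create(&tids[i], NULL, probe_thread, &status[i]) == 0;

    for (int i = 0; i < nthreads; i++) {
        if (!created[i])
            continue;
        pthread_join(tids[i], NULL);
        if (status[i] != PROBE_NOT_RUN)
            finished++;
        printf("thread %d: %s\n", i,
               status[i] == PROBE_OK ? "perf ioctl path ok" : "counter unavailable");
    }
    return finished;
}
```

Calling `probe_perf_ioctl_threads(2)` from a small `main` (built with `gcc -pthread`) and getting 2 back without a crash would suggest the kernel interface itself behaves from a second thread, which would point back at how the instrumented run sets things up.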
Would appreciate any help you can give me in debugging/avoiding this crash.
Here’s Extrae’s configure summary:
```
Package configuration for Extrae 4.2.3
-----------------------
Installation prefix: /home/ljaulmes/.local
Cross compilation: no
CC: gcc
CXX: g++
Binary type: 64 bits
MPI instrumentation: no
GASPI instrumentation: no
OpenMP instrumentation: yes, through LD_PRELOAD
GNU OpenMP: yes
IBM OpenMP: no
Intel OpenMP: yes
OMPT: yes
OpenSHMEM instrumentation: no
pThread instrumentation: yes
Support for pthread_barrier_wait: yes
Support for pthread_cond_* calls: yes
CUDA instrumentation: no
OpenCL instrumentation: no
OPENACC instrumentation: no
Java instrumentation: unsupported
Performance counters: yes
Performance API: PAPI
PAPI home: /usr
Sampling support: yes
PEBS sampling: yes
libbfd available: yes (/usr/lib/x86_64-linux-gnu)
libiberty available: yes (/usr/lib/x86_64-linux-gnu)
zlib available: yes (/usr/local)
libxml2 available: yes (/usr)
BOOST available: no
callstack access: through libunwind (/usr)
Dynamic instrumentation: no
Optional features:
------------------
On-line analysis: no
Clock routine: POSIX / clock_gettime, but don't need to link against posix clock library explicitly
Heterogeneous support: no
Parallel merge: not available as MPI is not given
```
Here’s a simple example:
```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel
    {
        int thread_id = omp_get_thread_num();
        printf("Hello from process: %d\n", thread_id);
    }
    return 0;
}
```
Output:
```
Extrae: WARNING! omp_get_thread_num_real is a NULL pointer. Did the initialization of this module trigger? Retrying initialization...
Welcome to Extrae 4.2.3
Extrae: Detected GOMP version is 4.5
Extrae: Detected and hooked OpenMP runtime: [GNU GOMP]
Extrae: OMP_NUM_THREADS set to 2
Extrae: Parsing the configuration file (/home/ljaulmes/tests/openmp/extrae.xml) begins
Extrae: Tracing package is located on /home/ljaulmes/.local
Extrae: Generating intermediate files for Paraver traces.
Extrae: PAPI domain set to ALL for HWC set 1
Extrae: HWC set 1 contains following counters < PAPI_TOT_CYC (0x8000003b) > - never changes
Extrae: Dynamic memory instrumentation is disabled.
Extrae: Basic I/O memory instrumentation is disabled.
Extrae: System calls instrumentation is disabled.
Extrae: Parsing the configuration file (/home/ljaulmes/tests/openmp/extrae.xml) has ended
Extrae: Intermediate traces will be stored in /home/ljaulmes/tests/openmp
Extrae: Tracing mode is set to: Detail.
Extrae: Successfully initiated with 1 tasks and 2 threads
Hello from process: 0
Segmentation fault (core dumped)
```
I tried Extrae 3.8.3, which did not crash, so I went ahead and ran a git bisect. The first bad commit appears to be 0df3e97819f3f008aba4da4bd8495a720ab035b4. Reverting that commit on top of v4.2.3 fixes the issue.
Let me know any other info you need.