bsc-performance-tools / extrae

Instrumentation framework to generate execution traces of the most used parallel runtimes.
https://tools.bsc.es/extrae
GNU Lesser General Public License v2.1
62 stars 38 forks source link

Fix for data races when threads are dynamically created #49

Open jklinkenberg opened 3 years ago

jklinkenberg commented 3 years ago

Problem Description

I discovered data race issues during one of my audits of the EU project POP2 using Extrae. The code I used when I ran into those issues was using Intel Thread Building Block (TBB) where threads are sometimes dynamically created.

Looking a bit deeper into Extrae, I noticed that data structures for tracing events are created for each thread which is good. However, the global management data structures that point to those data structures are reallocated when the number of threads changes. This process is not thread-safe.

Reproducer

Attached I also created a small reproducer for the problem. The reproducer creates a pthread that is then dynamically creating more pthreads whereas the master thread is constantly spaming MPI_IProbes. Thus, there is constant access to tracing data structures conflicting with resizing data structures when the number of threads changes.

extrae-datarace-reproducer.zip

With the current Extrae version this results in the following seg fault:

Caught signal 11 (Segmentation fault: address not mapped to object at address 0x9)
==== backtrace ====
    0  /usr/lib64/libucs.so.0(+0x17970) [0x2abcf957f970]
    1  /usr/lib64/libucs.so.0(+0x17b22) [0x2abcf957fb22]
    2  /lib64/libc.so.6(+0x80419) [0x2abc388c1419]
    3  /lib64/libc.so.6(+0x820a5) [0x2abc388c30a5]
    4  /lib64/libc.so.6(+0x83ce2) [0x2abc388c4ce2]
    5  /lib64/libc.so.6(realloc+0x1d2) [0x2abc388c6d82]
    6  /home/jk869269/install/c_extrae/3.8.3/lib/libptmpitrace-3.8.3.so(realloc+0x3d) [0x2abc3507abbd]
    7  /home/jk869269/install/c_extrae/3.8.3/lib/libptmpitrace-3.8.3.so(Extrae_reallocate_thread_info+0x1e) [0x2abc35073f0e]
    8  /home/jk869269/install/c_extrae/3.8.3/lib/libptmpitrace-3.8.3.so(Backend_ChangeNumberOfThreads+0x3e3) [0x2abc3506be93]
    9  /home/jk869269/install/c_extrae/3.8.3/lib/libptmpitrace-3.8.3.so(pthread_create+0x103) [0x2abc350865f3]
   10  reproducer.exe() [0x4024d5]
   11  /home/jk869269/install/c_extrae/3.8.3/lib/libptmpitrace-3.8.3.so(+0x974be) [0x2abc350864be]
   12  /lib64/libpthread.so.0(+0x7ea5) [0x2abc36036ea5]
   13  /lib64/libc.so.6(clone+0x6d) [0x2abc3893f8dd]
===================

Solution

With some debugging and analysis, I created a fix for the problem by using read write locks whereas possible to keep performance up but get rid of the data races. Not sure whether I found all places. At least the proposed changes solved the issues I had with the TBB code and the reproducer.

Further I also implemented some changes to use lock/unlock Extrae internal pthread locks directly. Otherwise these would also be instrumented which creates additional overhead.