UO-OACISS / apex

Autonomic Performance Environment for eXascale (APEX)

Use after free in `read_proc` #182

Open msimberg opened 4 days ago

msimberg commented 4 days ago

In pika we have some tests that link to libapex.so but don't actually use apex, and they often trigger a segmentation fault or similar late, after main has exited. The common factor in these tests is that main typically runs very quickly. This can be reproduced with a program that contains nothing but

int main() {}

and links to apex, rerunning it many times. I did this testing with e.g. `g++ -g -O0 -lapex main.cpp` using GCC 10, but I expect it translates to any compiler.

The issue seems to be that when main exits quickly, apex also reaches cleanup quite quickly (https://github.com/UO-OACISS/apex/blob/6edfb929860bf8e46577353c908042e09a12b8d8/src/apex/apex_preload.cpp#L113), but the proc_data_reader thread spawned in https://github.com/UO-OACISS/apex/blob/6edfb929860bf8e46577353c908042e09a12b8d8/src/apex/proc_read.h#L66 may run after (or while) the proc_data_reader is destroyed (https://github.com/UO-OACISS/apex/blob/6edfb929860bf8e46577353c908042e09a12b8d8/src/apex/apex.cpp#L151).

If I understand the code correctly, this is because the thread is detached here: https://github.com/UO-OACISS/apex/blob/6edfb929860bf8e46577353c908042e09a12b8d8/src/apex/proc_read.h#L67. Is there any particular reason for detaching, especially given that it would be joined here: https://github.com/UO-OACISS/apex/blob/6edfb929860bf8e46577353c908042e09a12b8d8/src/apex/proc_read.h#L75? At least in a quick test, commenting out the detach avoids the issue, but I don't know if it has other effects...

Note: while the above test only fails after enough reruns, the issue can be reproduced reliably by adding a few sleeps:

diff --git a/src/apex/apex_preload.cpp b/src/apex/apex_preload.cpp
index fa3ea381..f01bb22a 100644
--- a/src/apex/apex_preload.cpp
+++ b/src/apex/apex_preload.cpp
@@ -112,6 +112,7 @@ int apex_preload_main(int argc, char** argv, char** envp) {
         }
         apex::cleanup();
     }
+    std::this_thread::sleep_for(std::chrono::seconds(1)); // don't quit right away to allow read_proc to use memory after it's been freed
     return ret;
 }

diff --git a/src/apex/proc_read.cpp b/src/apex/proc_read.cpp
index c23de038..c2eb6a3a 100644
--- a/src/apex/proc_read.cpp
+++ b/src/apex/proc_read.cpp
@@ -768,6 +768,9 @@ namespace apex {
     /* This is the main function for the reader thread. */
     //void* proc_data_reader::read_proc(void * _ptw) {
     void* proc_data_reader::read_proc() {
+           std::this_thread::sleep_for(std::chrono::milliseconds(500)); // sleep long enough that apex has shut down, but not long enough that the whole application has exited already
         in_apex prevent_deadlocks;
         // when tracking memory allocations, ignore these
         in_apex prevent_nonsense;

Running this through valgrind reliably reports:

==105902== Thread 2:
==105902== Invalid read of size 1
==105902==    at 0x4B5CED7: load (atomic_base.h:426)
==105902==    by 0x4B5CED7: std::atomic<bool>::operator bool() const (atomic:87)
==105902==    by 0x4C386BD: apex::proc_data_reader::read_proc() (proc_read.cpp:795)
==105902==    by 0x4B7318D: void* std::__invoke_impl<void*, void* (apex::proc_data_reader::*)(), apex::proc_data_reader*>(std::__invoke_memfun_deref, void* (apex::proc_data_reader::*&&)(), apex::proc_data_reader*&&) (invoke.h:73)
==105902==    by 0x4B730D1: std::__invoke_result<void* (apex::proc_data_reader::*)(), apex::proc_data_reader*>::type std::__invoke<void* (apex::proc_data_reader::*)(), apex::proc_data_reader*>(void* (apex::proc_data_reader::*&&)(), apex::proc_data_reader*&&) (invoke.h:95)
==105902==    by 0x4B73042: void* std::thread::_Invoker<std::tuple<void* (apex::proc_data_reader::*)(), apex::proc_data_reader*> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) (thread:264)
==105902==    by 0x4B72FA1: std::thread::_Invoker<std::tuple<void* (apex::proc_data_reader::*)(), apex::proc_data_reader*> >::operator()() (thread:271)
==105902==    by 0x4B72F05: std::thread::_State_impl<std::thread::_Invoker<std::tuple<void* (apex::proc_data_reader::*)(), apex::proc_data_reader*> > >::_M_run() (thread:215)
==105902==    by 0x4E2249F: execute_native_thread_routine (thread.cc:80)
==105902==    by 0x50C1AC2: start_thread (pthread_create.c:442)
==105902==    by 0x5152A03: clone (clone.S:100)
==105902==  Address 0x52a2bf0 is 0 bytes inside a block of size 104 free'd
==105902==    at 0x484DB6F: operator delete(void*, unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==105902==    by 0x4B550A2: apex::apex::~apex() (apex.cpp:151)
==105902==    by 0x4B5AA9B: apex::cleanup() (apex.cpp:1873)
==105902==    by 0x4B51681: apex_preload_main (apex_preload.cpp:113)
==105902==    by 0x5056D8F: (below main) (libc_start_call_main.h:58)
==105902==  Block was alloc'd at
==105902==    at 0x484B013: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==105902==    by 0x4B55D12: apex::init(char const*, unsigned long, unsigned long) (apex.cpp:531)
==105902==    by 0x4B51555: apex_preload_main (apex_preload.cpp:103)
==105902==    by 0x5056D8F: (below main) (libc_start_call_main.h:58)
==105902==

This was tested on apex develop (6edfb929860bf8e46577353c908042e09a12b8d8).

I see a few other calls to std::thread::detach in the codebase that may have the same issue, but I did not investigate those.

khuck commented 1 day ago

@msimberg thanks for the bug report... this problem has been there for a long time, and I am not quite sure how best to resolve it. It has not been a priority since it only happens in very short programs, as you mentioned. The reason I am detaching the thread is that in this very same case, the process sometimes hangs forever trying to join that thread. Detaching the thread and accepting the occasional, though rare, crash for very short programs was a compromise. There is a race condition in these very short programs in which the main thread is exiting before the worker thread even has a chance to start. If you have a good suggestion to fix it, that would be great... it's possible that I could set up a signal between the two threads, so the main thread won't continue until the worker thread indicates that it is ready to go...?

msimberg commented 22 hours ago

Thanks @khuck for the response! I see... Do you have some idea of where the thread is when it hangs? I tried to comment out the detach, but was not able to reproduce a hang (that doesn't mean it can't happen... I might just not have the right environment/configuration options etc.).

it's possible that I could set up a signal between the two threads, so the main thread won't continue until the worker thread indicates that it is ready to go

Naively I'd say the only place where it'd be safe for the main thread to continue is after std::thread::join returns, but are you saying there might be an earlier point in read_proc where it's safe to destroy the proc_data_reader (this)? The reading and done member variables are at least accessed in the while-loop in read_proc, so I guess the earliest it could signal the main thread to continue is after the while-loop (at least without refactoring).