Open ItsShadowCone opened 2 years ago
Great find! This should be possible to fix. Looking into it...
I got some time to look into this. Since I'm going on a long vacation soon, I'll summarize the problem here for the benefit of my future self (or anyone else who wants to fix this).
First, `man 2 ptrace` has this to say about the situation:
execve(2) under ptrace
When one thread in a multithreaded process calls execve(2), the kernel destroys all other threads in the process, and resets the thread ID of the execing thread to the thread group ID (process ID). (Or, to put things another way, when a multithreaded process does an execve(2), at completion of the call, it appears as though the execve(2) occurred in the thread group leader, regardless of which thread did the execve(2).) This resetting of the thread ID looks very confusing to tracers:
- All other threads stop in `PTRACE_EVENT_EXIT` stop, if the `PTRACE_O_TRACEEXIT` option was turned on. Then all other threads except the thread group leader report death as if they exited via `_exit(2)` with exit code 0.
- The execing tracee changes its thread ID while it is in the `execve(2)`. (Remember, under ptrace, the "pid" returned from waitpid(2), or fed into ptrace calls, is the tracee's thread ID.) That is, the tracee's thread ID is reset to be the same as its process ID, which is the same as the thread group leader's thread ID.
- Then a `PTRACE_EVENT_EXEC` stop happens, if the `PTRACE_O_TRACEEXEC` option was turned on.
- If the thread group leader has reported its `PTRACE_EVENT_EXIT` stop by this time, it appears to the tracer that the dead thread leader "reappears from nowhere". (Note: the thread group leader does not report death via `WIFEXITED(status)` until there is at least one other live thread. This eliminates the possibility that the tracer will see it dying and then reappearing.) If the thread group leader was still alive, for the tracer this may look as if thread group leader returns from a different system call than it entered, or even "returned from a system call even though it was not in any system call". If the thread group leader was not traced (or was traced by a different tracer), then during execve(2) it will appear as if it has become a tracee of the tracer of the execing tracee.
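To make the stops in the quoted passage concrete, here is a minimal C sketch of how a tracer can tell them apart from a `waitpid(2)` status. The helper names `ptrace_event_of` and `former_tid_at_exec` are made up for illustration and are not part of Reverie. It assumes Linux and that the tracer enabled `PTRACE_O_TRACEEXIT` and `PTRACE_O_TRACEEXEC`.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: return the PTRACE_EVENT_* code carried by a
 * waitpid() status, or 0 if the status is not a ptrace event stop.
 * Event stops are reported as a SIGTRAP stop with the event code
 * stored in bits 16 and up of the status word. */
int ptrace_event_of(int status) {
    if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
        return 0;
    return (status >> 16) & 0xffff;
}

/* Hypothetical helper: at a PTRACE_EVENT_EXEC stop, ask the kernel which
 * thread ID the execing thread had before its ID was reset to the
 * process ID (available since Linux 3.0). Returns -1 on error. */
pid_t former_tid_at_exec(pid_t pid) {
    unsigned long msg = 0;
    if (ptrace(PTRACE_GETEVENTMSG, pid, NULL, &msg) == -1)
        return -1;
    return (pid_t)msg;
}
```

A tracer would call `ptrace_event_of(status)` on every status it gets back from `waitpid(2)` and compare the result against `PTRACE_EVENT_EXIT` and `PTRACE_EVENT_EXEC`. The former thread ID is what lets it match the exec stop, which arrives under the process ID, back to the thread that actually called `execve(2)`.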
The core problem is that the non-main thread is getting the `PTRACE_EVENT_EXIT` stop and when we resume, Reverie is expecting the "real" exit, but we're getting `PTRACE_EVENT_EXEC` instead.
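As a rough illustration of that mismatch (not Reverie's actual code, which is Rust), the status that arrives after resuming from a `PTRACE_EVENT_EXIT` stop could be classified along these lines. The enum and the function name `classify_after_exit_stop` are assumptions made for this sketch.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* Hypothetical outcome of resuming a thread from its PTRACE_EVENT_EXIT stop. */
enum next_step { THREAD_EXITED, PROCESS_EXECED, SOMETHING_ELSE };

/* Classify the waitpid() status that follows an exit stop. */
enum next_step classify_after_exit_stop(int status) {
    /* The case a tracer normally expects: the thread really died. */
    if (WIFEXITED(status) || WIFSIGNALED(status))
        return THREAD_EXITED;
    /* The case from this issue: an exec event stop shows up instead. */
    if (WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP &&
        (status >> 16) == PTRACE_EVENT_EXEC)
        return PROCESS_EXECED;
    return SOMETHING_ELSE;
}
```

A tracer that only handles `THREAD_EXITED` at this point has nothing sensible to do with `PROCESS_EXECED`, which matches the failure described above.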
Now, complications arise because we handle the `PTRACE_EVENT_EXIT` event in a special way. This event can happen at any time, even while in another ptrace stop, so we view it as an asynchronous event. That is, we `tokio::select!()` over two futures: the penultimate "exit" event and the entire run loop of a tracee thread. We do it this way because the Reverie `Tool` might be awaiting a mutex and we want that to be canceled and dropped automagically. This ensures that the tool can gracefully handle sudden exit events without corrupting its own state.
The post-exit exec event should really be handled inside of the run loop, not outside of it, because there's a chance that it's "recoverable". However, this code was very carefully crafted originally, so changing it could be a little tricky.
I hope you don't mind me breaking your program :)
I found that if the tracee calls `execve` within a thread, `reverie-ptrace` panics.
According to the `clone` man page (`man 2 clone`)
The panic happens due to this
From my limited understanding, the proper way to handle this situation is to discard all threads of this process and resume the main thread of the process as the only (new) process. I'm not quite sure if that's even possible in your current architecture.
Also, I realize this is an edge case, you might happily ignore it after all.
For reference, my tracee
Compile with `-lpthread`
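The reporter's tracee source is not shown in this thread. For anyone trying to reproduce the scenario, a minimal sketch of a program that calls `execv` from a non-leader thread might look like the following. All function names here are made up for illustration, and `run_tracee_in_child` exists only so the behaviour can be observed without replacing the calling process.

```c
#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Thread body: replace the whole process image from a non-leader thread. */
void *exec_in_thread(void *arg) {
    char *const *argv = arg;
    execv(argv[0], argv);
    perror("execv"); /* only reached if execv failed */
    _exit(127);
}

/* Spawn a second thread and let it call execv(argv[0], argv).
 * Returns only if something went wrong. */
int exec_from_thread(char *const argv[]) {
    pthread_t t;
    int err = pthread_create(&t, NULL, exec_in_thread, (void *)argv);
    if (err != 0)
        return err;
    /* On success this never returns: the exec destroys this thread. */
    pthread_join(t, NULL);
    return -1;
}

/* Run the scenario in a forked child and return the exit code of the
 * exec'ed program, or -1 on error. */
int run_tracee_in_child(char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        exec_from_thread(argv);
        _exit(126); /* thread creation failed */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A standalone tracee would just call `exec_from_thread` from `main` with the argument vector of the program to exec, and would be run under the tracer to see the exit and exec stops discussed above.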