Signal arriving between pre-syscall event and syscall can cause problems for clients

derekbruening commented 1 year ago

Realized while discussing an asynchronous signal arriving after we emit the syscall marker for drmemtrace in the pre-syscall event handler but before the syscall starts, for PR #6096.

DR does ignore pending signals for what it considers "ignorable" system calls, which do not block:

            /* It is difficult to undo some pre-syscall handling, especially for
             * sigreturn's signal mask and clone syscalls.  We go ahead and run the
             * syscall before we deliver the signal for all non-ignorable syscalls.
             * These are nearly all non-blocking so this should not be an issue with
             * signal delay from blocking.  Sigreturn and clone will come back to
             * d_r_dispatch so there's no worry about unbounded delay.
             */

So for other system calls, DR will call client pre-syscall event handlers but then go run the app's signal handler and once that's done it will go back to the syscall: and repeat the syscall instruction (in a tail-duplicated block) and repeat the pre-syscall event handler calls.

You can imagine the client taking some action that assumes no further code will be run in that thread before the syscall. E.g., enabling PT tracing in drmemtrace, or keeping some state that is fragile or assumes recency or assumes a post-syscall event and would have a problem with an app handler running (and maybe making the same syscall).

Could we consider all client-filtered syscalls to also be "non-ignorable" and delay a signal? If the syscall is blocking this seems bad.

The best solution would be to pretend the signal arrived later: emulate the kernel's interruption and set up for post-syscall handling with EINTR (plus adjust PC for auto-restart) and deliver the signal. Afterward, make sure we run the post-syscall event handler with the EINTR and return EINTR to the app.

We could also check for pending signals right before the pre_syscall handler to shrink the problematic window.

derekbruening commented 1 year ago

Looking further into this, there are several complications:

Only system calls documented as returning EINTR will be expected by the app to do so and will have retry logic. We can't just fail a random syscall with EINTR.
Even system calls that usually block and can return EINTR will never return EINTR if they are configured as non-blocking: see https://stackoverflow.com/questions/14134440/eintr-and-non-blocking-calls. If applications assume this they will not have retry logic in place.

All that means that DR would have to know precisely whether a given system call instance is blocking (and not just "maybe-blocking" as in PR #6132 for #5843), which is not easy. We would have to track file descriptor state, or query state: which gets messy. Is there more than O_NONBLOCK and MSG_DONTWAIT? What about various Mach syscalls?

For syscalls that won't block, the solution is simple: treat just like DR is treating ignorable ones today.

Could we instead mask out all signals between pre-syscall handler and entering the cache?

But can always get signal between removal of mask and execution of app syscall: only a few syscalls have versions that make signal masks an atomic part of their operation.

Looks like record_pending_signal() checks get_at_syscall() to handle this case and it will go back to dispatch and deliver the signal whether the syscall was actually started or not. But: dispatch calls handle_post_system_call() based on get_at_syscall: so it would invoke post-syscall even if it never made it to the syscall? That means DR would continue on as though the syscall had executed when really it hadn't?!? Sure looks that way: asynch_target will be post-syscall from the exit stub. Has this ever happened?? Am I missing something?? TOFILE separately => filed #6199. To solve this skipped syscall: record_pending_signal() needs to look at where in the gencode the signal interrupted.

derekbruening commented 1 year ago

Another possibility is to have DR tell the client it aborted the syscall. In some cases a client could undo its pre-syscall action. This isn't a complete solution but could help. For drmemtrace: we would roll back the syscall marker and insert a special marker for raw2trace to not include the syscall instr at the end of the prior block.

derekbruening commented 1 year ago

For drmemtrace: could move our syscall marker into the post-syscall event but that has drawbacks as there can be large delays with signal handlers in the current thread or interleaved other threads; plus there are no post-syscall events for some syscalls.

Our conclusion is that we cannot practically solve this in a way that magically avoids clients facing complexity. While some steps like checking the pending signals flag just prior to the pre-syscall event are worth doing to shrink the frequency of this happening, it may not be worth further work and instead we should document the possibility of aborted syscall attempts to the client and provide a pre_syscall_aborted_event.

DynamoRIO / dynamorio

Signal arriving between pre-syscall event and syscall can cause problems for clients #6105

Could we instead mask out all signals between pre-syscall handler and entering the cache?