Closed (sanchda closed this 2 days ago)
Benchmark execution time: 2024-11-25 20:51:20
Comparing candidate commit 80747285 in PR branch sanchda/fix_poll_zombies with baseline commit bdbbd73b in branch main.
Found 0 performance improvements and 0 performance regressions! Performance is the same for 51 metrics, 2 unstable metrics.
Attention: Patch coverage is 0% with 23 lines in your changes missing coverage. Please review.
Project coverage is 70.47%. Comparing base (bdbbd73) to head (8074728).
What does this PR do?
The original implementation accidentally had a mutable array with immutable objects, causing the interface to always throw errors. Since this part of the code is in the critical path for handling zombie processes, this condition had an adverse side-effect on customer infrastructure.
This code also used a BorrowedFd, which is supposed to track an OwnedFd. This was problematic in some conditions, since the underlying implementation would use prctl() to check file descriptor liveness and panic in some edge cases. The code has been ported to libc, using RawFd exclusively, to prevent this condition.
Finally, this patch grants some additional time for reaping a PID. When a receiver process exceeds its timeout budget, it is sent a SIGKILL. However, the old behavior was to send SIGKILL and then immediately call waitpid(pid, ..., WNOHANG). On a saturated system (i.e., precisely the kind of system where a timeout might be necessary!), it may take some time for the receiver PID to respond to the SIGKILL. In general, there's no way to provide a bounded guarantee for the duration of this reap operation, so an arbitrary number of scheduler slices is chosen as the maximum reaping wait duration.
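For reviewers who want a concrete picture of the bounded reap, here is a minimal sketch. It is not the code in this patch. The helper name kill_and_reap, the attempt count, and the pause length are all hypothetical, and kill and waitpid are declared by hand (with the Linux/macOS constant values) only to keep the sketch free of dependencies, whereas the patch itself goes through the libc crate.

```rust
use std::thread::sleep;
use std::time::Duration;

// Hand-declared bindings so the sketch builds without external crates.
// Assumption: the real patch uses the equivalent functions from `libc`.
extern "C" {
    fn kill(pid: i32, sig: i32) -> i32;
    fn waitpid(pid: i32, status: *mut i32, options: i32) -> i32;
}

const SIGKILL: i32 = 9; // value on Linux and macOS
const WNOHANG: i32 = 1; // value on Linux and macOS

/// Hypothetical helper: send SIGKILL to `pid`, then poll for its exit
/// status up to `max_attempts` times, pausing between attempts.
/// Returns true if the child was reaped within the budget.
fn kill_and_reap(pid: i32, max_attempts: u32, pause: Duration) -> bool {
    // The return value is ignored here: if the signal could not be
    // delivered, the waitpid loop below still decides the outcome.
    unsafe { kill(pid, SIGKILL) };

    for _ in 0..max_attempts {
        let mut status: i32 = 0;
        let rc = unsafe { waitpid(pid, &mut status, WNOHANG) };
        if rc == pid {
            return true; // child has exited and is now reaped
        }
        if rc == -1 {
            return false; // error, e.g. not our child or already reaped
        }
        // rc == 0: the child has not changed state yet, so give the
        // scheduler a chance to deliver the signal before retrying.
        sleep(pause);
    }
    false // budget exhausted; the caller decides what to do next
}
```

A caller might use something like kill_and_reap(pid, 100, Duration::from_millis(10)). Calling waitpid once with WNOHANG immediately after the signal corresponds to max_attempts = 1, which is the old behavior described above.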
Motivation
Fix zombies