Closed chriskuehl closed 3 years ago
There is 1 failing test which looks unrelated but I'm looking into it.
I think the failing test is related because the process tracing causes the supervise process to get killed when pgctl-poll-ready calls pgctl restart, or something like that. I need to think about this kind of scenario more. We probably don't want it to ever kill s6 processes.
This implements a new method of tracing processes which is more resilient. It works by scanning all of the user's processes on the system for an environment variable that pgctl injects. This is used when calling pgctl stop or pgctl stop --force to make sure all processes for a service are truly killed.

The main scenario this protects against is processes which spawn children without the pgctl lock file descriptor (which is a very common thing to do, since close_fds-type behavior is the default in many languages, including Python nowadays). It is still not 100% reliable at tracking since the environment variable can also be cleared, but that should be quite rare. This is IMO just about the best we can do without true cgroup support.

At ~15ms on my box to scan /proc, it isn't slow but it also isn't fast. From using it interactively it feels fine but we'll need to be careful not to insert this check into places which are called frequently.

I added a regression test in the form of the subprocess-with-closed-fds spec test which fails on master (leaks sleep infinity processes and doesn't detect it).
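
For anyone reading along, here is a rough sketch of the general technique (scanning /proc for processes whose environment carries a marker variable). It is not the code in this PR: the function names, the proc_root parameter, and the EXAMPLE_MARKER variable name are all made up for illustration, and the real variable name and matching rules in pgctl may differ.

```python
import os


def parse_environ(raw):
    """Parse the NUL-separated bytes of a /proc/<pid>/environ file."""
    env = {}
    for entry in raw.split(b'\0'):
        key, sep, value = entry.partition(b'=')
        if sep:  # skip empty or malformed entries without '='
            env[key] = value
    return env


def find_marked_pids(name, value, proc_root='/proc'):
    """Return sorted pids whose environ contains name=value.

    `name` and `value` are bytes. `proc_root` is a hypothetical
    parameter so the scan can be pointed at a fake tree in tests.
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory (e.g. "self", "meminfo")
        try:
            with open(os.path.join(proc_root, entry, 'environ'), 'rb') as f:
                raw = f.read()
        except OSError:
            # The process exited mid-scan, or it belongs to another
            # user and its environ is not readable.
            continue
        if parse_environ(raw).get(name) == value:
            pids.append(int(entry))
    return sorted(pids)
```

Something like find_marked_pids(b'EXAMPLE_MARKER', b'my-service') would then list the candidate pids to signal. A child that is exec'd with a cleared environment will not show up, which is the reliability gap mentioned above.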