Implement tracing processes via environment variable

chriskuehl commented 3 years ago

This implements a new method of tracing processes which is more resilient. It works by looking for an environment variable that pgctl injects across all of the user's processes on the system. This is used when calling pgctl stop or pgctl stop --force to make sure all processes for a service are truly killed.
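For readers unfamiliar with the technique, here is a minimal sketch of what scanning /proc for a marker variable can look like. It is not the code in this PR: the variable name `EXAMPLE_SERVICE_MARKER` and the helpers `parse_environ` and `find_marked_pids` are made up for illustration, and the sketch assumes a Linux-style /proc where each `/proc/<pid>/environ` holds NUL-separated `KEY=VALUE` entries.

```python
import os


def parse_environ(raw):
    """Parse the NUL-separated KEY=VALUE bytes found in /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b'\0'):
        key, sep, value = entry.partition(b'=')
        if sep:  # skip empty or malformed entries
            env[key] = value
    return env


def find_marked_pids(name, value, proc_root='/proc'):
    """Return sorted pids whose environment has `name` set to `value`.

    `name` and `value` are bytes. Processes whose environ cannot be read
    (already exited, owned by another user) are skipped silently.
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, 'environ'), 'rb') as f:
                raw = f.read()
        except OSError:
            continue
        if parse_environ(raw).get(name) == value:
            pids.append(int(entry))
    return sorted(pids)


if __name__ == '__main__':
    # Hypothetical marker name and value, purely for illustration.
    print(find_marked_pids(b'EXAMPLE_SERVICE_MARKER', b'my-service'))
```

Because unreadable entries are skipped, a scan like this only sees processes the caller has permission to inspect, which matches the "user's processes" scope described above.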

The main scenario this protects against is processes which spawn children without the pgctl lock file descriptor (which is a very common thing to do, since close_fds-type behavior is the default in many languages, including Python nowadays). It is still not 100% reliable at tracking since the environment variable can also be cleared, but that should be quite rare. This is IMO just about the best we can do without true cgroup support.
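To illustrate the scenario, the sketch below shows that a child started with Python's default `close_fds=True` loses an inherited descriptor but still sees the parent's environment. A pipe stands in for the lock file descriptor, and `EXAMPLE_MARKER` and `child_view` are hypothetical names; this assumes a POSIX system.

```python
import json
import os
import subprocess
import sys

# Code run in the child: report whether the fd is open and what the
# marker variable is set to.
CHILD_CODE = """
import json, os, sys
fd, var = int(sys.argv[1]), sys.argv[2]
try:
    os.fstat(fd)
    has_fd = True
except OSError:
    has_fd = False
print(json.dumps([has_fd, os.environ.get(var)]))
"""


def child_view(fd, var, close_fds=True):
    """Spawn a child and return (fd_is_open_in_child, value_of_var_in_child)."""
    out = subprocess.check_output(
        [sys.executable, '-c', CHILD_CODE, str(fd), var],
        close_fds=close_fds,
    )
    has_fd, value = json.loads(out)
    return has_fd, value


if __name__ == '__main__':
    r, w = os.pipe()
    os.set_inheritable(r, True)  # stand-in for an inheritable lock fd
    os.environ['EXAMPLE_MARKER'] = 'my-service'
    print(child_view(r, 'EXAMPLE_MARKER'))                   # default close_fds
    print(child_view(r, 'EXAMPLE_MARKER', close_fds=False))  # fd is kept
```

With the default, the descriptor is gone in the child while the marker variable survives, which is why an environment-based trace can still find such a process.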

Scanning /proc takes ~15ms on my box, so it isn't slow but it also isn't fast. It feels fine in interactive use, but we'll need to be careful not to insert this check into places that are called frequently.

I added a regression test in the form of the subprocess-with-closed-fds spec test which fails on master (leaks sleep infinity processes and doesn't detect it).

chriskuehl commented 3 years ago

There is 1 failing test which looks unrelated but I'm looking into it.

chriskuehl commented 3 years ago

I think the failing test is related after all: the process tracing seems to cause the supervise process to get killed when pgctl-poll-ready calls pgctl restart, or something like that. I need to think about this kind of scenario more. We probably don't want it to ever kill s6 processes.

Yelp / pgctl

Implement tracing processes via environment variable #217