buildkite / agent

The Buildkite Agent is an open-source toolkit written in Go for securely running build jobs on any device or network
https://buildkite.com/
MIT License

Change default shell signal from SIGTERM to SIGINT #1392

Open chloeruka opened 3 years ago

chloeruka commented 3 years ago

For historical reasons, when we send interrupts to process groups we default to using SIGTERM:

https://github.com/buildkite/agent/blob/b9bc5ecc8e67fd6734d8930fb5ebb848902d99f2/process/signal.go#L35-L38

When we send SIGTERM to a process group, bash exits without waiting for its child processes to exit safely. In many cases the child processes end because of side effects instead, such as SIGPIPE raised when they write to a pipe whose reader has already exited. This unreliability can lead to lost output or incorrect shutdown of subprocesses. A script author could potentially work around this using a signal handler. A better approach would be to send SIGINT instead, followed by SIGKILL if the group takes too long to respond. SIGINT only gets propagated to foreground child processes, so special-case handling will still be required for users who background processes.
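To make the proposed sequence concrete, here is a minimal Go sketch of "SIGINT to the process group, then SIGKILL after a grace period". It is not the agent's implementation: `cancelGroup` and `runAndCancel` are hypothetical names, the grace periods are arbitrary, `sh` stands in for whatever shell the job actually uses, and the code is Unix-only because it relies on `syscall.Kill` with a negative PID.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// cancelGroup (hypothetical helper) sends sig to the whole process group led
// by cmd, waits up to grace for the group leader to exit, and falls back to
// SIGKILL otherwise. It reports whether the SIGKILL fallback was needed.
func cancelGroup(cmd *exec.Cmd, done <-chan error, sig syscall.Signal, grace time.Duration) (bool, error) {
	// With Setpgid set and no explicit Pgid, the child's PID is also its
	// process group ID. A negative PID targets the whole group.
	pgid := cmd.Process.Pid
	if err := syscall.Kill(-pgid, sig); err != nil {
		return false, err
	}
	select {
	case <-done:
		// The leader exited within the grace period.
		return false, nil
	case <-time.After(grace):
		// The group took too long to respond, so force it down.
		_ = syscall.Kill(-pgid, syscall.SIGKILL)
		<-done
		return true, nil
	}
}

// runAndCancel (hypothetical helper) starts script under sh in its own
// process group, lets it run for runFor, then cancels it with SIGINT.
func runAndCancel(script string, runFor, grace time.Duration) (bool, error) {
	cmd := exec.Command("sh", "-c", script)
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	if err := cmd.Start(); err != nil {
		return false, err
	}
	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()
	time.Sleep(runFor)
	return cancelGroup(cmd, done, syscall.SIGINT, grace)
}

func main() {
	// The second script ignores SIGINT, so it should need the fallback.
	for _, script := range []string{"sleep 30", "trap '' INT; sleep 30"} {
		escalated, err := runAndCancel(script, 300*time.Millisecond, time.Second)
		fmt.Printf("%q: escalated to SIGKILL=%v err=%v\n", script, escalated, err)
	}
}
```

The sketch only watches the group leader. Backgrounded children that outlive the shell are not tracked here, which is the special case mentioned above.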

Most processes treat SIGINT the same as SIGTERM, but we can't be sure that nobody is relying on the current SIGTERM behaviour. Therefore, this should be considered a breaking change. A user can experiment with this change ahead of time by using the configurable cancel signals introduced by #1041 and #1390.

I've tried to carefully research this and get the above details correct, but please feel free to contribute any corrections or details I might have missed.

pda commented 3 years ago

Interesting — have you come across any good links explaining the differences between SIGINT and SIGTERM?

My understanding over the years has stopped at “SIGINT is usually ctrl-c, SIGTERM is usually kill or program-initiated termination, SIGKILL is the kernel just no longer scheduling the process”. I never knew of any difference between SIGINT and SIGTERM in how they are propagated within process groups, between the group leader and other processes and/or foreground processes. My understanding of process groups overall is pretty sketchy.

Got any links to things that explain SIGTERM vs SIGINT behaviour?
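One part of the distinction that is easy to check locally: a process can install handlers for both SIGINT and SIGTERM, while SIGKILL can never be caught. The Go sketch below sends each catchable signal to its own process and reports what the handler received. `roundTrip` and `catchable` are hypothetical names for this illustration, and the code is Unix-only. It says nothing about process-group propagation, only about what a single process can observe.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// catchable lists the two signals discussed here that a handler can receive.
// SIGKILL is deliberately absent: the os/signal docs note that it cannot be
// caught, so registering for it would have no effect.
var catchable = []syscall.Signal{syscall.SIGINT, syscall.SIGTERM}

// roundTrip (hypothetical helper) registers a handler for sig, sends sig to
// the current process, and returns the name of the signal that was received.
func roundTrip(sig syscall.Signal) string {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, sig)
	defer signal.Stop(ch)

	if err := syscall.Kill(syscall.Getpid(), sig); err != nil {
		return "error: " + err.Error()
	}
	select {
	case got := <-ch:
		return got.String()
	case <-time.After(2 * time.Second):
		return "timeout"
	}
}

func main() {
	for _, sig := range catchable {
		fmt.Printf("sent %d, handler saw %q\n", int(sig), roundTrip(sig))
	}
}
```

From the receiving process's point of view the two signals look alike. Any difference in behaviour comes from who sends them and how each program chooses to handle them.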

chloeruka commented 3 years ago

> My understanding of process groups overall is pretty sketchy.

Yeah, same! Learning this as I go. 😅

> Interesting — have you come across any good links explaining the differences between SIGINT and SIGTERM?

This is cobbled together across various bits and pieces. I'll pop a few of the references I used down: