This is a fun one. The way zygote handled signals (particularly those related to shutdown) was really, really janky. Some issues that have been fixed:
I'd (and I would guess evan'd) been mostly testing interactively. Pressing Ctrl-C sends SIGINT to every process in the terminal's foreground process group, which meant that children were receiving the shutdown signal directly instead of having it proxied through the master process, so the proxying code wasn't being tested at all. I changed the children to ignore SIGINT and SIGTERM and to listen only for SIGQUIT (I chose SIGQUIT over the usual SIGUSR1 just to be contrary).
The handling of children exiting was not consistent, particularly when trying to shut down zygote entirely. Now zygotes are told to exit through the domain socket, and each zygote handles its own children exiting with a nice, sane wait(2).
Killing workers required knowledge of their PIDs. I've made zygotes into process group leaders so that we can clean things up more easily with kill(-pid).
If a child ever ignored the termination signal, it would be left there forever and its parent process would silently spin in the ioloop waiting for SIGCHLDs to come in.
If a zygote ever died, then trying to quit the master would end up blocking on that zygote (although usually the master would instead die with an exception because the domain socket wasn't connected, which is no better!).
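For anyone reviewing without the diff open, the signal setup described above can be sketched roughly as follows. This is a minimal Python 3 illustration, not the code in this branch: the helper names (install_child_signal_handlers, spawn_group_leader, signal_group) are hypothetical, and only the os and signal calls themselves are real.

```python
import os
import signal
import time


def install_child_signal_handlers():
    """Ignore SIGINT and SIGTERM; exit only on SIGQUIT (hypothetical helper)."""
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    signal.signal(signal.SIGTERM, signal.SIG_IGN)
    signal.signal(signal.SIGQUIT, lambda signum, frame: os._exit(0))


def spawn_group_leader(target):
    """Fork a child that leads its own process group, then runs target()."""
    pid = os.fork()
    if pid == 0:
        os.setpgid(0, 0)  # the child's pgid becomes its own pid
        install_child_signal_handlers()
        target()
        os._exit(0)
    # Set the group from the parent too, so we do not race the child's call.
    try:
        os.setpgid(pid, pid)
    except OSError:
        pass  # the child got there first, or has already exited
    return pid


def signal_group(pid, signum=signal.SIGQUIT):
    """Signal every process in the group led by pid, equivalent to kill(-pid)."""
    os.killpg(pid, signum)
```

Because everything forked by the group leader inherits its process group, a single signal_group(pid) reaches the zygote and its workers without the caller tracking worker PIDs.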
Testing performed:
started and stopped a test application many times, verified that there are no leaked workers or zygotes
modified my application to block SIGQUIT, verified that it still gets KILLed as requested
ran a combination of HUP and QUIT to verify that neither zygotes nor workers are lost
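The "block SIGQUIT and check it still gets KILLed" test above amounts to checking an escalation path. A rough Python 3 sketch of that idea follows; the function name stop_with_escalation and the grace and poll parameters are assumptions for illustration, and the real code waits inside the ioloop rather than polling like this.

```python
import os
import signal
import time


def stop_with_escalation(pid, grace=1.0, poll=0.05):
    """Send SIGQUIT to the group led by pid, then SIGKILL if it outlives grace.

    Returns the number of the signal that ended the child, or None if the
    child exited on its own.
    """
    os.killpg(pid, signal.SIGQUIT)
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        done, status = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            return os.WTERMSIG(status) if os.WIFSIGNALED(status) else None
        time.sleep(poll)
    os.killpg(pid, signal.SIGKILL)  # SIGKILL cannot be blocked or ignored
    _, status = os.waitpid(pid, 0)
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else None
```

A child that has blocked SIGQUIT should make this return signal.SIGKILL once the grace period runs out, which is the behaviour the manual test was checking for.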
Testing not performed:
did not run unit test suite, because evan made it require python 2.7 and tornado 2.0, neither of which I have handy