Closed NielsH closed 4 months ago
This is because Systemd does a SIGKILL of the supervisor process after 90 seconds.
Sending SIGKILL to supervisord is undesirable because supervisord will not be able to finish writing the log files, will not be able to clean up any temporary files it has created, and any child processes supervisord spawned will be orphaned.
Because of this, we are seeing that Supervisor-managed processes that do not respond to a SIGTERM and that have a stopwaitsecs exceeding the systemd value TimeoutStopSec (by default 90 seconds) will remain lingering indefinitely. For example, the Laravel-recommended configuration for their workers has a stopwaitsecs value of 3600.
When supervisord receives a request to exit (by signal or supervisorctl command), it will send the stopsignal to each of its child processes. It will wait for all of its child processes to exit, then it will exit. During this time, supervisord will log messages like "waiting for <processname> to stop". If stopwaitsecs is set to 3600 seconds for a process (one hour), then supervisord will wait up to one hour for the process to exit on its own. After stopwaitsecs has elapsed, supervisord will terminate the process with SIGKILL and then it will finally exit.
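The per-process stop sequence described above can be sketched in a few lines of Python. This is an illustration only, not supervisord's actual implementation; the function name stop_child and its parameters are hypothetical, chosen to mirror the stopsignal and stopwaitsecs options. It assumes a POSIX system.

```python
import signal
import subprocess
import sys


def stop_child(proc, stopsignal=signal.SIGTERM, stopwaitsecs=10.0):
    """Ask a child to stop, then fall back to SIGKILL (illustrative only).

    proc is a subprocess.Popen object. Returns "exited" if the child
    stopped within stopwaitsecs, or "killed" if SIGKILL was needed.
    """
    # Step 1: deliver the configured stop signal.
    proc.send_signal(stopsignal)
    try:
        # Step 2: give the child up to stopwaitsecs to exit on its own.
        proc.wait(timeout=stopwaitsecs)
        return "exited"
    except subprocess.TimeoutExpired:
        # Step 3: the child did not exit in time, so force it with SIGKILL.
        proc.kill()
        proc.wait()
        return "killed"
```

If the supervising process is itself killed during step 2, step 3 never runs, which is how a child that ignores SIGTERM can be left behind.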
Consider configuring the system such that supervisord has the opportunity to exit cleanly: increase TimeoutStopSec to be larger than the largest stopwaitsecs or decrease all stopwaitsecs to be less than TimeoutStopSec.
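One way to apply the first option is to read the largest stopwaitsecs from the Supervisor configuration and derive a TimeoutStopSec from it. The following is a minimal sketch, assuming that only [program:x] sections matter, that files pulled in through [include] are checked separately, and that a 30-second safety margin is acceptable. The function names and the margin are hypothetical; the fallback of 10 seconds is Supervisor's documented default for stopwaitsecs.

```python
import configparser

DEFAULT_STOPWAITSECS = 10  # Supervisor's documented default for stopwaitsecs


def largest_stopwaitsecs(config_text):
    """Return the largest stopwaitsecs among [program:x] sections.

    Returns 0 if the text defines no programs. [include] files are
    not followed in this sketch.
    """
    parser = configparser.ConfigParser(
        interpolation=None,              # leave %(...)s expansions untouched
        inline_comment_prefixes=(";",),  # Supervisor-style inline comments
        strict=False,
    )
    parser.read_string(config_text)
    waits = [
        parser.getint(section, "stopwaitsecs", fallback=DEFAULT_STOPWAITSECS)
        for section in parser.sections()
        if section.startswith("program:")
    ]
    return max(waits, default=0)


def suggested_timeout_stop_sec(config_text, margin=30):
    """Largest stopwaitsecs plus a margin (the margin is an assumption)."""
    return largest_stopwaitsecs(config_text) + margin


def render_dropin(timeout_stop_sec):
    """Render the text of a systemd drop-in that raises TimeoutStopSec."""
    return "[Service]\nTimeoutStopSec={}\n".format(timeout_stop_sec)
```

The rendered text could go into a drop-in for the unit, for example through systemctl edit. The unit name depends on how Supervisor was packaged, so check it before applying anything.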
We are considering changing the Systemd KillMode to the default of control-group, which does resolve the issue. However, because the default is process, we would like to ask the Supervisor developers for the rationale of having this value on process, despite it not being recommended by Systemd.
The Supervisor project only publishes Python packages to PyPI and these do not contain integrations with any operating system (init scripts, unit files, etc). The integration with Systemd you are using was created by others who are not part of the Supervisor project itself.
Hello,
The default KillMode of Supervisor within its Debian package is KillMode=process. The man page of Systemd says:
Because of this, we are seeing that Supervisor-managed processes that do not respond to a SIGTERM and that have a stopwaitsecs exceeding the systemd value TimeoutStopSec (by default 90 seconds) will remain lingering indefinitely. For example, the Laravel-recommended configuration for their workers has a stopwaitsecs value of 3600.
This is because Systemd does a SIGKILL of the supervisor process after 90 seconds. Due to KillMode=process, its forks remain running, and because Supervisor is no longer running, it cannot kill those processes after the stopwaitsecs timeout is exceeded. This is also logged by systemd:
We are considering changing the Systemd KillMode to the default of control-group, which does resolve the issue. However, because the default is process, we would like to ask the Supervisor developers for the rationale of having this value on process, despite it not being recommended by Systemd. Perhaps there is a reason we did not think of, in which case we'd like to know before changing it on our side.
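The difference between the two KillMode values discussed here can be expressed as a toy model. It is a deliberate simplification of the behaviour described in this thread, not of systemd's implementation; the function name and the PIDs are made up for illustration.

```python
def processes_left_running(kill_mode, main_pid, cgroup_pids):
    """Return the PIDs that survive the final SIGKILL in this toy model.

    kill_mode   -- "process" or "control-group"
    main_pid    -- PID of the service's main process (supervisord here)
    cgroup_pids -- all PIDs in the unit's control group, main_pid included
    """
    if kill_mode == "process":
        # Only the main process is killed; its children keep running.
        killed = {main_pid}
    elif kill_mode == "control-group":
        # Every process in the unit's control group is killed.
        killed = set(cgroup_pids)
    else:
        raise ValueError("unsupported kill_mode in this sketch: %r" % kill_mode)
    return sorted(set(cgroup_pids) - killed)
```

With supervisord as PID 100 and two workers as PIDs 101 and 102, the model leaves both workers running under process and none under control-group.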
I am aware that this post may be better suited to a Debian-specific mailing list. However, since that list did not seem very active, I chose to post my question here, hoping that someone is able to give some insight.
Thank you!