Open huwcbjones opened 2 years ago
Related SO thread: https://stackoverflow.com/questions/63945102/gunicorn-with-systemd-watchdog
How often would be too often to send WATCHDOG=1? Is every second too much? How expensive is it, just a socket write? Or does the value of WATCHDOG_USEC need to be considered (other than it being non-zero)?
Thanks for the SO link, I didn't manage to find that and it clears up a few preconceptions 👍🏻
We tend to notify every WATCHDOG_USEC / 4. We also have an app that is loop-based and notifies on completion of every loop.
At a glance, there would be a fair bit more work in taking e.g. WATCHDOG_USEC / 4 or WATCHDOG_USEC / 2 and only sending WATCHDOG=1 at that specific rate. Arbiter has a natural enough loop for this, but I think it is hard-coded to 1 second IIRC.
So in terms of measuring the effort and effectiveness of implementing watchdog notifications, it would be good to gather feedback on whether:
- WatchdogSec is OK, i.e. a fixed notification rate of one per second from the arbiter poll loop is going to be OK.
- The WatchdogSec minimum should be 2 or 4 seconds, with the actual value ignored by the Gunicorn watchdog logic other than to activate sending roughly once per second.
Implementing WATCHDOG_USEC / n would require some stopwatch logic, which might not be worth the trouble. But I'm not confident about the cost or risk of an ongoing socket open/write/close to the systemd watchdog fd.
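For a sense of how much "stopwatch logic" WATCHDOG_USEC / n would actually need, here is a minimal sketch. It is not Gunicorn code: watchdog_interval and WatchdogTimer are hypothetical names, and the idea of calling due() once per pass of the arbiter's poll loop is an assumption about where it could hook in. The WATCHDOG_USEC and WATCHDOG_PID environment variables are the ones systemd sets for a service with a watchdog configured.

```python
import os
import time


def watchdog_interval(divisor=2, environ=os.environ):
    """Return the keepalive interval in seconds, or None if no watchdog is active.

    Hypothetical helper: reads WATCHDOG_USEC (microseconds) and divides it by
    `divisor`, so divisor=2 pings at half the timeout, divisor=4 at a quarter.
    """
    raw = environ.get("WATCHDOG_USEC")
    if not raw:
        return None
    try:
        usec = int(raw)
    except ValueError:
        return None
    if usec <= 0:
        return None
    # If WATCHDOG_PID is set, only that process is expected to send keepalives.
    pid = environ.get("WATCHDOG_PID")
    if pid and pid != str(os.getpid()):
        return None
    return usec / 1_000_000 / divisor


class WatchdogTimer:
    """Hypothetical stopwatch: reports when the next keepalive is due."""

    def __init__(self, interval, clock=time.monotonic):
        self.interval = interval  # seconds, or None to disable
        self.clock = clock        # monotonic, so wall-clock jumps don't matter
        self.last = clock()

    def due(self):
        """Return True (and restart the stopwatch) once `interval` has elapsed."""
        if self.interval is None:
            return False
        now = self.clock()
        if now - self.last >= self.interval:
            self.last = now
            return True
        return False
```

A loop that wakes roughly once per second could call timer.due() on each pass and send WATCHDOG=1 only when it returns True. With a loop that coarse, the effective ping rate is rounded up to the next wake-up, which is one argument for keeping the divisor at 2 or more.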
There's a pure Python implementation of sd_notify here, fwiw: https://github.com/bb4242/sdnotify/blob/master/sdnotify/__init__.py
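To make the cost question concrete, this is roughly what a pure-Python notification boils down to. It is a minimal sketch rather than the linked library's exact code, and it assumes Linux. The function name sd_notify is chosen here to mirror the C API. NOTIFY_SOCKET is the environment variable systemd sets for services that are allowed to send notifications.

```python
import os
import socket


def sd_notify(state):
    """Send one notification datagram to systemd.

    Returns False when NOTIFY_SOCKET is unset (not running under systemd,
    or notifications are not enabled for this unit), True after sending.
    """
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    # A leading "@" denotes a Linux abstract-namespace socket, which is
    # addressed with a leading NUL byte instead.
    if addr.startswith("@"):
        addr = "\0" + addr[1:]
    # One short-lived datagram socket per call: open, a single sendto, close.
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode("utf-8"), addr)
    return True
```

Calling sd_notify("WATCHDOG=1") therefore costs one socket open, one small datagram write, and one close. A long-lived process could also keep the socket open and reuse it between pings.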
By default, systemd's WatchdogSec "Defaults to 0, which disables this feature." When a service is configured with a watchdog, the service should notify systemd that it is still alive by sending WATCHDOG=1 (a keepalive ping) to ensure systemd doesn't kill the process.
We'd like to enable this to ensure that systemd detects if gunicorn gets stuck. However, at the moment I've got systemd aborting the main PID every time the watchdog timer fires, which took me a bit too long to realise! 😅
By default, I'd probably shove the keepalive in the main process, however I haven't looked too much into how gunicorn hangs together yet. If you guys don't want to support the watchdog timer, it would be good to document that gunicorn is not compatible with it.