benoitc / gunicorn

gunicorn 'Green Unicorn' is a WSGI HTTP Server for UNIX, fast clients and sleepy applications.
http://www.gunicorn.org

Worker timeout after shutdown complete #3194

Open · InterferencePattern opened this issue 6 months ago

InterferencePattern commented 6 months ago

I have gunicorn managing a Flask app, run with the following Docker command:

CMD gunicorn flask_app:app -b 0.0.0.0:8080 --max-requests 1000 --max-requests-jitter 50 -w 2 -k uvicorn.workers.UvicornWorker --timeout 100 --graceful-timeout 100
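For readability, the same options can also be written as a gunicorn config file instead of CLI flags. This is only a restatement of the command above using gunicorn's documented setting names; it does not change the behaviour:

# gunicorn.conf.py -- equivalent to the CLI flags above
bind = "0.0.0.0:8080"
workers = 2
worker_class = "uvicorn.workers.UvicornWorker"
max_requests = 1000
max_requests_jitter = 50
timeout = 100
graceful_timeout = 100

# Dockerfile: CMD gunicorn -c gunicorn.conf.py flask_app:app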

Looking at stdout, I see an issue when a worker is restarted due to max requests: the worker times out even after it appears to have shut down successfully. Does anyone have an idea what could be causing the timeout? I see that Kafka is timing out, but if that were the issue I wouldn't expect to see "Application shutdown complete." or "Finished server process".

[2024-04-25 14:28:44 +0000] [29227] [INFO] Shutting down
[2024-04-25 14:28:44 +0000] [29227] [INFO] Waiting for application shutdown.
{"asctime": "2024-04-25 14:28:46,080", "name": "...subprocesses.update_workload", "levelname": "INFO", "levelno": 20, "pathname": "/app/.../update_workload.py", "lineno": 23, "funcName": "update_workload", "threadName": "MainThread", "processName": "sub-process: update-workload-worker for PID: 29227", "message": "Shutdown triggered, exited update_workload gracefully."}
%5|1714055329.954|REQTMOUT|rdkafka#consumer-3| [thrd:GroupCoordinator]: GroupCoordinator/3: Timed out LeaveGroupRequest in flight (after 5004ms, timeout #0): possibly held back by preceeding blocking JoinGroupRequest with timeout in 99698ms
%4|1714055329.954|REQTMOUT|rdkafka#consumer-3| [thrd:GroupCoordinator]: GroupCoordinator/3: Timed out 1 in-flight, 0 retry-queued, 0 out-queue, 0 partially-sent requests
%3|1714055329.955|FAIL|rdkafka#consumer-3| [thrd:GroupCoordinator]: GroupCoordinator: 1 request(s) timed out: disconnect (after 2219257ms in state UP)
[2024-04-25 14:28:50 +0000] [29227] [INFO] Worker exiting (pid: 29227)
[2024-04-25 14:28:50 +0000] [29227] [INFO] Application shutdown complete.
[2024-04-25 14:28:50 +0000] [29227] [INFO] Finished server process [29227]
[2024-04-25 14:29:50 +0000] [69] [CRITICAL] WORKER TIMEOUT (pid:29227)
[2024-04-25 14:29:50 +0000] [69] [ERROR] Worker (pid:29227) was sent code 134!
[2024-04-25 14:29:50 +0000] [31401] [INFO] Booting worker with pid: 31401
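For context on where the last three lines come from (this is a simplified sketch, not gunicorn's actual code): "Application shutdown complete." and "Finished server process" are logged by the worker itself, while "WORKER TIMEOUT" is logged by the master, which only looks at a per-worker heartbeat file that the worker refreshes via notify(). If the worker stops refreshing that file during shutdown, the master aborts it regardless of what the worker already logged; exit code 134 is 128 + 6, i.e. SIGABRT.

# Simplified sketch of gunicorn's worker heartbeat (illustration only).
import os
import signal
import time

TIMEOUT = 100  # mirrors --timeout


def notify(heartbeat_path):
    # Worker side: refresh the heartbeat file's timestamp.
    os.utime(heartbeat_path)


def murder_stale_workers(workers):
    # Master side: `workers` maps pid -> heartbeat file path.
    now = time.time()
    for pid, heartbeat_path in workers.items():
        if now - os.stat(heartbeat_path).st_mtime > TIMEOUT:
            # gunicorn logs "WORKER TIMEOUT" at this point and sends SIGABRT,
            # which is what shows up above as "was sent code 134" (128 + 6).
            os.kill(pid, signal.SIGABRT)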
InterferencePattern commented 4 months ago

It seems that UvicornWorker doesn't call notify() on every request, so gunicorn interprets the worker as unresponsive.
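If that is the cause, one possible (untested) workaround sketch is to subclass UvicornWorker and keep the master's heartbeat alive from a plain thread, so notify() keeps being called even while the event loop is busy or shutting down. The class name and interval below are invented for illustration; notify() and init_process() are inherited from gunicorn's base Worker. The obvious trade-off is that a genuinely hung worker would then never be detected by --timeout, so this is a diagnostic aid rather than a fix.

# heartbeat_worker.py -- hypothetical, untested sketch
import threading
import time

from uvicorn.workers import UvicornWorker


class HeartbeatUvicornWorker(UvicornWorker):
    # Interval is arbitrary; it just needs to be well below --timeout.
    HEARTBEAT_INTERVAL = 5

    def init_process(self):
        def beat():
            while True:
                # notify() touches the heartbeat file the master checks.
                self.notify()
                time.sleep(self.HEARTBEAT_INTERVAL)

        threading.Thread(target=beat, daemon=True, name="gunicorn-heartbeat").start()
        super().init_process()

# Usage: gunicorn flask_app:app -k heartbeat_worker.HeartbeatUvicornWorker ...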

InterferencePattern commented 2 weeks ago

I can confirm that I'm still seeing this with my Quart app.

The @app.after_serving function runs successfully, the worker produces no more logs, and yet I end up with SIGABRT messages very frequently (though not every time).
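For readers unfamiliar with Quart's lifecycle hooks, here is a minimal sketch of the setup being described (the handler body is hypothetical): @app.after_serving registers a coroutine that runs once the server stops accepting requests, i.e. inside the same graceful-shutdown window in which the heartbeat can go stale.

# minimal illustrative Quart app; handler contents are hypothetical
from quart import Quart

app = Quart(__name__)


@app.after_serving
async def cleanup():
    # Runs during graceful shutdown, after the last request is served.
    # This is the hook reported above as completing before the SIGABRT arrives.
    ...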