Supervisor / supervisor

Supervisor process control system for Unix (supervisord)
http://supervisord.org

gunicorn worker sometimes gets stuck when managed by supervisor #1629

Open WilliamChen-luckbob opened 8 months ago

WilliamChen-luckbob commented 8 months ago

I have a Flask app that I want to deploy with gunicorn and manage with supervisor.

When I restart the program with supervisorctl, some workers sometimes get stuck and stop responding, and supervisor cannot shut them down on a later stop or restart.

As shown below, there should be 1 master and 5 workers. I kept restarting the app and checking the logs and the number of processes. Occasionally a worker gets stuck and never finishes starting, yet the process exists and its parent PID is the master. When the master shuts down, it does not terminate this stuck worker; the worker's parent PID becomes 1 and it stays stuck there indefinitely.
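To confirm orphaned workers like these without relying on screenshots, a minimal Linux-only sketch such as the following can list processes whose command line mentions gunicorn and whose parent is PID 1. It assumes /proc is available and that orphans are reparented to PID 1 (inside a container or under a subreaper the new parent may differ). The function names are hypothetical, and a gunicorn master started manually with nohup would also be reported.

```python
import os


def read_process_table(proc_root="/proc"):
    """Return (pid, ppid, cmdline) for every readable process (Linux only)."""
    table = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process exited meanwhile or is not readable
        # The comm field is wrapped in parentheses and may contain spaces,
        # so split after the last ')': the fields are then state, ppid, ...
        fields = stat.rpartition(")")[2].split()
        ppid = int(fields[1])
        # cmdline arguments are NUL-separated.
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        table.append((int(entry), ppid, cmdline))
    return table


def find_orphaned_workers(table, needle="gunicorn"):
    """Sorted PIDs whose command line contains `needle` and whose parent is PID 1."""
    return sorted(pid for pid, ppid, cmd in table if ppid == 1 and needle in cmd)


if __name__ == "__main__":
    for pid in find_orphaned_workers(read_process_table()):
        print(pid)
```

A healthy gunicorn worker has the master as its parent, so any PID printed here is a candidate for the stuck processes described above.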

In the screenshots below, during one startup process 25896 started with parent process 25879 but wrote no initialization log. When supervisor restarted the program, a new set of processes started normally, and the parent of process 25896 became PID 1.

This issue is not reproducible consistently.

Once a stuck worker appears, it disrupts the master. Unless I kill -9 the stuck PIDs, there are effectively 6 or more workers (5 healthy and at least 1 stuck). The master's load balancing then sends some requests to a stuck process, which never responds, so those requests time out.
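The manual kill -9 workaround can be scripted a little more gently: send SIGTERM first and fall back to SIGKILL only if the process is still alive after a grace period. This is a hypothetical helper, not part of supervisor or gunicorn; it assumes Linux /proc and treats a zombie process as already dead.

```python
import os
import signal
import time


def is_alive(pid):
    """True if `pid` exists and is not a zombie (Linux /proc check)."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            # The state letter is the first field after the "(comm)" field.
            state = f.read().rpartition(")")[2].split()[0]
    except (FileNotFoundError, ProcessLookupError):
        return False
    return state != "Z"


def stop_process(pid, grace=5.0, poll=0.1):
    """SIGTERM, wait up to `grace` seconds, then SIGKILL.

    Returns "gone" (already dead), "terminated" (exited after SIGTERM)
    or "killed" (SIGKILL was sent).
    """
    if not is_alive(pid):
        return "gone"
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return "gone"
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        if not is_alive(pid):
            return "terminated"
        time.sleep(poll)
    try:
        os.kill(pid, signal.SIGKILL)  # the equivalent of `kill -9`
    except ProcessLookupError:
        return "terminated"
    return "killed"
```

Combined with a list of orphaned PIDs, this removes the extra workers without a blind kill -9, though it does not address why the worker got stuck in the first place.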

This issue never occurs when I run gunicorn myapp.wsgi:app -c myapp_gunicorn_conf.py manually. I have tried extensively to rule out my own code, and found that when gunicorn is started directly from the command line (with or without nohup for running in the background), the program always starts and stops correctly.

[Screenshots: process list and logs]

Versions: supervisor 4.2.5, Python 3.11.7, gunicorn==20.1.0, gevent==22.10.2

I don't understand why this happens. Can anyone tell me how to get more detailed logs to see what happened? The stuck PID writes nothing to the log; it just hangs and does nothing.
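One possible way to see where a worker hangs is to combine gunicorn's server hooks with Python's faulthandler module. The sketch below is a guess at what myapp_gunicorn_conf.py could contain: only workers = 5 comes from this report, the gevent worker class is inferred from the installed gevent package, and WORKER_INIT_TIMEOUT is a hypothetical name. post_fork arms a timer that dumps every thread's Python stack to stderr (which supervisor captures in the program's log), and post_worker_init cancels it once the app has loaded. If the worker hangs before post_fork runs, this hook will not fire.

```python
# myapp_gunicorn_conf.py (hypothetical contents; merge into the real file)
import faulthandler
import sys

workers = 5              # 1 master + 5 workers, as in this report
worker_class = "gevent"  # assumption: inferred from the installed gevent
daemon = False           # supervisor needs gunicorn to stay in the foreground
loglevel = "debug"       # more detail on worker boot and exit
errorlog = "-"           # gunicorn's log goes to stderr, captured by supervisor

WORKER_INIT_TIMEOUT = 30  # hypothetical: seconds a worker may take to boot


def post_fork(server, worker):
    # Runs in the new worker right after fork(). If the worker has not
    # finished initializing within WORKER_INIT_TIMEOUT seconds, dump the
    # stack of every thread to stderr so the stuck spot becomes visible.
    server.log.info("post_fork: worker %s", worker.pid)
    faulthandler.dump_traceback_later(WORKER_INIT_TIMEOUT, exit=False, file=sys.stderr)


def post_worker_init(worker):
    # The worker loaded the app successfully; cancel the pending dump.
    faulthandler.cancel_dump_traceback_later()
    worker.log.info("post_worker_init: worker %s ready", worker.pid)
```

A stuck worker would then leave a "post_fork" line without a matching "post_worker_init" line, followed by a stack dump after the timeout.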