Closed: bigunyak closed this issue 4 years ago.
--timeout=5
This is the most common cause of this issue.
I hope my solution helps. I ran into this CRITICAL WORKER TIMEOUT problem a few days ago and tried several fixes; it now works well.
Here is my understanding and my solutions:
The workers fail to boot because they need more time to load packages, such as the TensorFlow backend, before the service can start. So if your app boots slowly, try enabling Gunicorn's preload option (see https://devcenter.heroku.com/articles/python-gunicorn#advanced-configuration).
gunicorn hello:app --preload
The default timeout is 30s. If your application really needs a long time to finish an API call, increase the timeout:
gunicorn hello:app --timeout 60
However, in my view it doesn't make sense for an API call to need more than a minute to finish. If yours does, try to optimize your code.
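If you prefer a config file over command-line flags, the two options above (--preload and --timeout) map to Gunicorn settings. A minimal sketch, assuming a file named gunicorn.conf.py that you start with gunicorn -c gunicorn.conf.py hello:app; the 60-second value is only an example, not a recommendation:

```python
# gunicorn.conf.py: a minimal sketch; the values are examples only.
# Gunicorn reads this file as plain Python and picks up known setting names.

preload_app = True  # same as --preload: import the app once in the master, before forking workers
timeout = 60        # same as --timeout 60: seconds a worker may stay silent before it is killed
```

Preloading helps with slow imports because the heavy loading happens before the workers exist, so it no longer counts against each worker's boot.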
I faced the same issue today. In my case the API was taking about a minute to calculate data and return it to the client, which resulted in CRITICAL WORKER TIMEOUT errors. I solved it by increasing Gunicorn's --timeout flag to more than a minute; it worked, and I have not seen the issue come back. Hope this helps. I am using uvicorn.workers.UvicornWorker.
I fixed this by adding extra workers to gunicorn:
web: gunicorn --workers=3 BlocAPI:app --log-file -
No idea why.
Maybe you had a deadlock? Does your app make requests to itself?
Yep, one route calls another. Is that bad?
It means that you need at least two workers; otherwise your server will deadlock. The first request will wait until the server responds to the second request (which would be queued).
You get one concurrent request per worker.
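To see why a route that calls its own server needs at least two workers, here is a toy model in plain Python. It is not Gunicorn code: it assumes each sync worker can be modeled as one thread in a fixed-size pool that handles one request at a time, and the route names are hypothetical. In real Gunicorn the outer request would not give up politely; the worker would simply block until the master kills it after --timeout seconds, which is what shows up as CRITICAL WORKER TIMEOUT.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def simulate(num_workers: int, wait_seconds: float = 1.0) -> str:
    """Model N sync workers as a pool of N threads, one request per thread."""
    pool = ThreadPoolExecutor(max_workers=num_workers)

    def inner_route() -> str:
        # Hypothetical route that the outer route calls over HTTP.
        return "inner ok"

    def outer_route() -> str:
        # "Request" the inner route through the same pool, i.e. the same server.
        future = pool.submit(inner_route)
        try:
            return "outer got: " + future.result(timeout=wait_seconds)
        except FutureTimeout:
            # With one worker, inner_route sits in the queue behind this request.
            return "deadlock: gave up waiting for the inner request"

    try:
        return pool.submit(outer_route).result()
    finally:
        pool.shutdown(wait=True)


print(simulate(1, wait_seconds=0.5))  # deadlock: gave up waiting for the inner request
print(simulate(2))                    # outer got: inner ok
```

With a single worker the inner request can only start after the outer one finishes, and the outer one is waiting for it; a second worker breaks the cycle.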
Ah that makes sense. Thanks!
I was able to resolve this issue by matching the number of workers and the number of threads.
I had set
workers = (2 * cpu_count) + 1
and did not set threads. Once I changed it to
threads = workers
everything started working fine. Just in case it helps someone, this is how it looks now:
import multiprocessing
from gunicorn.app.wsgiapp import WSGIApplication
# `app` is the Flask application defined elsewhere in this module.
def run(host='0.0.0.0', port=8080, workers=1 + (multiprocessing.cpu_count() * 2)):
    """Run the app with Gunicorn."""
    if app.debug:
        app.run(host, int(port), use_reloader=False)
    else:
        gunicorn = WSGIApplication()
        gunicorn.load_wsgiapp = lambda: app
        gunicorn.cfg.set('bind', '%s:%s' % (host, port))
        gunicorn.cfg.set('workers', workers)
        gunicorn.cfg.set('threads', workers)  # threads > 1 turns the sync worker into gthread
        gunicorn.cfg.set('pidfile', None)
        gunicorn.cfg.set('worker_class', 'sync')
        gunicorn.cfg.set('keepalive', 10)
        gunicorn.cfg.set('accesslog', '-')
        gunicorn.cfg.set('errorlog', '-')
        gunicorn.cfg.set('reload', True)
        gunicorn.chdir()
        gunicorn.run()
According to the Gunicorn docs, the worker class changes from sync to gthread if more than one thread is configured: "If you try to use the sync worker type and set the threads setting to more than 1, the gthread worker type will be used instead."
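The rule quoted above can be written down as a tiny helper. This is a sketch that mirrors the documented behavior, not Gunicorn's source, and the function names are hypothetical. It also shows one plausible reason the change helped: with threads = workers, roughly workers × threads requests can be in progress at once instead of just workers.

```python
import multiprocessing


def effective_worker_class(worker_class: str, threads: int) -> str:
    """Documented rule: the sync worker with threads > 1 runs as gthread."""
    if worker_class == "sync" and threads > 1:
        return "gthread"
    return worker_class


def max_concurrent_requests(workers: int, threads: int) -> int:
    """Rough capacity: each sync/gthread thread handles one request at a time."""
    return workers * threads


workers = 2 * multiprocessing.cpu_count() + 1
threads = workers  # the setting from the comment above

print(effective_worker_class("sync", 1))        # sync
print(effective_worker_class("sync", threads))  # gthread (workers is always at least 3)
print(max_concurrent_requests(workers, threads))
```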
My case:
Environment: Ubuntu 18.04 + Gunicorn + nginx + Flask
Ran pip install gunicorn[gevent] in my virtual environment.
Changed gunicorn -b localhost:8000 -w 4 web:app
to gunicorn -b localhost:8000 -k gevent web:app
It works.
Thank you to everyone here who has done so much to help one another resolve their issues. Please continue to post to this issue if it seems appropriate.
However, I am closing this issue because I don't think there is any bug in Gunicorn here and I don't think there is any action to take, although I will happily help review PRs that try to add documentation for this somehow or improve log messages.
Please do not misunderstand my intention. If you suspect a bug in Gunicorn and want to continue discussing, please do so. Preferably, open a new ticket with an example application that reproduces your issue. However, at this point, there are too many different problems, resolutions, and conversations in this issue for it to be very legible.
If you run Gunicorn without a buffering reverse proxy in front of it, you will get timeouts with the default sync worker for any number of reasons. Common ones are:
You can switch to asynchronous or threaded worker types, or you can put Gunicorn behind a buffering reverse proxy. If you know that your timeouts are due to your own code making slow calls to external APIs or doing significant work that you expect, you may increase the --timeout option.
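The following is a self-contained toy, not Gunicorn itself: a plain-socket server that, like a single sync worker, handles one connection at a time. It assumes a client that opens a TCP connection and sends nothing (as a browser's speculative connection might); a normal client arriving afterwards has to wait until the worker gives up on the idle one. A buffering reverse proxy such as nginx, or an async/threaded worker, keeps an idle connection from tying up the worker this way.

```python
import socket
import threading
import time


def sync_worker(server: socket.socket, n_conns: int, read_timeout: float) -> None:
    """Serve connections strictly one at a time, like a single sync worker."""
    for _ in range(n_conns):
        conn, _ = server.accept()
        with conn:
            conn.settimeout(read_timeout)
            try:
                data = conn.recv(1024)  # blocks while the client sends nothing
                conn.sendall(b"OK " + data)
            except socket.timeout:
                pass  # the idle client is finally dropped


def measure_wait(read_timeout: float = 1.0) -> float:
    """Return how long a normal client waits behind one idle connection."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen()
    port = server.getsockname()[1]
    worker = threading.Thread(target=sync_worker, args=(server, 2, read_timeout))
    worker.start()

    # A connection that is opened but never used occupies the only worker.
    idle = socket.create_connection(("127.0.0.1", port))
    time.sleep(0.1)  # let the worker accept the idle connection first

    start = time.monotonic()
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"ping")
        reply = client.recv(1024)
    waited = time.monotonic() - start

    idle.close()
    worker.join()
    server.close()
    assert reply == b"OK ping"
    return waited


print(f"normal client waited {measure_wait():.1f}s")
```

The normal client's request is tiny, yet it waits almost the whole read timeout; with Gunicorn's sync worker the equivalent stall is what can end in a worker timeout.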
Is this the case when calling the 'redirect' function as the return value for a route?
No. A Flask redirect responds with an HTTP redirect, and the worker is then free to accept a new request. The client makes another request when it sees this response, and whichever worker is ready will receive it.
Is the fix of adding extra workers related to @anilpai's comment earlier, where he set workers=1 + (multiprocessing.cpu_count() * 2)?
I had a similar issue. It turns out I had an error in the entrypoint to my application. From debugging, it seemed that I was essentially launching a Flask app from Gunicorn, whose workers then entered an infinite connection loop that timed out every 30s.
I'm sure this doesn't affect all the users above, but it may well affect some.
In my module/wsgi.py file, which I'm running with gunicorn module.wsgi, I had:
application = my_create_app_function()
application.run(host="0.0.0.0")
Whereas I should have had:
application = my_create_app_function()
if __name__ == "__main__":
    application.run(host="0.0.0.0")
Essentially, you don't want to call application.run() when using Gunicorn. Under Gunicorn, __name__ won't be "__main__", but it will be when you run the file directly with Python, so you can still debug locally.
I couldn't find a reference to this in the Gunicorn docs, but I can imagine it being a common mistake, so maybe a warning is warranted.
This is still occurring. Adding --preload to the Gunicorn call fixed the issue for me.
Is this bug still not fixed? I am observing this exact behavior.
Gunicorn starts like this in systemd:
[Service]
PIDFile = /run/gunicorn.pid
WorkingDirectory = /home/pi/pyTest
ExecStart=/usr/local/bin/gunicorn app:app -b 0.0.0.0:80 --pid /run/gunicorn.pid
RuntimeDirectory=/home/pi/pyTest
Restart=always
KillSignal=SIGQUIT
Type=notify
StandardError=syslog
NotifyAccess=all
User=root
Group=root
ExecReload = /bin/kill -s HUP $MAINPID
ExecStop = /bin/kill -s TERM $MAINPID
ExecStopPost = /bin/rm -rf /run/gunicorn
PrivateTmp = true
Worker process constantly times out and restarts:
Jul 10 15:19:20 raspberryVM gunicorn[10941]: [2020-07-10 15:19:20 -0700] [10941] [CRITICAL] WORKER TIMEOUT (pid:10944)
Jul 10 15:19:20 raspberryVM gunicorn[10941]: [2020-07-10 15:19:20 -0700] [10944] [INFO] Worker exiting (pid: 10944)
Jul 10 15:20:15 raspberryVM gunicorn[10941]: [2020-07-10 15:20:15 -0700] [10985] [INFO] Booting worker with pid: 10985
app.py is a trivial Flask app.
Was this issue closed as "Won't Fix"?
I was also having the same issue, but after debugging I found that while Gunicorn was starting the Django app, one of its dependencies (in my case, an external DB connection) was taking longer than expected, which made the Gunicorn worker time out.
When I resolved the connection issue, the timeout issue was also resolved.
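One way to make a slow dependency visible, instead of letting the worker silently run into the timeout, is to probe it at startup with an explicit, short timeout. A minimal standard-library sketch; the function name is hypothetical, and a plain TCP connect only shows that the port is reachable, not that the database behind it is healthy.

```python
import socket


def dependency_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures and socket timeouts.
        return False
```

For example, an app factory could call this with your real database host and port and raise an error with a clear message when it returns False, so the failure shows up in the logs instead of as a bare WORKER TIMEOUT.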
That is not my case. I tested with a "Hello, World" type of app with no dependencies. So I am still puzzled by this, but it seems it's not possible to run a long-running thread under Gunicorn: the worker process restarts and therefore kills the long-running thread.
@leonbrag This is likely NOT a gunicorn bug. See my comment above in the thread. It's a side effect of browsers sending empty "predicted" TCP connections, and of running gunicorn with only a few sync workers without protection from empty TCP connections.
Is there a reference architecture/design that shows the proper way to set up a Gunicorn Flask app with a long-running (permanent) worker thread?
If this is not a bug, then it seems to be an artifact or a limitation of the Gunicorn architecture/design.
Why wouldn't a sync worker run forever and accept client connections? Such a worker would close sockets as needed, yet continue to run without exiting (and therefore the worker thread would continue to run).
@leonbrag You should be more specific about what problem you are trying to solve.
The problem discussed in this thread happens in dev environment and the easiest solution is either to add more sync workers or use threaded workers.
If you want to avoid this issue in a production setup, you can use gevent workers, or you can put nginx in front of gunicorn. Some PaaS providers already put nginx in front of your Docker container, so you don't have to worry about it. Again, the solution depends on the context and the details.
This is good reading: https://www.brianstorti.com/the-role-of-a-reverse-proxy-to-protect-your-application-against-slow-clients/
You can check the design page in the documentation. Async workers are one way to run long tasks.
web: gunicorn --workers=3 app:app --timeout 200 --log-file -
I fixed my problem by increasing the --timeout.
See also #1388 for Docker related tmpfs issues.
Oh, thanks a lot, Randall. I forgot to add --worker-tmp-dir /dev/shm to the gunicorn arguments when I was running gunicorn in Docker.
BTW, will 64 MB be enough for the Gunicorn cache?
gunicorn app:app --timeout 1000
or
gunicorn app:app --preload
Either worked for me. I prefer the timeout one.
Strange: I added --worker-tmp-dir /dev/shm but I am still receiving:
[2020-11-27 21:01:42 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:17)
To make sure /dev/shm is RAM-backed, I benchmarked it:
The parameters are:
command: /bin/bash -c "cd /code/ && pipenv run gunicorn --worker-tmp-dir /dev/shm conf.wsgi:application --bind 0.0.0.0:8022 --workers 5 --worker-connections=1000"
PS: I am using PyPy
@attajutt A large timeout works, but you risk the gunicorn master process detecting a hang in your worker process only after 1000 seconds, and you will miss a lot of requests in the meantime. It will also be hard to notice if only one of several workers hangs. I would not use 1000, at least.
@ivictbor Thanks for letting me know; 1000 is just for reference. Nevertheless, I got the app running: once it's loaded, it runs perfectly fine.
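A rough illustration of that trade-off: Gunicorn's master treats a worker as stuck when the worker has not checked in for --timeout seconds, and workers check in by updating a heartbeat temporary file (which is why --worker-tmp-dir /dev/shm, mentioned above, matters in Docker). The class below is a simplified stand-in for that idea, assumed for illustration and not Gunicorn's implementation.

```python
import os
import tempfile
import time
from typing import Optional


class Heartbeat:
    """Toy file-based worker heartbeat (illustration only, not Gunicorn's code)."""

    def __init__(self, directory: Optional[str] = None) -> None:
        fd, self.path = tempfile.mkstemp(dir=directory)
        os.close(fd)

    def notify(self) -> None:
        # A healthy worker calls this regularly.
        now = time.time()
        os.utime(self.path, (now, now))

    def seconds_since_update(self) -> float:
        return time.time() - os.stat(self.path).st_mtime


def is_hung(hb: Heartbeat, timeout: float) -> bool:
    """The master only declares a worker dead after `timeout` seconds of silence."""
    return hb.seconds_since_update() > timeout


hb = Heartbeat()
hb.notify()
print(is_hung(hb, timeout=30))    # False: the worker just checked in

# Pretend the worker has been stuck for 120 seconds.
stale = time.time() - 120
os.utime(hb.path, (stale, stale))
print(is_hung(hb, timeout=30))    # True: caught with a 30 s timeout
print(is_hung(hb, timeout=1000))  # False: would go unnoticed for up to 1000 s
os.remove(hb.path)
```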
I got this error too, and after several attempts I found that the problem is probably caused by:
If you deploy your app to a cloud platform like GAE, it will not surface any hint of the error. You can try to surface the error using the solution in this case: https://stackoverflow.com/questions/38012797/google-app-engine-502-bad-gateway-with-nodejs
If it raises 502 Bad Gateway, there are probably two possibilities:
The complete solution is explained here: https://www.datadoghq.com/blog/nginx-502-bad-gateway-errors-gunicorn/
Hope that helps anyone getting the [CRITICAL] WORKER TIMEOUT error.
Adding another possibility for those who find this thread...
This can also be caused by Docker-imposed resource constraints that are too low for your web application. For example, I had the following constraints:
services:
  web_app:
    image: blah-blah
    deploy:
      resources:
        limits:
          cpus: "0.25"
          memory: 128M
and these were evidently too low for gunicorn, so I constantly got the [CRITICAL] WORKER TIMEOUT error until I removed the constraints.
For gunicorn itself these resources are perfectly fine, but you do need to plan for the number of workers and the resources your application needs. 128M and 0.25 CPU seem really low for a web application written in Python. Generally speaking, you need at least 1 core/vCPU and 512 MB of RAM as a bare minimum.
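To make that planning step concrete, here is a rough sizing helper. It starts from Gunicorn's documented starting point of (2 x cores) + 1 workers and caps it by the container's memory limit. The memory cap and the per-worker figure are assumptions for illustration: you would need to measure how much memory one worker of your own app actually uses.

```python
import math


def suggested_workers(cpus: float, memory_mb: int, mb_per_worker: int) -> int:
    """(2 x cores) + 1, capped by how many workers fit in the memory limit.

    mb_per_worker is a hypothetical figure you must measure for your own app.
    """
    by_cpu = 2 * max(1, math.floor(cpus)) + 1
    by_memory = memory_mb // mb_per_worker
    return max(1, min(by_cpu, by_memory))


print(suggested_workers(0.25, 128, 100))  # 1: the limits above leave room for one worker
print(suggested_workers(1, 512, 150))     # 3
print(suggested_workers(4, 4096, 150))    # 9
```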
--timeout=1000 worked for me. The issue was a low-CPU GCP machine. It worked fine on my local machine with the default timeout.
gunicorn app:app --timeout 1000
You're great. It was for me the solution. Thanks very much.
gunicorn app:app --timeout 3000 Worked for me ✌️
It seems there have already been several reports related to the [CRITICAL] WORKER TIMEOUT error, but it just keeps popping up. Here is my issue. I'm running this Flask hello world application.
The gunicorn command is this one:
And this is the console output:
Can you please explain clearly why I get the error and whether it's expected in this example? How do I fix it, or, if it's expected behavior, why is it a critical error?