TimMcCauley opened this issue 6 years ago
@zephylac do you have any experience with gunicorn settings? Requests sometimes time out on our live servers with the following settings:
workers = 2
worker_class = 'gevent'
worker_connections = 1000
timeout = 30
keepalive = 2
I'm now trying the following settings instead, which are recommended in the post above.
worker_class = 'gthread'
threads = 4
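Put together, the whole thing would look roughly like this as a gunicorn config file (a sketch only; the file name and exact values are assumptions, not our production config):

```python
# gunicorn_config.py - sketch only; file name and values are assumptions,
# not the actual production configuration
workers = 2                  # often tied to CPU count, e.g. (2 * cores) + 1
worker_class = 'gthread'     # threaded workers instead of gevent
threads = 4                  # several posts suggest threads == workers
timeout = 30                 # seconds before a silent worker is killed and restarted
keepalive = 2
```

It would then be started with something like gunicorn -c gunicorn_config.py app:app (the module name here is also just a placeholder).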
I don't have any experience with gunicorn, but I can try to look into it and find some info.
I'm currently hammering my instance with requests but I haven't experienced any timeouts (so far).
I've looked into it a little bit.
In the article you mentioned they also talk about --worker-tmp-dir, which might cause problems for workers.
I've also seen some discussion about the threads option; opinions seem to converge on threads = workers.
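If I read that article correctly, the combined tweak would look something like this in the gunicorn config (just a sketch; the values are assumptions I haven't tested):

```python
# Sketch of the two tweaks mentioned above; values are assumptions, not tested settings
worker_tmp_dir = '/dev/shm'  # keep gunicorn's heartbeat file in memory instead of on
                             # Docker's overlay filesystem, where slow writes can stall
                             # the heartbeat and trigger spurious worker timeouts
workers = 2
threads = 2                  # i.e. threads = workers, as suggested in what I found
```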
It seems that the [solution](https://www.brianstorti.com/the-role-of-a-reverse-proxy-to-protect-your-application-against-slow-clients/) some people found was to put NGINX in front of gunicorn.
On my side I've tried to make my workers time out (without changing the current gunicorn parameters). Under both extreme load and at rest, my workers don't seem to time out.
Thanks for looking this up @zephylac - if you are running your batch requests, could you also run them against api.openrouteservice.org at the same time? I can send you a token allowing a higher quota - if you agree - which email could I send the token to?
I've sent you an email !
On which architecture are you running your service? Are you using Docker? Are you running on a VM or on dedicated hardware?
We are running this on a VM in our OpenStack environment with 32GB RAM and 8 cores. The PostGIS database is running on a different, smaller VM which unfortunately has very slow disks (soon to be upgraded to SSDs). The containers running on this VM are:
ubuntu@ors-microservices:~|⇒ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
68404976f9d6 openelevationservice_gunicorn_flask_2 "/oes_venv/bin/gun..." 8 weeks ago Up 2 days 0.0.0.0:5021->5000/tcp openelevationservice_gunicorn_flask_2_1
6959766a7ee9 openelevationservice_gunicorn_flask "/oes_venv/bin/gun..." 8 weeks ago Up 2 days 0.0.0.0:5020->5000/tcp openelevationservice_gunicorn_flask_1
ec736d4cd30c openpoiservice_gunicorn_flask_05122018_2 "/ops_venv/bin/gun..." 5 months ago Up 24 hours 0.0.0.0:5006->5000/tcp openpoiservice_gunicorn_flask_05122018_2_1
c62417a4f60e openpoiservice_gunicorn_flask_05122018 "/ops_venv/bin/gun..." 5 months ago Up 24 hours 0.0.0.0:5005->5000/tcp openpoiservice_gunicorn_flask_05122018_1
Are the workers timing out even when idle, or only under load?
I've looked at my logs; none of my workers timed out during a week of intense load.
Some requests will simply time out but I haven't found a pattern for this yet.
Maybe PostgreSQL 12 & PostGIS 3 will fix part of this issue by properly supporting parallelization.
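One way to check whether the current setup parallelizes those queries at all would be to EXPLAIN one of them directly. A rough sketch with psycopg2; the connection details and the query are placeholders, not the real ops query:

```python
import psycopg2

# Placeholders only: swap in the real connection settings and one of the actual POI queries.
conn = psycopg2.connect(host="localhost", dbname="gis", user="gis_admin", password="changeme")
with conn.cursor() as cur:
    cur.execute("EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM pois;")
    for (line,) in cur.fetchall():
        print(line)  # look for Gather / Parallel Seq Scan nodes in the plan
conn.close()
```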
Agreed. Did you test the live API with the token I sent you by any chance @zephylac ?
Yup, I tried, but it seems to have expired.
Ah shit, sorry - it's now extended forever ;-) and won't expire anymore (same token as in the email).
Hi @TimMcCauley, obviously it has been a while, but as I am facing the same issue you described (random timeouts with larger batches of POI requests using Docker), I am wondering if you have found a solution?
maybe this topic might help? https://pythonspeed.com/articles/gunicorn-in-docker/
Hi @lingster, this link was mentioned earlier by Tim. I was unable to solve the problem using it.
Sorry for joining the party so late.
@boind12 could you run ANALYZE in the ops schema once and check again? What kind of requests are you running and are you able to do the same directly in SQL and see how it behaves (you can print the sql query and fill the placeholders manually)? How much memory are you giving Docker and have you played around with pgtune settings? In a nutshell: it's most likely a postgres issue.
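For the "print the sql query" part, here is a minimal sketch of how a SQLAlchemy statement can be rendered with its placeholders filled in, so it can be pasted into psql and timed there (the table and values below are made up, not the actual openpoiservice query):

```python
from sqlalchemy import column, select, table
from sqlalchemy.dialects import postgresql

# Made-up stand-in for the real POI query built by the service
pois = table("pois", column("osm_id"), column("category"))
query = select(pois).where(pois.c.category == 560)

# Inline the bound parameters so the statement can be copied into psql
print(query.compile(dialect=postgresql.dialect(),
                    compile_kwargs={"literal_binds": True}))
```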
Hi @TimMcCauley, thanks for your support! I am using the following setup:
Sporadically the gunicorn workers time out - this may be due to the worker class settings: http://docs.gunicorn.org/en/stable/settings.html