TimMcCauley opened this issue 6 years ago
@zephylac do you have any experience with gunicorn settings? Requests sometimes time out on our live servers with the following settings:
workers = 2
worker_class = 'gevent'
worker_connections = 1000
timeout = 30
keepalive = 2
I'm now trying the following settings instead, which are recommended in the post above.
worker_class = 'gthread'
threads = 4
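Put together, the whole thing would look roughly like this as a gunicorn config file (a sketch only; the file name and exact values are assumptions, not our production config):

```python
# gunicorn_config.py - sketch only; file name and values are assumptions,
# not the actual production configuration
workers = 2                  # often tied to CPU count, e.g. (2 * cores) + 1
worker_class = 'gthread'     # threaded workers instead of gevent
threads = 4                  # several posts suggest threads == workers
timeout = 30                 # seconds before a silent worker is killed and restarted
keepalive = 2
```

It would then be started with something like gunicorn -c gunicorn_config.py app:app (the module name here is also just a placeholder).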
I don't have any experience with gunicorn, but I can try to look into it and find some info.
I'm currently hammering my instance with requests but I haven't experienced any timeouts (so far).
I've looked into it a little bit.
In the article you mentioned they also talk about --worker-tmp-dir, which might cause problems for workers.
I've also seen some discussion about the threads option; opinions seem to converge on threads = workers.
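If I read that article correctly, the combined tweak would look something like this in the gunicorn config (just a sketch; the values are assumptions I haven't tested):

```python
# Sketch of the two tweaks mentioned above; values are assumptions, not tested settings
worker_tmp_dir = '/dev/shm'  # keep gunicorn's heartbeat file in memory instead of on
                             # Docker's overlay filesystem, where slow writes can stall
                             # the heartbeat and trigger spurious worker timeouts
workers = 2
threads = 2                  # i.e. threads = workers, as suggested in what I found
```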
It seems that the [solution](https://www.brianstorti.com/the-role-of-a-reverse-proxy-to-protect-your-application-against-slow-clients/) some people found was to put NGINX in front of gunicorn.
On my side I've tried to make my workers time out (without changing the current gunicorn parameters). Under both extreme load and at rest, my workers don't seem to time out.
Thanks for looking this up @zephylac - if you are running your batch requests, could you also run them against api.openrouteservice.org at the same time? I can send you a token allowing a higher quota - if you agree - which email could I send the token to?
I've sent you an email !
On which architecture are you running your service? Are you using Docker? Are you running on a VM or on dedicated hardware?
We are running this on a VM in our OpenStack environment with 32GB RAM and 8 cores. The PostGIS database is running on a different, smaller VM which unfortunately has very slow disks (soon to be upgraded to SSDs). The containers running on this VM are:
ubuntu@ors-microservices:~|⇒ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
68404976f9d6 openelevationservice_gunicorn_flask_2 "/oes_venv/bin/gun..." 8 weeks ago Up 2 days 0.0.0.0:5021->5000/tcp openelevationservice_gunicorn_flask_2_1
6959766a7ee9 openelevationservice_gunicorn_flask "/oes_venv/bin/gun..." 8 weeks ago Up 2 days 0.0.0.0:5020->5000/tcp openelevationservice_gunicorn_flask_1
ec736d4cd30c openpoiservice_gunicorn_flask_05122018_2 "/ops_venv/bin/gun..." 5 months ago Up 24 hours 0.0.0.0:5006->5000/tcp openpoiservice_gunicorn_flask_05122018_2_1
c62417a4f60e openpoiservice_gunicorn_flask_05122018 "/ops_venv/bin/gun..." 5 months ago Up 24 hours 0.0.0.0:5005->5000/tcp openpoiservice_gunicorn_flask_05122018_1
Are the workers timing out even when idle, or only under load?
I've looked at my logs; none of my workers timed out during a week of intense load.
Some requests will simply time out but I haven't found a pattern for this yet.
Maybe PostgreSQL 12 & PostGIS 3 will fix part of this issue by properly supporting parallelization.
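One way to check whether the current setup parallelizes those queries at all would be to EXPLAIN one of them directly. A rough sketch with psycopg2; the connection details and the query are placeholders, not the real ops query:

```python
import psycopg2

# Placeholders only: swap in the real connection settings and one of the actual POI queries.
conn = psycopg2.connect(host="localhost", dbname="gis", user="gis_admin", password="changeme")
with conn.cursor() as cur:
    cur.execute("EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM pois;")
    for (line,) in cur.fetchall():
        print(line)  # look for Gather / Parallel Seq Scan nodes in the plan
conn.close()
```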
Agreed. Did you test the live API with the token I sent you by any chance @zephylac ?
Yup, I tried, but it seems to have expired.
Ah shit, sorry - it's now extended forever ;-) and won't expire anymore (same token as in the email).
Hi @TimMcCauley, obviously it has been a while, but as I am facing the same issue you described (random timeouts with larger batches of POI requests using Docker), I am wondering if you have found a solution?
maybe this topic might help? https://pythonspeed.com/articles/gunicorn-in-docker/
Hi @lingster, this link was mentioned earlier by Tim. I was unable to solve the problem using it.
Sorry for joining the party so late.
@boind12 could you run ANALYZE in the ops schema once and check again? What kind of requests are you running and are you able to do the same directly in SQL and see how it behaves (you can print the sql query and fill the placeholders manually)? How much memory are you giving Docker and have you played around with pgtune settings? In a nutshell: it's most likely a postgres issue.
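For the "print the sql query" part, here is a minimal sketch of how a SQLAlchemy statement can be rendered with its placeholders filled in, so it can be pasted into psql and timed there (the table and values below are made up, not the actual openpoiservice query):

```python
from sqlalchemy import column, select, table
from sqlalchemy.dialects import postgresql

# Made-up stand-in for the real POI query built by the service
pois = table("pois", column("osm_id"), column("category"))
query = select(pois).where(pois.c.category == 560)

# Inline the bound parameters so the statement can be copied into psql
print(query.compile(dialect=postgresql.dialect(),
                    compile_kwargs={"literal_binds": True}))
```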
Hi @TimMcCauley, thanks for your support! I am using the following setup:
Sporadically the gunicorn workers time out - this may be due to the worker class settings: http://docs.gunicorn.org/en/stable/settings.html