[Workaround] Add max requests to gunicorn

nuwang commented 10 months ago

From @mvdbeek on: https://matrix.to/#/!rfLDbcWEWZapZrujix:gitter.im/$f-E-Kl-Cumt4tPAajzrka83yHLSrmVM7IbDkTQjwDQA?via=gitter.im&via=matrix.org

we're using https://github.com/galaxyproject/usegalaxy-playbook/blob/5be97742d688a5b65762bd13064a9484f77f6671/env/main/group_vars/galaxyjobservers.yml#L44C11-L44C11 for that reason. I spent a good amount of time looking at this and could confirm a memory leak in the asyncio stack when there are (temporarily) a lot of concurrent connections. The memory allocated for these connections is never released and there's not much we can do, except for switching to hypercorn and trio instead of gunicorn + uvicorn + asyncio that said 23.1 reduced steady-state memory consumption by about 2 fold. it doesn't change anything about the growing memory consumption with concurrent requests, but I thought I'd mention this the big drop in memory consumption is when we added the relevant patches to 23.1 and added --max-requests the individual web handler processes now rarely ever go beyond 1.2GB RSS

mvdbeek commented 10 months ago

You may want to check how far apart you need to place the restarts and I'd assume this works better with more than 1 web worker, since there is a period where no worker is available while a new worker is being forked.

I'd also only do this if you're actually seeing a leaky pattern, I would assume you'll only see this at the usegalaxy.* scale.

nuwang commented 10 months ago

Thanks for the review.

You may want to check how far apart you need to place the restarts and I'd assume this works better with more than 1 web worker, since there is a period where no worker is available while a new worker is being forked.

It's just one worker by default to support smaller installations. We should probably introduce HorizontalPodAutoscaling at some future date.

I'd also only do this if you're actually seeing a leaky pattern, I would assume you'll only see this at the usegalaxy.* scale.

My assumption here was that these params would do no harm, and since I spotted your fix, that we might as well future proof things now. Sound ok?

mvdbeek commented 10 months ago

If we remember to take this hack out in the future that's fine, sure.

nuwang commented 10 months ago

In that case, I'll just leave this in draft status for future reference. If someone needs it, it can always be merged, and in the meantime, we can hope that things just get fixed upstream.

galaxyproject / galaxy-helm

[Workaround] Add max requests to gunicorn #443