ajnisbet / opentopodata

Open alternative to the Google Elevation API!
https://www.opentopodata.org
MIT License

Severe memory leak #68

Closed: stx closed this issue 1 year ago

stx commented 1 year ago

Thanks for your work on this awesome project.

We upgraded from 1.7.1 to 1.8.2 and we're now seeing memory grow without bound until the server dies. It is very painful.

We're using the stock repo config with ASTER 30m and with listen = 1024 added to uwsgi.ini.

[image]

By the time I finished writing this, memory usage had already climbed to 3.5 GB.

It can be worked around by adding reload-on-as = 512 to uwsgi.ini.
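
For reference, a minimal sketch of what that change might look like in uwsgi.ini. Only the listen and reload-on-as lines come from this report; the [uwsgi] section header is assumed, and the rest of the stock config is omitted rather than reproduced. uWSGI reads the reload-on-as value as megabytes of address space per worker.

```ini
[uwsgi]
; ... stock settings from the repo config omitted ...

; Larger socket listen queue (added in our deployment).
listen = 1024

; Workaround: recycle a worker once its address space exceeds 512 MB.
reload-on-as = 512
```

This only caps the damage by restarting workers before they grow too large; it does not address whatever is leaking.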

ajnisbet commented 1 year ago

Thanks for raising this issue!

In 1.8.2 I updated uwsgi by a couple of versions (plus a bunch of other dependencies), and any of those could have triggered the issue. I've only been running up to 1.8.0 in production, with no issues.

So downgrading to 1.8.0 rather than 1.7.1 might also be a solution if you need features from the later releases.

I'll look into replicating this and debugging uwsgi / the root cause shortly. I also think reload-on-as would be a good thing to add to the config anyway. Thanks for the heads up about that; uwsgi is daunting to configure!

Quving commented 1 year ago

Hello @ajnisbet, I can confirm there is a memory leak in the latest GitHub version (2 days old). I was wondering why my job kept being cancelled over a period of 3-4 hours. On Grafana (screenshot below) I can see it. I have computed 3 million data points for a research project. Nearly all parameters are at their defaults (as in the code). We use the eudem dataset.

[image: Grafana screenshot]

ajnisbet commented 1 year ago

I was able to replicate this. It should be fixed in 1.8.3, which I released just now!

I'm still not sure what the root cause is, but downgrading rasterio to below 1.3.0 fixes the problem. Unfortunately there aren't rasterio 1.2.10 wheels for Python 3.10, so I had to downgrade Python too.

For good measure I added some uwsgi worker reloading parameters, as suggested by @stx. I have these enabled on gpxz.io, which may have hidden the problem there.

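; Recycle a worker after it has handled this many requests.
max-requests = 10000
; Recycle a worker after this many seconds.
max-worker-lifetime = 3600
; Seconds a worker is given to finish up before it is killed on reload.
worker-reload-mercy = 20
; Recycle a worker when its resident memory (RSS) exceeds this many MB.
reload-on-rss = 512
; Recycle a worker when its address space exceeds this many MB.
reload-on-as = 512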

None of the issues reported against rasterio seem to match, so the root cause is still a mystery. Hopefully it'll disappear in a future version of rasterio.