caddyserver / cache-handler

Distributed HTTP caching module for Caddy
Apache License 2.0

memory leak when adding partial config #28

Closed kresike closed 2 years ago

kresike commented 2 years ago

I'm trying to set up Caddy as a caching reverse proxy for several sites, and I will be using the config API to add, modify, and remove sites as needed. Everything works as expected, except when I configure new sites by adding new routes to the first HTTP server: after about 20 sites the resident set size is around 3 GB, which seems like a lot. If I download the whole config using the API and load it into a freshly started Caddy instance, memory usage remains normal at around 350 MB, and the resident set size is only a bit higher than the used memory.

I've been trying to get more information using pprof, I have some data that might help narrow down the issue, but I think this part is a bit out of my league at this point.

I compiled caddy using the following command: xcaddy build --with github.com/caddyserver/cache-handler --with github.com/corazawaf/coraza-caddy --with github.com/porech/caddy-maxmind-geolocation --with github.com/caddyserver/transform-encoder --with github.com/imgk/caddy-pprof

Caddy version is: v2.5.1 h1:bAWwslD1jNeCzDa+jDCNwb8M3UJ2tPa8UZFFzPVmGKs=

The initial configuration is in the attached zip file, named empty_nosites.config

By "configuring one site" I mean running (the onesite.config file is also in the attached zip): curl -X POST -H "Content-Type: application/json" -d @onesite.config http://192.168.6.31:2019/config/apps/http/servers/srv0/routes/

The final configuration is in cache_20_sites.config. I got to that by doing the above operation 20 times, with different hostnames.

I also included an SVG output from go tool pprof showing alloc_space after adding the 20 sites one by one, named 20_sites_onebyone.svg.

I saved the configuration using: curl http://192.168.6.31:2019/config/ > cache_20_sites.config

Then I restarted Caddy and posted the whole config right back: curl -X POST -H "Content-Type: application/json" -d @cache_20_sites.config http://192.168.6.31:2019/config/

After that I took another snapshot with pprof. This can be found in the file 20_sites_allatonce.svg.

bugreport.zip

Could someone please help me with this?

darkweak commented 2 years ago

Hello @kresike, can you share your Caddy configuration please? And, if possible, a minimal reproducible repository?

Thank you!

kresike commented 2 years ago

@darkweak I compiled Caddy using xcaddy; the command is in the original report, and I have no separate repository for this. Also in the original report there is an attached bugreport.zip containing all the config files. Let me know if you need anything else. Thank you!

darkweak commented 2 years ago

It could be the coalescing layer that causes this memory consumption / resident set growth. I'll investigate in that direction and try to reduce it. It may be related to the Ristretto memory leak bugs.

kresike commented 2 years ago

@darkweak let me know if I can help in any way. Could you elaborate on the "coalescing layer"? Maybe I could look at it myself and come up with something that might help.

darkweak commented 2 years ago

The coalescing layer determines and stores which requests cannot be coalesced. If multiple identical requests are sent to the server, only one goes to the backend and the same response is used for all pending requests. But when you reload the configuration with Caddy, this layer's storage is not cleared, so I'm working on adding a reset method for it and using Caddy's cleanup hook to trigger it on configuration reload.

kresike commented 2 years ago

I see. Let me know when you have a working version, I'll be happy to test it.

darkweak commented 2 years ago

This commit https://github.com/darkweak/souin/pull/220/commits/553c1c75ca6a21b31c488472f78067903df122e3 should fix that. I tried loading the config more than 20 times, and with it the memory seems stable.

darkweak commented 2 years ago

You can now build with --with github.com/darkweak/souin/plugins/caddy to get the memory leak fix.

kresike commented 2 years ago

I haven't had much time to test this yet, but at first glance it looks great. I will do some more intensive testing and report back.

kresike commented 2 years ago

I've done some more tests. It seems the memory leak issue is fixed: after more than 500 sites, memory usage remained within a few megabytes of the original usage.

Now there is a linear slowdown in provisioning sites. At 10-20 sites the config reloads in 2-3 seconds on my desktop machine, but at ~500 sites it takes more than 40 seconds to reload the configuration.

I've attached a profile I made (profile003). It looks like Ristretto is clearing a lot of data, and then the garbage collector runs heavily.

darkweak commented 2 years ago

For each site, caddy will run the following workflow:

kresike commented 2 years ago

I thought about this some more, and it seems to work this way by design: whenever any change is made to the Caddy config, Caddy starts a new instance internally and shuts down the old one once the new one is up. The original problem has been solved, so I'm closing the issue.

Thanks for fixing this @darkweak !