Memory leak when replication targets are unhealthy

benbjohnson / litestream

Streaming replication for SQLite.

https://litestream.io

Apache License 2.0

Memory leak when replication targets are unhealthy #562

Open pbedat opened 10 months ago

pbedat commented 10 months ago

I have the following replication setup:

We are replicating ~40 databases to two CIFS mounts (Storage Boxes hosted by Hetzner). Those boxes sometimes undergo maintenance, and one of the mounts goes bad. This happened on January 27 around 11:15 AM, and from that point on memory increased steadily with each snapshot interval (4h):

[Image: Screenshot from 2024-01-28 19-57-58]

I'm also seeing the following error logs:

"snapshots: cannot fetch generations: open /opt/copilot/data/replicas-2/milchsackfabrik/generations: permission denied"

It's not a huge problem, but I wanted to report it. If you need anything, I could set up a replication environment and take readings.

Edit: this is on v0.3.13.

hifi commented 10 months ago

It's likely the LZ4 compression library: it keeps a buffer pool that it never frees, and error conditions seem to hit it hard. This pattern is quite suspicious, though. I'd expect it to be able to reuse the same pool, so that might not be it.

If you can take a pprof memory dump from a repro, it would show where the memory is actually going.

pbedat commented 10 months ago

@hifi I guess you are right about lz4.

[Image: profile001]

[Attachment: pprof.litestream.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz]

PS: I took the profile from a replication setup with only 4 DBs, hence the smaller size.

hifi commented 10 months ago

Yeah, unfortunately there's nothing Litestream can do except change the implementation or support multiple compression schemes with different tradeoffs, such as CPU over RAM.

There's an open issue against the lz4 library about the ever-growing pool, but the author hadn't responded to it the last time I checked.

pbedat commented 10 months ago

Never mind. It's not a problem, since I get alerts when replicas go unhealthy and can respond before memory runs out. Thanks for clarifying!