canonical / dqlite

Embeddable, replicated and fault-tolerant SQL engine.
https://dqlite.io
Other
3.8k stars 214 forks source link

Consider periodically using VACUUM to reduce memory footprint #530

Open freeekanayaka opened 11 months ago

freeekanayaka commented 11 months ago

I've experimented using VACUUM against a real-world medium-size Incus/LXD database (~700 containers) that had a size of 41 Megabytes. After running the VACUUM command, the size of the database was down to 3 Megabytes.

Depending on the workflow, if there are large databases that create issues with memory/snapshotting, this might be a relatively cheap and effective measure to explore. It might not have such a dramatic effect in all cases though.

freeekanayaka commented 11 months ago

The database I mentioned had been running for several years, most probably performing a number of INSERTs and DELETEs over the course of time, which presumably slowly increased database fragmentation more and more. That means that running VACUUM regularly is advisable, especially for workloads with a lot of data updates.

I also tried to run .dump on that database and create it from scratch with the same data, and I got the same result: database size went from 41 MB to 3 MB, as it would be expected.

MathieuBordere commented 11 months ago

Haven't looked into it yet, but weren't there issues with running VACUUM on a dqlite database, did you have to adapt some things?

MathieuBordere commented 11 months ago
The VACUUM command works by copying the contents of the database into a temporary database file and then overwriting the original with the contents of the temporary file. When overwriting the original, a rollback journal or [write-ahead log](https://www.sqlite.org/wal.html) WAL file is used just as it would be for any other database transaction.

I'm just wondering how it would work in dqlite's case

cole-miller commented 11 months ago

Yeah, I looked into using VACUUM to reset/empty a database in #435 and concluded that a prerequisite was proper support for attaching additional databases to a dqlite-managed connection, see https://github.com/canonical/dqlite/pull/435#issuecomment-1331257989. But I could certainly be missing something.

freeekanayaka commented 11 months ago

I can think of several design options. But in my mind it's most probably something that's needed for long-running deployments.

MathieuBordere commented 11 months ago

I can think of several design options. But in my mind it's most probably something that's needed for long-running deployments.

yes agree that it should be scheduled regularly when e.g. # free pages / # total pages passes a certain threshold (just thinking out loud)

freeekanayaka commented 11 months ago

If you are looking for something cheap, this could initially be implemented "offline", e.g. when the process restarts.

cole-miller commented 8 months ago

VACUUM needs to be coordinated using a Raft log entry, since it affects page numbers in the future.

freeekanayaka commented 8 months ago

VACUUM needs to be coordinated using a Raft log entry, since it affects page numbers in the future.

Sounds like a good point, thanks.