Terkwood / AugustDB

Key/value store backed by LSM Tree architecture.
MIT License

Commit Log clean up #107

Closed. Terkwood closed this issue 3 years ago.

Terkwood commented 3 years ago

The initial version of the commit log work (#104) lacks a clear strategy for cleaning up old entries. This ticket describes a better approach.

Designing a foolproof strategy for swapping commit log files

Instead of using a single commit.log file, always create a new commit log named "commit-#{:erlang.system_time()}.log" when the app starts up.
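A minimal sketch of the naming scheme, written in Python for illustration rather than the project's Elixir. `time.time_ns()` stands in for `:erlang.system_time()` (an assumption: the Erlang call returns the VM's native time unit, which is not guaranteed to be nanoseconds). The helper names and the regex are hypothetical. Parsing the stamp and sorting numerically, rather than sorting file names as strings, keeps "oldest first" correct even if two stamps ever differ in digit count.

```python
import re
import time

# Hypothetical pattern for "commit-<integer stamp>.log" file names.
LOG_NAME = re.compile(r"^commit-(\d+)\.log$")


def new_commit_log_name(now=None):
    """Return a commit log name stamped with `now` (defaults to the clock)."""
    # time.time_ns() is a stand-in for :erlang.system_time(); units may differ.
    stamp = time.time_ns() if now is None else now
    return f"commit-{stamp}.log"


def commit_logs_oldest_first(filenames):
    """Keep only commit log names and order them by numeric stamp."""
    stamped = []
    for name in filenames:
        m = LOG_NAME.match(name)
        if m:
            stamped.append((int(m.group(1)), name))
    # Numeric sort: "commit-900.log" comes before "commit-1000.log",
    # which a plain string sort would get backwards.
    return [name for _, name in sorted(stamped)]
```

For example, `commit_logs_oldest_first(["commit-1000.log", "data.sst", "commit-900.log"])` ignores the non-log file and yields the 900 log before the 1000 log.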

Rules

  1. Track the name of the current commit log file in the state of the CommitLog server/agent.
  2. Whenever you flush the memtable, start a new commit log named "commit-#{:erlang.system_time()}.log".
  3. Once the memtable flush completes, delete the previous commit log. Don't go back and delete all of them -- we'll handle the replay of old commit logs at app startup only.
  4. When you start the app, find the oldest commit log file and replay it, then flush the memtable. Given rule 3, that flush cleans up the replayed commit log. Repeat until you've replayed all of the outdated commit logs.
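The four rules can be sketched as follows, again in Python and under stated assumptions: `CommitLog`, `replay_on_startup`, the `flush` callback, and the one-`key<TAB>value`-per-line file format are all hypothetical, not AugustDB's actual API. For brevity the startup replay deletes each log directly after its flush instead of routing the deletion through rule 3. Replay has to run before the new log is created at startup; otherwise the fresh, empty log would be treated as outdated too.

```python
import os
import re
import tempfile
import time

# Hypothetical file layout: one "key<TAB>value" entry per line.
LOG_NAME = re.compile(r"^commit-(\d+)\.log$")


class CommitLog:
    """Hypothetical stand-in for the CommitLog server's state (rules 1-3)."""

    def __init__(self, directory):
        self.directory = directory
        self._last_stamp = 0
        self.current = self._fresh_path()  # rule 1: track the current log

    def _fresh_path(self):
        # Force strictly increasing stamps so a coarse clock can never
        # hand back the name of the log we are about to delete.
        stamp = max(time.time_ns(), self._last_stamp + 1)
        self._last_stamp = stamp
        path = os.path.join(self.directory, f"commit-{stamp}.log")
        open(path, "a").close()
        return path

    def append(self, key, value):
        with open(self.current, "a") as f:
            f.write(f"{key}\t{value}\n")

    def on_memtable_flushed(self):
        previous = self.current
        self.current = self._fresh_path()  # rule 2: start a new log
        os.remove(previous)                # rule 3: delete only the previous one


def replay_on_startup(directory, flush):
    """Rule 4: replay outdated logs oldest first; flush, then delete each one."""
    stamped = []
    for name in os.listdir(directory):
        m = LOG_NAME.match(name)
        if m:
            stamped.append((int(m.group(1)), name))
    for _, name in sorted(stamped):
        path = os.path.join(directory, name)
        memtable = {}
        with open(path) as f:
            for line in f:
                key, value = line.rstrip("\n").split("\t", 1)
                memtable[key] = value  # later entries overwrite earlier ones
        flush(memtable)
        os.remove(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        replay_on_startup(d, print)  # nothing to replay on a clean start
        log = CommitLog(d)
        log.append("a", "1")
        log.on_memtable_flushed()
        print(os.listdir(d))  # only the fresh, empty log remains
```

The strictly increasing stamp is a defensive choice for the sketch: if two rotations ever landed on the same clock reading, rule 3 would otherwise delete the log that was just created.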

Caveats

There's some annoying state management needed to support replay: each individual replay must be announced to the CommitLog GenServer with a {:begin_replay, some_commitlog_path} message, and its completion must be announced with :end_replay. This prevents accidental deletion of a commit log that is still being read, which could otherwise happen if the log is huge and the memtable flushes while it's being processed.
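A minimal Python model of that bookkeeping, assuming a single replay at a time. `CommitLogState`, `begin_replay`, `end_replay`, and `memtable_flushed` are hypothetical names that only mirror the {:begin_replay, path} and :end_replay messages; the injected `new_name` and `delete` callbacks stand in for file creation and removal. A flush that lands mid-replay rotates and deletes the current log as usual but never touches the log being read; that log is deleted by the first flush after :end_replay.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CommitLogState:
    """Hypothetical model of the CommitLog GenServer's replay bookkeeping."""
    current: str
    new_name: Callable[[], str]
    delete: Callable[[str], None]
    replaying: Optional[str] = None  # log currently being read, if any
    to_delete: Optional[str] = None  # replayed log awaiting the next flush

    def begin_replay(self, path: str) -> None:
        # Mirrors {:begin_replay, path}; this sketch allows one replay at a time.
        if self.replaying is not None:
            raise RuntimeError(f"already replaying {self.replaying}")
        self.replaying = path

    def end_replay(self) -> None:
        # Mirrors :end_replay; the replayed log may now go on the next flush.
        self.to_delete = self.replaying
        self.replaying = None

    def memtable_flushed(self) -> None:
        previous, self.current = self.current, self.new_name()
        self.delete(previous)  # rule 3 for the live log
        if self.to_delete is not None:
            self.delete(self.to_delete)  # rule 4 cleanup, only after :end_replay
            self.to_delete = None
        # self.replaying is never passed to delete(), even mid-replay.
```

In this model, a flush during replay of `commit-0.log` deletes only the previous live log; `commit-0.log` survives until the flush that follows `end_replay()`.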