bnclabs / gostore

Storage algorithms.
MIT License

bogn: Tombstone purge. #62

Closed prataprc closed 6 years ago

prataprc commented 6 years ago

In LSM mode, entries are not deleted; they are only marked as deleted. If an index sees many deletes and sets, its data size will grow quickly even though only a small subset of its entries are live (non-deleted).

To avoid a blow-up in storage size, we will have to expose an API to purge deleted entries that are older than a certain period. Also note that bogn does not assume that LLRB, BUBT, or any future storage algorithm maintains a timestamp for each entry.

One approach would be: let the compactor monitor the seqno on the write path and record the current seqno every hour. This hourly seqno can then be included in the disk snapshot's metadata, which already includes the snapshot's timestamp. When the tombstone purger is called, it will mark the oldest snapshot for tombstone purging. When compaction is invoked for the oldest snapshot, it will convert the timestamp to a purge-seqno and purge all entries that were marked as deleted before the purge-seqno.

prataprc commented 6 years ago

Bogn supports LSM (log-structured merge), where deleted entries are marked as deleted and left in the index. It may be necessary to clear out these entries (marked dead) to free up disk space.

func (bogn *Bogn) TombstorePurge() int64

It should return the number of entries, marked as dead, that were purged from the index. This API should pick only the 15th level (the oldest level) for purging deleted entries.
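The contract above can be illustrated with a toy model. This is only a sketch under stated assumptions: `store`, `item` and `tombstonePurge` are hypothetical stand-ins rather than bogn types, and the levels are modelled as plain slices with the oldest level last.

```go
package main

import "fmt"

// item is a hypothetical index entry; deleted marks a tombstone.
type item struct {
	key     string
	deleted bool
}

// store is a hypothetical stand-in for a multi-level index.
// levels[0] is the newest level; the last element is the oldest.
type store struct {
	levels [][]item
}

// tombstonePurge removes tombstones from the oldest level only and
// returns how many were removed. Newer levels are left untouched.
func (s *store) tombstonePurge() int64 {
	if len(s.levels) == 0 {
		return 0
	}
	last := len(s.levels) - 1
	kept := make([]item, 0, len(s.levels[last]))
	var purged int64
	for _, it := range s.levels[last] {
		if it.deleted {
			purged++
			continue
		}
		kept = append(kept, it)
	}
	s.levels[last] = kept
	return purged
}

func main() {
	s := &store{levels: [][]item{
		{{"a", true}},                             // newer level: tombstone for "a"
		{{"a", false}, {"b", true}, {"c", false}}, // oldest level
	}}
	n := s.tombstonePurge()
	fmt.Println(n, len(s.levels[0]), len(s.levels[1]))
}
```

Restricting the purge to the oldest level matters because a tombstone in a newer level may still be shadowing a live value in an older one; in the example, dropping the newer tombstone for "a" would make the old value of "a" visible again.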

Closing duplicate #74