facebook / rocksdb

A library that provides an embeddable, persistent key-value store for fast storage.
http://rocksdb.org
GNU General Public License v2.0
28.61k stars 6.32k forks

Remove non empty levels after DB shrinks #9437

Open siying opened 2 years ago

siying commented 2 years ago

If most KVs in a DB are deleted, the LSM-tree should ideally readjust to a shape that fits the smaller data size. This happens naturally in Universal Compaction, and it is unlikely to happen in non-dynamic leveled compaction without a full compaction. With some special treatment, we should be able to make it possible in dynamic leveled compaction: if we identify that the base level has become obviously excessive, we should temporarily pause L0 -> base level compaction and compact that level into the next level.
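To make "obviously excessive" concrete, here is a minimal sketch of one possible check. It is a simplified model, not RocksDB's actual code: `FirstNonEmptyLevel` and `BaseLevelIsExcessive` are hypothetical helpers, and the rule is an assumption (walk up from the last level, dividing its size by the level multiplier, and flag the base level when the resulting target falls to or below `max_bytes_for_level_base / multiplier`).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not a RocksDB API. level_bytes[i] is the size of
// level i (index 0 is L0). Returns the first non-empty level at L1 or
// deeper, or -1 if all of those levels are empty.
int FirstNonEmptyLevel(const std::vector<uint64_t>& level_bytes) {
  for (std::size_t i = 1; i < level_bytes.size(); ++i) {
    if (level_bytes[i] > 0) return static_cast<int>(i);
  }
  return -1;
}

// Hypothetical check (assumed rule, see lead-in). Treats the first
// non-empty level as the current base level and asks whether its
// size-derived target is too small to justify keeping that level.
bool BaseLevelIsExcessive(const std::vector<uint64_t>& level_bytes,
                          uint64_t max_bytes_for_level_base,
                          uint64_t multiplier) {
  const int base = FirstNonEmptyLevel(level_bytes);
  const int last = static_cast<int>(level_bytes.size()) - 1;
  // Nothing to drain if there is no base level, if the base level is
  // already the last level, or if the multiplier is degenerate.
  if (base < 0 || base >= last || multiplier < 2) return false;

  // Start from the last level's actual size and divide once per level
  // on the way up to the base level.
  uint64_t target = level_bytes[last];
  for (int i = last - 1; i >= base; --i) target /= multiplier;

  return target <= max_bytes_for_level_base / multiplier;
}
```

With a 256 MB base and a multiplier of 10, a last level of about 100 GB keeps a base level at L4 reasonable under this rule, while a last level that has shrunk to about 1 GB makes L4 (and L5) candidates for draining.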

The question was raised by @mdcallag .

mdcallag commented 2 years ago

Inspired by this error message, where each level's target size is computed, and by this long-open, hard-to-implement feature request asking InnoDB for an option to return space to the filesystem.

ajkr commented 2 years ago

I tried this once in #3921. It's not very aggressive, in that it only drains levels when no other compaction is needed. The idea of suspending compaction to the base level and compacting the whole base level away in one shot is interesting. It seems to assume the base level drain is the most important thing to complete, which might not be true. But it does help guarantee that the base level drain actually finishes, whereas in my PR it seems possible that the base level drain and L0->Lbase could interleave forever (still, I doubt that increases overall write-amp by much, and even less so considering that it's not always going to be the case).
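The difference between the two approaches can be sketched as a toy priority function. This is only an illustration of the trade-off described above, not the logic from #3921 or from RocksDB's compaction picker; `Policy`, `Pick`, `State`, and `PickCompaction` are all hypothetical names.

```cpp
// Two hypothetical scheduling policies for draining an excessive base level.
enum class Policy {
  kDrainWhenIdle,    // drain only when no other compaction is needed
  kSuspendL0ToBase,  // pause L0->Lbase until the drain completes
};

enum class Pick { kNone, kL0ToBase, kDrainBase, kOther };

// Toy snapshot of what the picker knows (all fields are assumptions).
struct State {
  bool base_excessive;           // base level flagged for draining
  bool l0_needs_compaction;      // L0 has enough files to compact
  bool other_compaction_needed;  // some deeper level is over its target
};

Pick PickCompaction(Policy policy, const State& s) {
  // Suspending policy: the drain preempts everything, so it always
  // finishes, at the cost of delaying L0->Lbase.
  if (policy == Policy::kSuspendL0ToBase && s.base_excessive) {
    return Pick::kDrainBase;
  }
  if (s.l0_needs_compaction) return Pick::kL0ToBase;
  if (s.other_compaction_needed) return Pick::kOther;
  // Idle policy: the drain runs only when nothing else is pending, so
  // new L0->Lbase work can keep refilling the base level.
  if (s.base_excessive) return Pick::kDrainBase;
  return Pick::kNone;
}
```

Under steady writes, `kDrainWhenIdle` may keep returning `kL0ToBase`, which is the interleaving concern; `kSuspendL0ToBase` avoids it but assumes the drain is the most important work to finish.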