Open ajkr opened 5 years ago
I didn't know we have (3). When did we introduce it?
Looks like 235b162be13910a5f5b72cf0b30bd3255de14d67.
The more concerning part to me is using my L0 compaction scoring based on file size together with your base-level compaction scoring based on comparison to L0 size. I think that makes the base level's score unfairly low.
Oh I did it. Oops.
The LSM-tree shape looks much better with (3) than without. Is it a side effect?
Yeah, you're right, the experiments in the description do not prove (3) should be removed. I should've also measured without (3) together with the L0 scoring change. Both strategies shown in the description end up top-heavy and do not really give the nice smooth shape we want.
@ajkr I agree. We should rethink it more.
RocksDB gets stuck in L0->L0 and L0->base compactions in very write-heavy benchmarks. base->base+1 almost never happens, and the base level's score is usually reported as zero due to a pending L0->base compaction.
It is caused by interactions between:
(1) intra-L0 compaction, (2) Siying's optimization to use L0 size as the base level target size in write-heavy scenarios, and (3) an existing workaround to disable base->base+1 compaction when L0 is eligible for compaction but not scheduled.
(3) has existed the longest but does not seem particularly relevant in these modern times, where we can keep doing intra-L0 while the base level is contended. I saw more benefit than expected from removing it, though I cannot explain why yet.
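To make the interaction between (2) and (3) concrete, here is a minimal toy model in Python. It is not RocksDB's code: the class, function names, constants, and the choice to model (3) as forcing the base level's score to zero are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical tuning knobs for this toy model (not RocksDB option names).
L0_FILE_TRIGGER = 4              # L0 file count at which L0 becomes eligible
BASE_TARGET_BYTES = 256 << 20    # static target size of the base level


@dataclass
class LsmState:
    """Toy snapshot of L0 and the base level; all fields are hypothetical."""
    l0_file_count: int
    l0_bytes: int
    base_bytes: int
    l0_compaction_scheduled: bool


def l0_score(state: LsmState) -> float:
    # Assumption: L0 is scored by file count relative to the trigger.
    return state.l0_file_count / L0_FILE_TRIGGER


def base_score(state: LsmState, use_l0_size_as_target: bool,
               apply_workaround: bool) -> float:
    # (2): under write-heavy load, L0 size can replace the base level target.
    target = BASE_TARGET_BYTES
    if use_l0_size_as_target:
        target = max(target, state.l0_bytes)
    score = state.base_bytes / target
    # (3): modeled here as zeroing the score while L0 is eligible for
    # compaction but no L0 compaction has been scheduled.
    l0_eligible = l0_score(state) >= 1.0
    if apply_workaround and l0_eligible and not state.l0_compaction_scheduled:
        return 0.0
    return score


if __name__ == "__main__":
    state = LsmState(l0_file_count=8, l0_bytes=1 << 30,
                     base_bytes=512 << 20, l0_compaction_scheduled=False)
    print(base_score(state, False, False))  # 2.0: base->base+1 looks urgent
    print(base_score(state, True, False))   # 0.5: (2) alone lowers the score
    print(base_score(state, True, True))    # 0.0: (3) hides the base level
```

In this model, a large pending L0 pushes the base level below the compaction threshold through (2), and (3) then removes it from consideration entirely, which is consistent with the base level's score being reported as zero.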
Benchmark command:
Results with (3):
Results without (3):
I'd also speculate that when (2) is active, we should calculate the L0 compaction score using file count only, i.e., not take L0 size into account.
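A rough sketch of why this might matter, again as a toy Python model rather than RocksDB's implementation. The function names, the formulas, and the numbers are assumptions chosen for illustration. The point is that when L0 size feeds both the L0 score and the base level's target, a growing L0 is counted twice: it raises L0's score and lowers the base level's score.

```python
def l0_score_count_only(file_count: int, file_trigger: int) -> float:
    # Proposed variant: score L0 by file count alone.
    return file_count / file_trigger


def l0_score_count_and_size(file_count: int, file_trigger: int,
                            l0_bytes: int, base_target_bytes: int) -> float:
    # Assumed size-aware variant: the larger of the count-based score and
    # L0 bytes relative to the base level's static target.
    return max(file_count / file_trigger, l0_bytes / base_target_bytes)


def base_score_vs_l0(base_bytes: int, l0_bytes: int,
                     base_target_bytes: int) -> float:
    # (2): the base level's target is raised to L0 size when L0 is larger.
    return base_bytes / max(base_target_bytes, l0_bytes)


if __name__ == "__main__":
    GIB = 1 << 30
    MIB = 1 << 20
    # Hypothetical state: 4 L0 files totalling 2 GiB, 3 GiB in the base level.
    files, trigger = 4, 4
    l0_bytes, base_bytes, base_target = 2 * GIB, 3 * GIB, 256 * MIB

    base = base_score_vs_l0(base_bytes, l0_bytes, base_target)
    by_count = l0_score_count_only(files, trigger)
    by_size = l0_score_count_and_size(files, trigger, l0_bytes, base_target)

    print(base)      # 1.5
    print(by_count)  # 1.0: the base level outranks L0
    print(by_size)   # 8.0: L0 outranks the base level
```

With these made-up numbers, count-only L0 scoring lets the base level win the comparison, while size-aware L0 scoring keeps L0 ahead even though the base level is already over its raised target.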