Open bananabrick opened 1 year ago
I think this idea is similar to the original text in https://github.com/cockroachdb/pebble/issues/2156#issue-1471536094. In that issue, the virtualization was happening at the higher level (there are tradeoffs). That issue was closed since @jbowens found an improved grandparent splitting heuristic that results in better alignment (https://github.com/cockroachdb/pebble/issues/2156#issuecomment-1398846061).
It is possible that there is further room for improvement via such virtualization -- we should quantify that first.
During some compaction from a level `i` to a level `i+1`, the file `f` from level `i` will overlap with some files from level `i+1`. Usually, there are also some files in level `i+1` which will only partially overlap with the file `f`.

Consider the following case:
The file `f1` overlaps with both the files `f2` and `f3`. Moving the file `f1` to level `i+1` will have a write cost of 9 bytes: one byte for each of the 8 keys in `f1` and `f2`, plus the new key `f` that we're adding.

But there's an opportunity here to virtualize `f3`. `f1` only overlaps with a single key, `e`, in `f3`. We could convert `f3` into a virtual sstable `f5`: [g, h, i] as part of the compaction version edit. If we do this, we'll only have to write 6 bytes as part of the compaction.

We should only perform this virtualization if the data in `f3` which `f1` overlaps with is significantly less than the total size of `f1`.

Future compactions will further whittle down the file `f5`, without incurring a write cost for keys which they do not overlap with.

To prevent space amp from blowing up, we could make sure that as these virtual sstables get small compared to the original physical sstable size, we compact them. I believe this heuristic has already been implemented as part of the existing virtual sstable work. We could also make it less likely that these virtual sstables are created as the disk utilization increases.
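To make the arithmetic above concrete, here is a toy sketch in Go. It is not Pebble code: the `file` type, the `compact` function, and the `maxRatio` knob are all hypothetical, and the key layout (`f1` = [a, e, f], `f2` = [a, b, c, d], `f3` = [e, g, h, i]) is an assumed one chosen only because it reproduces the 9-byte and 6-byte figures. Each key counts as one byte, and a level `i+1` file is virtualized only when the overlap is a prefix or suffix of the file and is small relative to the size of the input file.

```go
package main

import (
	"fmt"
	"sort"
)

// file is a toy sstable: a name plus sorted keys, each costing 1 byte.
type file struct {
	name string
	keys []string
}

// overlapRange returns the half-open index range [start, end) of f.keys
// that falls inside the inclusive key bounds [lo, hi].
func overlapRange(f file, lo, hi string) (start, end int) {
	start = sort.SearchStrings(f.keys, lo)
	end = sort.Search(len(f.keys), func(i int) bool { return f.keys[i] > hi })
	return start, end
}

// compact merges input (level i) with the files in lower (level i+1).
// It returns the bytes written and the virtual sstables created in place
// of a rewrite. maxRatio is a hypothetical threshold: a lower file is
// virtualized only if its overlapped key count is at most
// maxRatio * len(input.keys). Passing 0 disables virtualization.
func compact(input file, lower []file, maxRatio float64) (int, []file) {
	lo, hi := input.keys[0], input.keys[len(input.keys)-1]
	written := map[string]bool{}
	for _, k := range input.keys {
		written[k] = true
	}
	var virtual []file
	for _, f := range lower {
		start, end := overlapRange(f, lo, hi)
		if start == end {
			continue // no overlap: the file is not part of the compaction
		}
		// Only a prefix or suffix overlap leaves one contiguous remainder.
		edge := (start == 0) != (end == len(f.keys))
		small := float64(end-start) <= maxRatio*float64(len(input.keys))
		rewrite := f.keys
		if edge && small {
			rewrite = f.keys[start:end]
			rest := append(append([]string{}, f.keys[:start]...), f.keys[end:]...)
			virtual = append(virtual, file{name: f.name + "-virtual", keys: rest})
		}
		for _, k := range rewrite {
			written[k] = true
		}
	}
	return len(written), virtual
}

func main() {
	f1 := file{name: "f1", keys: []string{"a", "e", "f"}}
	f2 := file{name: "f2", keys: []string{"a", "b", "c", "d"}}
	f3 := file{name: "f3", keys: []string{"e", "g", "h", "i"}}

	plain, _ := compact(f1, []file{f2, f3}, 0)
	virt, vfiles := compact(f1, []file{f2, f3}, 0.5)
	fmt.Println(plain, virt, vfiles[0].keys) // 9 6 [g h i]
}
```

The 0.5 threshold is arbitrary. In this layout `f2` is fully covered by the bounds of `f1`, so only `f3` qualifies, and the remainder [g, h, i] plays the role of `f5`.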
Next Steps

The first step would be to create a metric to determine what percentage of write amp is due to this partial overlap during compactions.
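One possible shape for such a metric, as a rough sketch: for each compaction, record the total bytes written and the subset of those bytes that came from level `i+1` data lying outside the key bounds of the level `i` input. The `compactionStats` type and its fields are hypothetical and do not correspond to existing Pebble metrics, and the sample numbers (other than the 9 and 3 taken from the example above) are made up.

```go
package main

import "fmt"

// compactionStats is a hypothetical per-compaction record.
type compactionStats struct {
	bytesWritten        uint64 // total bytes written by the compaction
	nonOverlappingBytes uint64 // rewritten bytes outside the input file's key bounds
}

// partialOverlapPercent returns the percentage of all compaction bytes
// written that were rewritten only because of partial overlap.
func partialOverlapPercent(stats []compactionStats) float64 {
	var total, avoidable uint64
	for _, s := range stats {
		total += s.bytesWritten
		avoidable += s.nonOverlappingBytes
	}
	if total == 0 {
		return 0
	}
	return 100 * float64(avoidable) / float64(total)
}

func main() {
	stats := []compactionStats{
		{bytesWritten: 9, nonOverlappingBytes: 3}, // the f1/f2/f3 example: g, h, i
		{bytesWritten: 20, nonOverlappingBytes: 0},
		{bytesWritten: 11, nonOverlappingBytes: 5},
	}
	fmt.Printf("%.1f%%\n", partialOverlapPercent(stats)) // 20.0%
}
```

A low percentage across real workloads would suggest the grandparent splitting heuristic already captures most of the benefit; a high one would quantify the room left for virtualization.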
Jira issue: PEBBLE-62
Epic CRDB-40361