Open jbowens opened 1 year ago
This can help with write amp too.
There might be an opportunity for a heuristic that incorporates lower level file boundaries into the compaction iterator's frontiers type, and notices when the next key skips over a large gap in lower levels. When a large enough gap is encountered, the output can be split early.
Could use sstable.Reader.EstimateDiskUsage to decide when a gap is "large" vs "small", and use virtual sstable slicing to avoid rewriting an L6 sstable when only a portion of it overlaps with a higher level. This idea overlaps somewhat with https://github.com/cockroachdb/pebble/issues/518 (generalized "move" compactions).
Consider a workload where the keyspaces a-f and u-z are write-heavy, while the keyspace g-t is read-only and read-heavy. Incoming writes to the a-f and u-z keyspaces can easily end up in the same sstable, which then spans the read-only g-t keyspace. These sstables unnecessarily increase the read amplification of reads in the g-t keyspace.
Ideally, we'd avoid constructing sstables spanning the read-only region so that only the regions receiving writes suffer read-amp.
User-defined "guards" (#517) are one approach to this problem, but they shift the compaction-output splitting problem to the user. Setting guards too aggressively can result in many small files in the LSM.
There might be an opportunity for a heuristic that incorporates lower level file boundaries into the compaction iterator's frontiers type, and notices when the next key skips over a large gap in lower levels. When a large enough gap is encountered, the output can be split early.
See cockroachdb/cockroach#93427 for another approach at reducing the effective read-amp in this kind of scenario.
Jira issue: PEBBLE-202
Epic CRDB-40361