Open jbowens opened 2 years ago
This may be used to improve the estimate of ApproxIngestedIntoL0Bytes
for flushable ingests. The precomputed data overlap can be used to determine which sstables overlapped L0 before entering the commit pipeline.
I suspect in these graphs we see ingestions holding up the commit pipeline, causing a divergence between the WAL fsync latency and the raft log commit latency.
This is my current thinking:
levelIter
. We can just find the overlapping files and check for overlap with those.FileMetadata
. The overlap cache consists of a small number of non-overlapping UserKeyBounds
; each one is either a known empty region of the file, or a known "full" region of the file (either just a point, or a range corresponding to a range del or range key).FileMetadata
, we first take a look at the overlap cache and see if it can give us an answer (i.e. if we are completely inside of a "known empty" region, or if we intersect with any "known full" region). If not, we check for overlap and add one or two regions in the cache (one for any empty region we find, and one for any full region we find).When restoring a table, we can expect many external ingestions in the same area so the cache should help across ingestions too. In any case, to minimize I/O while holding mutexes, we will do an optimistic "dry run" without the mutex; then, if the version has changed, we run it again under the mutex, and it will benefit from all the cached regions from the dry run.
We can just find the overlapping files and check for overlap with those.
If a file's bounds are wholly contained within the ingested sstable bounds, I think we should also assume the file is non-empty and has data overlap without checking. During imports, ingested sstables can be wide in keyspace and we wouldn't want to actually look for data overlap in every overlapping file.
To identify the target level of M ingested files in a LSM with N levels, we may need to seek in O(N × M) sstables to look for data overlap. This is all performed during application while holding
DB.mu
and the manifest lock, which prevents concurrent flushes and compactions from scheduling or completing while the mutex is held.Ingest performs these data overlap checks while holding the manifest lock for a consistent view of the LSM. However, for the purposes of detecting data overlap, ingest could be optimistic about its view of the LSM.
When Ingest acquires the manifest lock during the apply step, Ingest can exclude any sstables or memtables with largest sequence numbers already observed during data overlap calculations. Ingest only needs to search in any new sstables for more recent overlap introduced since its optimistic scan.
Related to #25 and this TODO to reduce overlap checks by using L0 sublevels.
Also related to #1683 which will remove the strict requirement of preventing file boundary overlap, making target level a function of data overlap exclusively.
Jira issue: PEBBLE-173