Closed: evenyag closed this 1 week ago.
The recent changes introduce a MergeMode option across various components of the mito2 module to control how duplicate rows of a time series are merged during compaction, scanning, and memtable operations. The updated structs and functions now accept a merge_mode parameter that determines how duplicate rows are handled. Additionally, new tests verify that these features work correctly.
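To make "handling of duplicate rows" concrete, here is a minimal Rust sketch of the two modes named in this PR, LastRow and LastNotNull, applied to several versions of one row. The enum and the merge_versions helper are hypothetical simplifications, not mito2's actual types; versions are assumed to be ordered newest first, and LastRow is assumed to be the default.

```rust
/// Simplified stand-in for the MergeMode option; not mito2's actual type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum MergeMode {
    /// Keep all fields of the newest version (assumed default).
    #[default]
    LastRow,
    /// For each field, keep the newest non-null value.
    LastNotNull,
}

/// Merges duplicate versions of one row (same key and timestamp).
/// `versions` must be ordered newest first; each version is a list of
/// nullable field values of the same width.
pub fn merge_versions(mode: MergeMode, versions: &[Vec<Option<i64>>]) -> Vec<Option<i64>> {
    let Some(newest) = versions.first() else {
        return Vec::new();
    };
    match mode {
        // The newest version wins as a whole, nulls included.
        MergeMode::LastRow => newest.clone(),
        // Take the first non-null value for column `i`, scanning newest to oldest.
        MergeMode::LastNotNull => (0..newest.len())
            .map(|i| versions.iter().find_map(|v| v.get(i).copied().flatten()))
            .collect(),
    }
}

fn main() {
    // Two writes to the same row; the newer one leaves the second field null.
    let versions = vec![vec![Some(3), None], vec![Some(1), Some(2)]];
    println!("{:?}", merge_versions(MergeMode::LastRow, &versions)); // [Some(3), None]
    println!("{:?}", merge_versions(MergeMode::LastNotNull, &versions)); // [Some(3), Some(2)]
}
```

With these inputs, LastRow keeps the newer write verbatim, including its null, while LastNotNull fills that null from the older write.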
File(s) | Change Summary |
---|---|
.../benches/memtable_bench.rs | Introduced MergeMode::LastRow in the creation of TimeSeriesMemtable for benchmarks. |
.../compaction.rs | Added CompactionSstReaderBuilder with merge_mode for compaction tasks. |
.../compaction/compactor.rs | Replaced build_sst_reader with CompactionSstReaderBuilder and added the merge_mode parameter. |
.../compaction/window.rs | Added a merge_mode field to the struct used in tests. |
.../engine.rs | Added the merge_mode_test module. |
.../engine/merge_mode_test.rs | Added tests for merge_mode functionality. |
.../memtable.rs, .../memtable/partition_tree.rs, .../memtable/time_series.rs | Added MergeMode to configurations and memtable constructors. |
.../read/dedup.rs | Introduced LastNotNullIter for custom deduplication strategies. |
.../read/scan_region.rs, .../read/seq_scan.rs | Integrated MergeMode into ScanInput and adjusted DedupReader for the new strategy. |
.../region/opener.rs, .../region/options.rs | Added a merge_mode field to RegionOptions and updated memtable_builder. |
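The RegionOptions row above adds a merge_mode field. As an illustration only, the sketch below parses such a field from string key/value options. The key merge_mode and the values last_row and last_non_null are assumptions for this example (the PR description says the option will be exposed in table options in a follow-up), and RegionOptions here is a trimmed-down hypothetical struct.

```rust
use std::collections::HashMap;
use std::str::FromStr;

/// Simplified stand-in for the MergeMode option.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum MergeMode {
    #[default]
    LastRow,
    LastNotNull,
}

impl FromStr for MergeMode {
    type Err = String;

    // The accepted spellings are assumptions made for this sketch.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "last_row" => Ok(MergeMode::LastRow),
            "last_non_null" => Ok(MergeMode::LastNotNull),
            other => Err(format!("invalid merge_mode: {other}")),
        }
    }
}

/// Hypothetical, trimmed-down region options holding only the new field.
#[derive(Debug, Default)]
pub struct RegionOptions {
    /// `None` means the option was not set.
    pub merge_mode: Option<MergeMode>,
}

impl RegionOptions {
    /// Builds options from string key/value pairs; an unknown value is an error.
    pub fn try_from_map(map: &HashMap<String, String>) -> Result<Self, String> {
        let merge_mode = map
            .get("merge_mode")
            .map(|s| s.parse::<MergeMode>())
            .transpose()?;
        Ok(RegionOptions { merge_mode })
    }

    /// Effective mode, falling back to the default when unset.
    pub fn merge_mode(&self) -> MergeMode {
        self.merge_mode.unwrap_or_default()
    }
}

fn main() {
    let mut opts = HashMap::new();
    opts.insert("merge_mode".to_string(), "last_non_null".to_string());
    let region = RegionOptions::try_from_map(&opts).unwrap();
    println!("{:?}", region.merge_mode()); // LastNotNull
}
```

Keeping the field as Option<MergeMode> lets callers distinguish "unset" from an explicit LastRow, while merge_mode() still yields a usable default.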
sequenceDiagram
participant User
participant Engine
participant CompactionSstReaderBuilder
participant ScanInput
User ->> Engine: Start compaction
Engine ->> CompactionSstReaderBuilder: Initialize with MergeMode
Engine ->> CompactionSstReaderBuilder: Build SST reader
CompactionSstReaderBuilder ->> Engine: Return SST reader
Engine ->> ScanInput: Configure with MergeMode
ScanInput ->> Engine: Initiate scanning
Engine ->> User: Return compaction results
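The diagram shows a builder that is configured with a MergeMode before it builds the reader used for compaction. A minimal sketch of that flow follows, assuming rows are identified by (key, timestamp) and that a higher sequence number means a newer write. MergeReaderBuilder and Row are hypothetical names, not the PR's CompactionSstReaderBuilder, and the in-memory sort stands in for a streaming k-way merge.

```rust
use std::cmp::Reverse;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum MergeMode {
    #[default]
    LastRow,
    LastNotNull,
}

/// Simplified row: `(key, ts)` identifies a row; a higher `seq` is a newer write.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub key: String,
    pub ts: i64,
    pub seq: u64,
    pub fields: Vec<Option<i64>>,
}

/// Hypothetical builder: configured with a MergeMode, then builds merged output.
#[derive(Default)]
pub struct MergeReaderBuilder {
    inputs: Vec<Vec<Row>>,
    merge_mode: MergeMode,
}

impl MergeReaderBuilder {
    pub fn merge_mode(mut self, mode: MergeMode) -> Self {
        self.merge_mode = mode;
        self
    }

    /// Adds the rows of one input file (e.g. one SST).
    pub fn add_input(mut self, rows: Vec<Row>) -> Self {
        self.inputs.push(rows);
        self
    }

    /// Merges all inputs and removes duplicates according to the merge mode.
    /// For brevity this sorts in memory instead of streaming a k-way merge.
    pub fn build(self) -> Vec<Row> {
        let Self { inputs, merge_mode } = self;
        let mut rows: Vec<Row> = inputs.into_iter().flatten().collect();
        // Order by key, then timestamp, with the newest sequence first.
        rows.sort_by(|a, b| {
            (&a.key, a.ts, Reverse(a.seq)).cmp(&(&b.key, b.ts, Reverse(b.seq)))
        });

        let mut out: Vec<Row> = Vec::new();
        for row in rows {
            let is_dup = out
                .last()
                .map_or(false, |prev| prev.key == row.key && prev.ts == row.ts);
            if !is_dup {
                out.push(row);
            } else if merge_mode == MergeMode::LastNotNull {
                // `row` is an older duplicate: use it only to fill nulls.
                let prev = out.last_mut().expect("is_dup implies a previous row");
                for (p, r) in prev.fields.iter_mut().zip(row.fields) {
                    if p.is_none() {
                        *p = r;
                    }
                }
            }
            // Under LastRow, older duplicates are simply dropped.
        }
        out
    }
}

fn main() {
    let older = Row { key: "host1".into(), ts: 1, seq: 1, fields: vec![Some(10), Some(20)] };
    let newer = Row { key: "host1".into(), ts: 1, seq: 2, fields: vec![Some(11), None] };
    let merged = MergeReaderBuilder::default()
        .merge_mode(MergeMode::LastNotNull)
        .add_input(vec![older])
        .add_input(vec![newer])
        .build();
    println!("{:?}", merged[0].fields); // [Some(11), Some(20)]
}
```

Passing the mode into the builder keeps one reader path for both modes; in the diagram, ScanInput is configured with the mode in the same way for queries.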
Attention: Patch coverage is 97.17742% with 14 lines in your changes missing coverage. Please review.
Project coverage is 84.60%. Comparing base (948c869) to head (11936c4). Report is 3 commits behind head on main.
I am not convinced that UpdateMode is an appropriate term. It is used during scanning to combine duplicate rows, yet it is called UpdateMode. Personally, I would prefer something like MergeMode or ScanMergeMode.
UpdateMode seems to describe the behavior from the perspective of writing. cc @evenyag
Good idea. MergeMode may be more appropriate.
@evenyag Don't forget to change the PR title and content.
@coderabbitai pause
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
This PR implements a new region option, update_mode, to control how existing rows with the same key are updated. There are two update modes available:
- LastRow: overwrites all fields with the values in the latest row.
- LastNotNull: for each field, keeps the latest non-null value, so a null in a newer row does not overwrite an existing value.

Memtables have individual dedup implementations, so they can't reuse the DedupReader. To avoid repeating the dedup logic for each memtable, this PR implements a new iterator, LastNotNullIter, as a wrapper for the memtable iterator. LastNotNullIter splits batches and invokes the LastNotNull dedup strategy to dedup them. LastNotNullIter only takes effect when update_mode is LastNotNull.

After this PR, I will expose the option in table options and add some sqlness tests.
Checklist
Summary by CodeRabbit
New Features
- MergeMode options for handling duplicate row merging in various components.
- LastNotNullIter for deduplication based on the LastNotNull strategy.
Enhancements
- MergeMode options.
- Updated TimeSeriesMemtable, PartitionTreeConfig, and ScanInput to include merge_mode configuration.
Tests
- MergeMode options and append mode functionality.