Search before asking
[X] I searched in the issues and found nothing similar.
Motivation
Why we need this.
Currently, the compact action performs a full compaction in batch mode, which merges all base files with the delta files and generates a new base file. After that, we have two copies of the full data in storage (base_file1 + delta_file1 + base_file2).
But sometimes we only need to merge the incremental data: we can accept some reduction in read performance in exchange for saving storage space.
Solution
This will be implemented through 3 PRs:
step 1: Refactor the compact action to support extended compaction types.
step 2: Let the compact action use a full_compaction option to decide which compaction is triggered, FullCompaction or UniversalCompaction.
step 3: Add a new procedure universal_compact for Spark and Flink.
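As an illustration of step 2, the dispatch between the two compaction strategies could be sketched as below. This is a hypothetical sketch only: the class names, the CompactType enum, and the way the full_compaction option is read are assumptions for illustration, not the actual Paimon API.

```java
// Hypothetical sketch of step 2: choosing which compaction to trigger
// from a full_compaction option. Names are illustrative, not Paimon's API.
import java.util.Map;

public class CompactDispatch {

    enum CompactType { FULL, UNIVERSAL }

    // Decide which compaction to trigger from the action's options.
    // full_compaction defaults to "true" so the current batch-mode
    // behavior (full compaction) is preserved when the option is absent.
    static CompactType resolve(Map<String, String> options) {
        boolean full = Boolean.parseBoolean(
                options.getOrDefault("full_compaction", "true"));
        return full ? CompactType.FULL : CompactType.UNIVERSAL;
    }

    public static void main(String[] args) {
        // No option set: keep today's behavior.
        System.out.println(resolve(Map.of()));
        // Explicitly opt out of full compaction.
        System.out.println(resolve(Map.of("full_compaction", "false")));
    }
}
```

The default of true keeps the change backward compatible: existing jobs that never set the option continue to get a full compaction.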
Anything else?
No response
Are you willing to submit a PR?