apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[HUDI-8521] Use commit time merge in preCombine for OverwriteWithLatestAvroPayload #12297

Open linliu-code opened 2 days ago

linliu-code commented 2 days ago

Change Logs

Currently, HoodieMergedLogRecordScanner may use HoodiePreCombineAvroRecordMerger to merge log records. That merger treats all records as if they came from the same batch and does not respect the order of records across different batches, so the commit time merging logic is broken.

Therefore, we use the configured merger instead.
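
To illustrate the behavior described above, here is a minimal, self-contained sketch of the two merge styles. It is not Hudi code: `MergeModeSketch`, `Rec`, `mergeByOrderingValue`, `mergeByCommitTime`, and `fold` are hypothetical names, and the assumption is that the ordering-value merge picks the record with the larger ordering (preCombine) value, while commit time merging lets the record from the later batch win.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names are Hudi classes or APIs.
class MergeModeSketch {

    // A simplified log record: key, ordering (preCombine) value, and payload.
    static final class Rec {
        final String key;
        final long orderingValue;
        final String payload;

        Rec(String key, long orderingValue, String payload) {
            this.key = key;
            this.orderingValue = orderingValue;
            this.payload = payload;
        }
    }

    // Ordering-value merge (assumed behavior): the record with the larger
    // ordering value wins, regardless of which batch was committed later.
    static Rec mergeByOrderingValue(Rec older, Rec newer) {
        return newer.orderingValue >= older.orderingValue ? newer : older;
    }

    // Commit time merge: the record from the later commit always wins.
    static Rec mergeByCommitTime(Rec older, Rec newer) {
        return newer;
    }

    // Folds records for one key, supplied in commit order, into a final record.
    static Rec fold(List<Rec> inCommitOrder, boolean commitTime) {
        Rec current = inCommitOrder.get(0);
        for (Rec next : inCommitOrder.subList(1, inCommitOrder.size())) {
            current = commitTime
                ? mergeByCommitTime(current, next)
                : mergeByOrderingValue(current, next);
        }
        return current;
    }

    public static void main(String[] args) {
        // The earlier commit carries a larger ordering value than the later one.
        List<Rec> batches = Arrays.asList(
            new Rec("k1", 10L, "from-commit-1"),
            new Rec("k1", 5L, "from-commit-2"));

        System.out.println(fold(batches, false).payload); // from-commit-1
        System.out.println(fold(batches, true).payload);  // from-commit-2
    }
}
```

In this sketch the two strategies disagree exactly when an earlier batch holds the larger ordering value, which is the cross-batch case where applying the wrong merger changes the result.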

Impact

Fixes the bug in commit time merging.

Risk level (write none, low, medium or high below)

Medium.

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

Contributor's checklist

hudi-bot commented 1 day ago

CI report:

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build