apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[core][flink] Introduce HashMapLocalMerger to 'local-merge-buffer-size' #4492

Closed JingsongLi closed 1 week ago

JingsongLi commented 1 week ago

Purpose

We previously introduced a sort-based local merge, which can greatly reduce the amount of data when the input contains many repeated keys.

However, sorting is expensive. Following Flink's Local Aggregation, we can introduce a hash-based merger (HashMapLocalMerger), which greatly improves Local Merge performance (more than 3x).
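To illustrate the idea, here is a minimal sketch of a hash-based local merge. It is not the code in this PR: the class name `HashLocalMergeSketch`, the entry-count limit `maxEntries` (the real option, 'local-merge-buffer-size', is a memory size), and the caller-supplied merge function are all assumptions made for illustration. Records with the same key are merged in a hash map as they arrive, so no sort is needed before the buffer is flushed downstream.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

/** Hypothetical sketch of a hash-based local merger; not Paimon's HashMapLocalMerger. */
public class HashLocalMergeSketch<K, V> {

    // Buffered records, one entry per distinct key (insertion order kept for readability).
    private final Map<K, V> buffer = new LinkedHashMap<>();
    // How two values with the same key are combined, e.g. "last value wins".
    private final BinaryOperator<V> mergeFunction;
    // Assumed capacity limit, expressed as a number of distinct keys.
    private final int maxEntries;
    // Stand-in for the downstream shuffle: merged records are collected here.
    private final List<Map.Entry<K, V>> emitted = new ArrayList<>();

    public HashLocalMergeSketch(BinaryOperator<V> mergeFunction, int maxEntries) {
        this.mergeFunction = mergeFunction;
        this.maxEntries = maxEntries;
    }

    /** Merges the record into the buffer; flushes first if a new key would not fit. */
    public void put(K key, V value) {
        if (!buffer.containsKey(key) && buffer.size() >= maxEntries) {
            flush();
        }
        // Hash lookup plus in-place merge: no sorting of the buffered records.
        buffer.merge(key, value, mergeFunction);
    }

    /** Emits every buffered record downstream and clears the buffer. */
    public void flush() {
        for (Map.Entry<K, V> e : buffer.entrySet()) {
            emitted.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        buffer.clear();
    }

    public List<Map.Entry<K, V>> emitted() {
        return emitted;
    }
}
```

With a "last value wins" merge function and `maxEntries = 2`, feeding the records (a,1), (b,2), (a,3), (c,4) and then calling `flush()` emits three records instead of four, because the two records for key `a` are merged before leaving the buffer.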

Tests

API and Format

Documentation