apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[flink] add coordinate and worker operator for small changelog files compaction #4380

Status: Closed. LsomeYeah closed this pull request 3 weeks ago.

LsomeYeah commented 1 month ago

Purpose

Linked issue: close #xxx

Add a coordinator node to the small changelog file compaction pipeline. The coordinator decides how to concatenate the small changelog files into result files of the target file size; the result can be one file or several. Also add a worker node that merges each group of small files into a result file.
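The coordinator/worker split can be illustrated with a minimal sketch, assuming a simple greedy grouping: the coordinator walks the small files in order and starts a new group whenever the next file would push the group past the target size, and the worker merges one group into one output. All names here (`ChangelogCompactionSketch`, `SmallFile`, `CompactedFile`, `planGroups`, `merge`) are hypothetical and are not Paimon's API. Real changelog files are columnar data files, not in-memory record lists, and the PR's actual grouping rule may differ.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a coordinator/worker split for compacting small
 * changelog files. Names and types are illustrative, not Paimon's API.
 */
public class ChangelogCompactionSketch {

    /** A small changelog file: name, size on disk, and its records (held in memory for the sketch). */
    record SmallFile(String name, long sizeBytes, List<String> records) {}

    /** Worker output: one compacted file built from a group of small files. */
    record CompactedFile(String name, long sizeBytes, List<String> records) {}

    /**
     * Coordinator step (assumed greedy rule): walk the files in order and start a
     * new group whenever adding the next file would exceed targetSizeBytes.
     * A file larger than the target ends up alone in its own group.
     */
    static List<List<SmallFile>> planGroups(List<SmallFile> files, long targetSizeBytes) {
        List<List<SmallFile>> groups = new ArrayList<>();
        List<SmallFile> current = new ArrayList<>();
        long currentSize = 0;
        for (SmallFile f : files) {
            if (!current.isEmpty() && currentSize + f.sizeBytes() > targetSizeBytes) {
                groups.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(f);
            currentSize += f.sizeBytes();
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    /** Worker step: merge one group into a single file, preserving record order. */
    static CompactedFile merge(String outputName, List<SmallFile> group) {
        List<String> records = new ArrayList<>();
        long size = 0;
        for (SmallFile f : group) {
            records.addAll(f.records());
            size += f.sizeBytes();
        }
        return new CompactedFile(outputName, size, records);
    }

    public static void main(String[] args) {
        List<SmallFile> files = List.of(
                new SmallFile("changelog-0", 40, List.of("+I[1]")),
                new SmallFile("changelog-1", 50, List.of("-U[1]", "+U[1]")),
                new SmallFile("changelog-2", 30, List.of("+I[2]")),
                new SmallFile("changelog-3", 90, List.of("-D[2]")));
        List<List<SmallFile>> groups = planGroups(files, 100);
        for (int i = 0; i < groups.size(); i++) {
            CompactedFile out = merge("compacted-" + i, groups.get(i));
            System.out.println(out.name() + " " + out.sizeBytes() + " " + out.records());
        }
    }
}
```

With a 100-byte target, the four example files are grouped as `[changelog-0, changelog-1]` (90 bytes), `[changelog-2]` (30 bytes) and `[changelog-3]` (90 bytes). The sketch groups files in their original order so that the merged output keeps the changelog records in sequence. This ordering is an assumption of the sketch; the real operators may order or split files differently.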

Tests

API and Format

Documentation