apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Bug] Compaction add parallelize parallelism to avoid small partitions #4157

Closed: askwang closed this issue 2 months ago

askwang commented 2 months ago

Search before asking

Paimon version

0.8

Compute Engine

spark

Minimal reproduce step

Run CALL sys.compact from Spark on a table with 1010 buckets.

What doesn't meet your expectations?

The table has 1010 buckets, but compaction generates only 2 tasks.

Before: (screenshot)

After: (screenshot)
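A minimal sketch of why the task count can collapse to 2. It assumes that the compact procedure hands one work item per bucket to Spark's parallelize without an explicit slice count. In that case the number of partitions, and therefore tasks, falls back to spark.default.parallelism, which is assumed to be 2 here. The helper names slice_positions and buckets_per_task are hypothetical. The range arithmetic follows the way Spark's ParallelCollectionRDD splits a local collection into slices.

```python
def slice_positions(length: int, num_slices: int) -> list[tuple[int, int]]:
    # Contiguous [start, end) index ranges, using the same integer arithmetic
    # as Spark's ParallelCollectionRDD when slicing a local collection.
    if num_slices < 1:
        raise ValueError("num_slices must be >= 1")
    return [
        ((i * length) // num_slices, ((i + 1) * length) // num_slices)
        for i in range(num_slices)
    ]


def buckets_per_task(num_buckets: int, num_slices: int) -> list[int]:
    # Hypothetical helper: one Spark task per slice; each entry is the number
    # of buckets (work items) that task would compact.
    return [end - start for start, end in slice_positions(num_buckets, num_slices)]


default_parallelism = 2  # assumed value of spark.default.parallelism
print(len(buckets_per_task(1010, default_parallelism)))  # 2 tasks
print(buckets_per_task(1010, default_parallelism))  # [505, 505]
print(len(buckets_per_task(1010, 1010)))  # 1010 tasks, one bucket each
```

Passing an explicit slice count to parallelize (for example, derived from the number of buckets, possibly capped) is the kind of change the issue title describes. The exact value Paimon uses is not stated in this issue.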

Anything else?

No response

Are you willing to submit a PR?