apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0
2.1k stars 834 forks

[core] Use parallelismBatchIterable to reduce memory cost #3598

Closed JingsongLi closed 1 week ago

JingsongLi commented 1 week ago

Purpose

This closes #3590

ScanParallelExecutor.parallelismBatchIterable is designed to execute tasks in parallel while keeping memory usage under control. Its parallelism is controlled by scan.manifest.parallelism.
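
To illustrate the general idea of batch-wise parallel execution with bounded memory, here is a minimal sketch. It is not Paimon's actual code: the class `BatchParallelSketch`, the method `batchParallel`, and its signature are hypothetical names chosen for this example. It assumes that memory is bounded by processing at most `parallelism` inputs at a time and only starting the next batch once the consumer has drained the current one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch, not the Paimon implementation.
final class BatchParallelSketch {

    private BatchParallelSketch() {}

    /**
     * Returns a lazy Iterable that processes the input in batches of at most
     * `parallelism` elements. Only one batch of results is held in memory at a time.
     */
    static <T, U> Iterable<U> batchParallel(
            List<T> input, Function<T, List<U>> processor, int parallelism) {
        if (parallelism <= 0) {
            throw new IllegalArgumentException("parallelism must be positive");
        }
        return () -> new Iterator<U>() {
            private int position = 0;
            private Iterator<U> current = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // Start the next batch only after the current one is fully consumed.
                while (!current.hasNext() && position < input.size()) {
                    current = runBatch().iterator();
                }
                return current.hasNext();
            }

            @Override
            public U next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }

            private List<U> runBatch() {
                int end = Math.min(position + parallelism, input.size());
                List<T> batch = input.subList(position, end);
                position = end;
                // Simplification: one short-lived pool per batch. A real
                // implementation would more likely reuse a shared executor.
                ExecutorService pool = Executors.newFixedThreadPool(parallelism);
                try {
                    List<Future<List<U>>> futures = new ArrayList<>();
                    for (T item : batch) {
                        Callable<List<U>> task = () -> processor.apply(item);
                        futures.add(pool.submit(task));
                    }
                    // Collect in submission order so output order matches input order.
                    List<U> results = new ArrayList<>();
                    for (Future<List<U>> future : futures) {
                        results.addAll(future.get());
                    }
                    return results;
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(e);
                } catch (ExecutionException e) {
                    throw new RuntimeException(e.getCause());
                } finally {
                    pool.shutdown();
                }
            }
        };
    }
}
```

In this sketch the `parallelism` argument plays the role that scan.manifest.parallelism plays in the description above: it caps both the number of concurrent tasks and the number of results buffered before the consumer reads them.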

Tests

API and Format

Documentation