apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[core] Optimize IncrementalStartingScanner to plan with thread pool #4206

Closed JingsongLi closed 2 months ago

JingsongLi commented 2 months ago

Purpose

IncrementalStartingScanner is very slow when there are 100+ snapshots. We can optimize it by planning in a thread pool.

This PR introduces ManifestsReader for SnapshotReader to read manifest files, and makes SnapshotReader.readManifest public. Callers can use these two interfaces for multi-threaded execution.
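
As a rough illustration of the idea (not the code in this PR), the sketch below submits one read task per snapshot to a fixed thread pool and merges the results in snapshot order. The class `ParallelSnapshotPlan`, the method `planIncremental`, and the `readManifests` callback are hypothetical names standing in for the per-snapshot manifest read. The range (start, end] is also an assumption about how the incremental scan is bounded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.LongFunction;

// Hypothetical helper, not part of Paimon: plans an incremental scan by
// reading each snapshot's manifests on a thread pool.
class ParallelSnapshotPlan {

    /**
     * Reads manifests for every snapshot id in (startExclusive, endInclusive].
     * readManifests is an assumed stand-in for the per-snapshot read; it must be
     * safe to call from multiple threads.
     */
    static <T> List<T> planIncremental(
            long startExclusive,
            long endInclusive,
            int threads,
            LongFunction<List<T>> readManifests)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per snapshot so the reads can overlap.
            List<Future<List<T>>> futures = new ArrayList<>();
            for (long id = startExclusive + 1; id <= endInclusive; id++) {
                final long snapshotId = id;
                Callable<List<T>> task = () -> readManifests.apply(snapshotId);
                futures.add(pool.submit(task));
            }
            // Collect in submission order so the result stays ordered by snapshot id.
            List<T> result = new ArrayList<>();
            for (Future<List<T>> future : futures) {
                result.addAll(future.get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

Collecting the futures in submission order keeps the merged output deterministic even though the reads finish in arbitrary order. A real implementation would likely reuse a shared executor rather than creating one per plan.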

Tests

API and Format

Documentation