apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[spark] support to read multi splits in a spark input partition #3612

Closed YannByron closed 4 days ago

YannByron commented 6 days ago

Purpose

Linked issue: close #xxx

Tests

API and Format

Documentation

YannByron commented 4 days ago

Without this PR, the job runs 115k+ tasks. [image]

With this PR, the task count drops to 10k+. [image]
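To illustrate why reading several splits in one input partition cuts the task count, here is a minimal sketch of greedy size-based packing. It is not the code from this PR: the `Split` class, the `pack_splits` function, and the `max_partition_bytes` budget are hypothetical names chosen for illustration, and the actual grouping strategy in Paimon may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Split:
    """Hypothetical stand-in for a table split; only its size matters here."""
    name: str
    size_bytes: int


def pack_splits(splits: List[Split], max_partition_bytes: int) -> List[List[Split]]:
    """Greedily group consecutive splits so each group stays within the byte budget.

    A split larger than the budget ends up in a partition of its own.
    """
    if max_partition_bytes <= 0:
        raise ValueError("max_partition_bytes must be positive")

    partitions: List[List[Split]] = []
    current: List[Split] = []
    current_bytes = 0

    for split in splits:
        # Close the current group when adding this split would exceed the budget.
        if current and current_bytes + split.size_bytes > max_partition_bytes:
            partitions.append(current)
            current = []
            current_bytes = 0
        current.append(split)
        current_bytes += split.size_bytes

    if current:
        partitions.append(current)
    return partitions


# Illustrative numbers only (not taken from the PR): 30 splits of 10 MiB each,
# packed under an assumed 128 MiB budget per input partition.
MIB = 1024 * 1024
splits = [Split(f"split-{i}", 10 * MIB) for i in range(30)]
partitions = pack_splits(splits, max_partition_bytes=128 * MIB)
print(len(splits), "splits ->", len(partitions), "partitions")
```

With one split per partition, Spark would schedule 30 tasks for this input. Packed under the assumed budget, the same splits fit into 3 partitions and therefore 3 tasks, which is the kind of reduction the comment above reports at a larger scale.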