apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[core][spark] Support push down limit for primary key table #4299

Closed: ulysses-you closed this 1 month ago

ulysses-you commented 1 month ago

Purpose

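For primary key tables, we can still push down the limit when a split is raw convertible, that is, when its data files can be read directly without a merge-on-read step. In that case, the split's row count can be used to decide how many splits are needed to satisfy the limit. This PR relaxes the limit push-down condition so that it also applies to primary key tables.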
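The idea can be sketched roughly as follows. This is a simplified Python model under stated assumptions, not Paimon's implementation. `Split`, `push_down_limit`, `row_count`, and `raw_convertible` are hypothetical names. The policy is also an assumption: skip splits that need merging, keep raw-convertible splits until their row counts cover the limit, and otherwise fall back to the full plan.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Split:
    """Hypothetical stand-in for a scan split; not Paimon's actual DataSplit API."""
    name: str
    row_count: int          # rows stored in the split's data files (assumed exact)
    raw_convertible: bool   # True if the files can be read without merging keys


def push_down_limit(splits: List[Split], limit: Optional[int]) -> List[Split]:
    """Keep only enough raw-convertible splits to cover `limit` rows.

    If the raw-convertible splits cannot cover the limit, return the original
    plan unchanged. In both cases the engine still applies LIMIT after reading,
    because the chosen splits may hold more rows than requested.
    """
    if limit is None or limit < 0:
        return splits
    chosen: List[Split] = []
    counted = 0
    for split in splits:
        # A split that needs merge-on-read may emit fewer rows than it stores
        # (updates to the same key collapse), so its row count is not trusted.
        if not split.raw_convertible:
            continue
        chosen.append(split)
        counted += split.row_count
        if counted >= limit:
            return chosen
    # Not enough trusted rows: keep every split.
    return splits


# Usage: one bucket needs merging, three can be read as-is.
splits = [
    Split("bucket-0", 50, raw_convertible=False),
    Split("bucket-1", 40, raw_convertible=True),
    Split("bucket-2", 80, raw_convertible=True),
    Split("bucket-3", 30, raw_convertible=True),
]
print([s.name for s in push_down_limit(splits, 100)])  # ['bucket-1', 'bucket-2']
```

Skipping the splits that need merging is safe for a plain LIMIT without ORDER BY, since any subset of rows is a valid answer. Falling back to the full plan keeps the result correct when the trusted row counts are too small.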

Tests

Added tests.

API and Format

No.

Documentation

wwj6591812 commented 1 month ago

I think this is a good optimization. Could you change the title from "Spark" to "Core"? Flink also uses this optimization.

ulysses-you commented 1 month ago

Thank you @wwj6591812, I've addressed the comments.

ulysses-you commented 1 month ago

cc @JingsongLi, thank you.

JingsongLi commented 1 month ago

+1

YannByron commented 1 month ago

Link to https://github.com/apache/paimon/issues/2404.