apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0
2.16k stars 855 forks

[spark] make metadata columns passed across shuffles #3504

Closed YannByron closed 1 month ago

YannByron commented 1 month ago

Purpose

When a delete condition contains a large subquery that requires a shuffle to compute, the metadata columns should be part of the output of DataSourceV2Relation, not just the output of DataSourceV2ScanRelation, so that the delete can be computed correctly.
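
To illustrate the idea, here is a toy model in plain Python. It is not Spark or Paimon code. Every name in it (`Relation`, `shuffle`, `expose_metadata`, and the `_file` / `_row` columns) is hypothetical and chosen only for this sketch. It assumes a shuffle carries forward only the columns that the upstream relation declares in its output, so metadata columns missing from that output are unavailable after the shuffle.

```python
# Toy model only: plain Python, not Spark or Paimon code.
# All names here are hypothetical and exist only for illustration.
from dataclasses import dataclass


@dataclass
class Relation:
    data_columns: list
    metadata_columns: list
    expose_metadata: bool = False  # whether metadata columns are part of the output

    @property
    def output(self):
        # Columns that downstream operators are allowed to see.
        cols = list(self.data_columns)
        if self.expose_metadata:
            cols += self.metadata_columns
        return cols


def shuffle(rows, output):
    # Simulated exchange: only columns declared in `output` are carried across.
    return [{col: row[col] for col in output} for row in rows]


META = ["_file", "_row"]
rows = [
    {"id": 1, "_file": "f1", "_row": 0},
    {"id": 2, "_file": "f1", "_row": 1},
]

# Metadata columns are not in the relation output, so they are lost in the shuffle.
without_meta = shuffle(rows, Relation(["id"], META).output)

# Metadata columns are in the relation output, so they survive the shuffle.
with_meta = shuffle(rows, Relation(["id"], META, expose_metadata=True).output)
```

In this sketch, a delete that needs `_file` and `_row` to locate the affected rows could only work from `with_meta`, since `without_meta` no longer carries those columns.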

Tests

API and Format

Documentation

JingsongLi commented 1 month ago

+1