apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0
2.47k stars 969 forks

[spark] Support partition and bucket metadata column #4180

Closed: ulysses-you closed this 2 months ago

ulysses-you commented 2 months ago

Purpose

This PR adds the `__paimon_partition` and `__paimon_bucket` metadata columns. They make it easy to find the partition and bucket a row belongs to, e.g., in `PaimonCommand#collectDeletionVectors`.
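To illustrate how these columns might be used from Spark SQL, here is a minimal sketch that builds a query listing the distinct partition and bucket pairs containing rows that match a filter. Only the two column names come from this PR; the helper `touched_buckets_sql`, the table name `my_db.orders`, and the filter are hypothetical. It also assumes the metadata columns can be selected like ordinary columns.

```python
# Sketch only: the column names come from this PR's description.
# The helper name, table name, and filter are hypothetical.
PARTITION_COL = "__paimon_partition"
BUCKET_COL = "__paimon_bucket"


def touched_buckets_sql(table: str, condition: str) -> str:
    """Build a Spark SQL query that lists the distinct (partition, bucket)
    pairs containing rows that match `condition`."""
    return (
        f"SELECT DISTINCT {PARTITION_COL}, {BUCKET_COL} "
        f"FROM {table} WHERE {condition}"
    )


query = touched_buckets_sql("my_db.orders", "order_id = 42")
print(query)
# With a SparkSession configured for Paimon, the query could be run as:
#   spark.sql(query).show()
```

The helper only builds the SQL string, so it runs without a Spark installation; executing the query requires a Paimon build that includes these metadata columns.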

Tests

Added a test.

API and Format

No

Documentation

ulysses-you commented 2 months ago

cc @JingsongLi @YannByron Do you have any concerns about these two metadata columns? Thank you.