apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Feature] Create Delta log after each commit #2511

Open jles01 opened 10 months ago

jles01 commented 10 months ago

Search before asking

Motivation

The motivation is to make Paimon tables readable by Databricks SQL endpoints.

Solution

After each commit, we can create a Delta log entry that lists the underlying Parquet files.
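To make the idea concrete, here is a rough sketch of what writing one Delta log entry could look like. It relies only on the Delta transaction log layout (a `_delta_log` directory with zero-padded, newline-delimited JSON commit files containing `add` actions). The function name `write_delta_commit`, the `added_files` input, and the example file path are hypothetical; how the file list is obtained from a Paimon commit is not covered. The very first commit of a Delta table would also need `protocol` and `metaData` actions (including the schema), which are left out here.

```python
import json
import time
from pathlib import Path


def write_delta_commit(table_dir, version, added_files):
    """Write one Delta commit file listing the given Parquet files.

    added_files: iterable of (relative_path, size_in_bytes) tuples.
    Assumption: the caller already knows which files the commit produced.
    """
    log_dir = Path(table_dir) / "_delta_log"
    log_dir.mkdir(parents=True, exist_ok=True)

    now_ms = int(time.time() * 1000)
    # Optional provenance record for the commit.
    actions = [{"commitInfo": {"timestamp": now_ms, "operation": "WRITE"}}]

    # One "add" action per data file that became part of the table.
    for path, size in added_files:
        actions.append(
            {
                "add": {
                    "path": path,
                    "partitionValues": {},  # unpartitioned table assumed
                    "size": size,
                    "modificationTime": now_ms,
                    "dataChange": True,
                }
            }
        )

    # Commit files are named with the version zero-padded to 20 digits.
    commit_file = log_dir / f"{version:020d}.json"
    # Mode "x" fails if the file already exists, so an existing
    # version is never overwritten.
    with open(commit_file, "x", encoding="utf-8") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")
    return commit_file
```

A call such as `write_delta_commit("/tmp/my_table", 1, [("bucket-0/data-1.parquet", 1024)])` would create `_delta_log/00000000000000000001.json` with one `commitInfo` line and one `add` line. Removed files and deletion vectors would need additional actions that this sketch does not attempt.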

Anything else?

Is there a way to determine the Parquet files when creating a manifest in Paimon?

Are you willing to submit a PR?

jles01 commented 9 months ago

Can anyone help me? I want to create a Delta log after each Paimon commit (including Delta deletion vectors, etc.). My question is: how can I retrieve the list of files created by Paimon?

JingsongLi commented 5 months ago

This is a big feature and it is not easy to do, but we will consider it.

jles01 commented 5 months ago

Hey, I have a working prototype and will create a WIP PR. I've already tested it against a Databricks SQL warehouse and it seems to work. I just need to add more test coverage.