apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0
2.11k stars 834 forks

[Feature] Strategically Engineering Wide Tables with Foreign Key-Based Joins #2270

Open MonsterChenzhuo opened 7 months ago

MonsterChenzhuo commented 7 months ago

Motivation

Each existing way of building a wide table from multiple tables has a limitation. Flink's dual-stream join keeps an excessive amount of state. Flink Lookup Join reacts only to changes in the primary stream, so changes in the dimension table cannot update rows that have already been joined. Partial Update requires the joined tables to share a common primary key so that records from two or more data sources can be matched.
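To make the Lookup Join limitation concrete, here is a toy in-memory model in plain Python. It is only an illustration of the behavior described above, not Flink or Paimon code, and every name in it (`dim_customers`, `wide_table`, `on_order`) is hypothetical.

```python
# Toy model of a lookup join: each order is enriched exactly once, on arrival.
# All table and column names are hypothetical and chosen for illustration.

dim_customers = {1: "Alice"}   # dimension table: customer_id -> customer_name
wide_table = {}                # joined result: order_id -> wide row


def on_order(order_id, customer_id):
    """Enrich an incoming order with the dimension value visible right now."""
    wide_table[order_id] = {
        "customer_id": customer_id,
        "customer_name": dim_customers.get(customer_id),
    }


on_order(100, 1)                   # joined while the name is "Alice"
dim_customers[1] = "Alice Smith"   # the dimension table changes afterwards

# Nothing re-triggers the join, so the wide row keeps the old value.
stale_name = wide_table[100]["customer_name"]
print(stale_name, "vs", dim_customers[1])
```

The wide row is only refreshed if another event for order 100 arrives on the primary stream, which is the gap this issue is about.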

I would like to implement an entirely new method that overcomes these shortcomings: a Lookup Join driven by dynamic dimension tables, which I also call 'ForeignKey Widening'. The proposal is written up as PIP-12: https://cwiki.apache.org/confluence/display/PAIMON/PIP-12%3A+Strategically+Engineering+Wide+Tables+with+Foreign+Key-Based+Joins
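As a rough sketch of what a dimension-driven join could look like, the Python toy below keeps a reverse index from foreign key to fact primary keys, so that a dimension change can find and rewrite the wide rows it affects. This is an assumption about the general idea only: it is not the design specified in PIP-12 and not Paimon's API, and the class and method names (`ForeignKeyWidener`, `on_fact`, `on_dim`) are made up for illustration.

```python
from collections import defaultdict


class ForeignKeyWidener:
    """Hypothetical in-memory model of a dimension-driven wide table."""

    def __init__(self):
        self.dim = {}                      # foreign key -> dimension value
        self.wide = {}                     # fact primary key -> wide row
        self.fk_index = defaultdict(set)   # foreign key -> fact primary keys

    def on_fact(self, pk, fk, payload):
        """Upsert a fact row and join it with the current dimension value."""
        old = self.wide.get(pk)
        if old is not None:
            # The fact row may now point at a different dimension row.
            self.fk_index[old["fk"]].discard(pk)
        self.fk_index[fk].add(pk)
        self.wide[pk] = {
            "fk": fk,
            "payload": payload,
            "dim_value": self.dim.get(fk),
        }

    def on_dim(self, fk, value):
        """Upsert a dimension row and refresh every wide row referencing it."""
        self.dim[fk] = value
        for pk in self.fk_index.get(fk, ()):
            self.wide[pk]["dim_value"] = value


widener = ForeignKeyWidener()
widener.on_dim(1, "Alice")
widener.on_fact(100, 1, "order-100")
widener.on_dim(1, "Alice Smith")   # change arrives on the dimension side
print(widener.wide[100]["dim_value"])
```

Unlike a dual-stream join, the only extra state in this sketch is the key-to-key index, and the fact and dimension tables do not need to share a primary key.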

Solution

No response

Anything else?

No response

polyzos commented 3 months ago

@MonsterChenzhuo what's the status of this? It's a pretty amazing and useful feature.

qinjunjerry commented 3 months ago

Indeed amazing! Exactly what I am looking for :)

eric666666 commented 1 month ago

> @MonsterChenzhuo what's the status of this? It's a pretty amazing and useful feature

+1. Are there any plans to include this in a release version?