apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Feature] Flink Catalog supports partition methods #747

Closed JingsongLi closed 1 year ago

JingsongLi commented 1 year ago

Motivation

Flink Catalog can support the following partition methods: listPartitions, listPartitionsByFilter, getPartition, partitionExists, and dropPartition.

Solution

No response

Anything else?

No response

felixYyu commented 1 year ago

I also want to finish it, please assign to me.

ericxiao251 commented 1 year ago

Hi, I would like to work on this as well.

ericxiao251 commented 1 year ago

I did a bit of digging, and I think we might want to leverage the CatalogTable class from the org.apache.flink.table.catalog package to manage a table's partitions. However, this class only tells you whether the table is partitioned (isPartitioned) and what the partition keys are (getPartitionKeys). I think we could use CatalogTable because another function in FlinkCatalog, renameTable, also depends on it.

I think I am missing something, but it seems like we might need to add some partition functionality to CatalogTable in the Flink project before we can implement any of the partition functions above in Paimon?

@JingsongLi do you have any recommendations on how one would go about implementing these partition functions?

JingsongLi commented 1 year ago

Hi @ericxiao251, we just need to implement the functions in FlinkCatalog. For example, listPartitions is similar to PartitionExpire.readPartitions; here we need to convert BinaryRows to CatalogPartitionSpecs.
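
To illustrate the conversion step, here is a minimal sketch that avoids any Flink or Paimon dependency. It assumes each partition row can be read as a positional list of values matching the table's partition keys. Plain `Map<String, String>` objects stand in for Flink's CatalogPartitionSpec, which wraps such a map. The class name `PartitionSpecSketch` and its methods are hypothetical and are not Paimon's actual implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: a Map<String, String> plays the role of Flink's
// CatalogPartitionSpec, and a List<Object> plays the role of a partition row.
class PartitionSpecSketch {

    // Pairs each partition key with the value at the same position in the row.
    static Map<String, String> toSpec(List<String> partitionKeys, List<Object> row) {
        if (partitionKeys.size() != row.size()) {
            throw new IllegalArgumentException("row arity does not match partition keys");
        }
        // LinkedHashMap keeps the keys in partition-key order.
        Map<String, String> spec = new LinkedHashMap<>();
        for (int i = 0; i < partitionKeys.size(); i++) {
            spec.put(partitionKeys.get(i), String.valueOf(row.get(i)));
        }
        return spec;
    }

    // Converts every partition row into one spec, mirroring what listPartitions returns.
    static List<Map<String, String>> listPartitions(
            List<String> partitionKeys, List<List<Object>> rows) {
        List<Map<String, String>> specs = new ArrayList<>();
        for (List<Object> row : rows) {
            specs.add(toSpec(partitionKeys, row));
        }
        return specs;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("dt", "hr");
        List<List<Object>> rows = List.of(
                List.<Object>of("2023-04-01", 10),
                List.<Object>of("2023-04-01", 11));
        System.out.println(listPartitions(keys, rows));
    }
}
```

In a real FlinkCatalog, the rows would come from the table's partition scan and each map would be wrapped with `new CatalogPartitionSpec(map)`.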

liugddx commented 1 year ago

https://github.com/apache/iceberg/pull/1815/files

ericxiao251 commented 1 year ago

👋🏼 thanks @liugddx and @JingsongLi for the direction! I wasn't able to leverage the BinaryRow class, but was able to use the InternalRow class.

Would either of you have a chance to look at the initial implementation and let me know if I am on the right track? It looks like a lot more code than some of the examples you both listed, so I am not sure whether I have overcomplicated anything.

leaves12138 commented 1 year ago

We can implement the listPartitions and dropPartition methods in the Paimon Catalog.

zhuangchong commented 1 year ago

The PR for this has been completed, so I will close this issue.