apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Flink] Optimize Flink listPartitions speed #4495

Closed (herefree closed 1 week ago)

herefree commented 1 week ago

Purpose

Flink's listPartitions now gets partitions from catalog.listPartitions. With a caching catalog, this can speed up partition listing.

Linked issue: close #xxx

Tests

API and Format

Documentation

JingsongLi commented 1 week ago

Hi @herefree, can you explain what the difference is?

herefree commented 1 week ago

Before this change, every time Flink needed the partitions, it had to get them from readBuilder.newScan().listPartitionEntries(). After the change, it gets them from Catalog.listPartitions. If the catalog is a CachingCatalog, the partitions can be served from its cache.
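
The idea described above can be sketched roughly as follows. This is a minimal illustration only: `PartitionSource`, `ScanningSource`, and `CachingSource` are hypothetical stand-ins, not Paimon's actual `Catalog` or `CachingCatalog` classes, and the real signatures may differ.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a catalog; NOT Paimon's real Catalog interface.
interface PartitionSource {
    List<String> listPartitions(String table);
}

// Simulates the expensive path: every call performs a scan.
class ScanningSource implements PartitionSource {
    int scans = 0;

    @Override
    public List<String> listPartitions(String table) {
        scans++; // count how often the expensive scan runs
        return List.of("dt=2024-01-01", "dt=2024-01-02");
    }
}

// Hypothetical caching wrapper: the first call delegates, later calls hit the cache.
class CachingSource implements PartitionSource {
    private final PartitionSource delegate;
    private final Map<String, List<String>> cache = new HashMap<>();

    CachingSource(PartitionSource delegate) {
        this.delegate = delegate;
    }

    @Override
    public List<String> listPartitions(String table) {
        return cache.computeIfAbsent(table, delegate::listPartitions);
    }
}

class PartitionCacheSketch {
    public static void main(String[] args) {
        ScanningSource scanning = new ScanningSource();
        PartitionSource catalog = new CachingSource(scanning);
        catalog.listPartitions("db.t");
        catalog.listPartitions("db.t");
        System.out.println("scans = " + scanning.scans); // scans = 1
    }
}
```

The second call never reaches the scanning source, which is where the speed-up would come from, provided the catalog in use really is a caching one.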

JingsongLi commented 1 week ago

Really? How are partitions served from the cache? It seems that you did not modify the CachingCatalog.

herefree commented 1 week ago

I found that partitionCache is already implemented in CachingCatalog. However, I also found a problem: there is no place in Flink that refreshes the cached partitions. Sorry, maybe I should turn off this feature.
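
The refresh problem can be illustrated with a small self-contained sketch. All names here (`GrowingTable`, `StalePartitionCache`, `invalidate`) are hypothetical and only model the general behavior of a cache with no refresh hook. They do not reflect how Paimon's partitionCache is actually implemented or invalidated.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical table whose partition list grows as writers commit; not a Paimon class.
class GrowingTable {
    private final List<String> partitions = new ArrayList<>();

    void addPartition(String partition) {
        partitions.add(partition);
    }

    List<String> scanPartitions() {
        return new ArrayList<>(partitions); // snapshot copy of the current partitions
    }
}

// Hypothetical partition cache with an explicit invalidation hook.
class StalePartitionCache {
    private final GrowingTable table;
    private final Map<String, List<String>> cache = new HashMap<>();

    StalePartitionCache(GrowingTable table) {
        this.table = table;
    }

    List<String> listPartitions(String name) {
        return cache.computeIfAbsent(name, k -> table.scanPartitions());
    }

    // Without a call like this, readers keep seeing the old partition list.
    void invalidate(String name) {
        cache.remove(name);
    }
}

class StaleCacheDemo {
    public static void main(String[] args) {
        GrowingTable table = new GrowingTable();
        table.addPartition("dt=2024-01-01");
        StalePartitionCache cache = new StalePartitionCache(table);

        System.out.println(cache.listPartitions("db.t").size()); // 1
        table.addPartition("dt=2024-01-02");
        System.out.println(cache.listPartitions("db.t").size()); // still 1: stale
        cache.invalidate("db.t");
        System.out.println(cache.listPartitions("db.t").size()); // 2 after invalidation
    }
}
```

If nothing on the Flink side triggers the equivalent of `invalidate`, newly written partitions would stay invisible to listPartitions until the cache entry is dropped some other way.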