Open: SinghAsDev opened this issue 3 months ago
@SinghAsDev, thanks for proposing this. I have some questions:
@FANNG1, please find my answers below.
What's the benefit of implementing this on the Iceberg REST server? Is it to speed things up with a cache? That may consume a huge amount of memory.
Having this on the REST server provides a mechanism to fetch partitions quickly and efficiently. This can be achieved through optimizations such as caching, skipping the reading of manifests that contain only a single partition, etc. Another benefit is that different clients can use this without each having to implement it themselves.
Could you share the scenarios in which you use it?
Sure. It will enable existing partition waiters, partition discovery, and data-freshness tooling to work for both Iceberg and Hive tables.
Iceberg introduced partition statistics files in 1.5.0; we should also consider them.
Sure, that's another benefit of this approach: we can change or add optimizations over time.
Got it, thanks for your reply.
Thanks for the issue and the patch, @SinghAsDev. Shall we move forward with the patch?
Describe the feature
Add the capability to fetch partitions of Iceberg tables, providing an easy-to-use and efficient mechanism for retrieving Iceberg table partitions.
Motivation
It is common for users to build use cases, such as waiters, that depend on a table's partition information. As users move from the Hive to the Iceberg table format, one of the blockers they hit is the ease and speed of accessing partition information. For very large Iceberg tables (with tens of thousands of partitions), fetching partitions takes over an hour and over 50 GB of memory.
Describe the solution
Add a REST endpoint to get Iceberg table partitions.
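One possible shape for such an endpoint is sketched below as a minimal, standard-library-only Python WSGI app. The route `/v1/namespaces/{namespace}/tables/{table}/partitions`, the JSON response shape, and the in-memory `PARTITIONS` stub are hypothetical and chosen for illustration; they are not part of the Iceberg REST catalog specification or of any existing server's API. A real server would compute the partition list from table metadata rather than a hard-coded dict.

```python
import json
import re
from wsgiref.util import setup_testing_defaults

# Hypothetical in-memory stand-in for partitions derived from Iceberg metadata.
PARTITIONS = {
    ("db", "events"): [{"dt": "2024-05-01"}, {"dt": "2024-05-02"}],
}

# Hypothetical route; not defined by the Iceberg REST catalog spec.
ROUTE = re.compile(r"^/v1/namespaces/([^/]+)/tables/([^/]+)/partitions$")

JSON_HEADERS = [("Content-Type", "application/json")]


def respond(start_response, status, payload):
    """Serialize payload as JSON and send it with the given status."""
    body = json.dumps(payload).encode("utf-8")
    start_response(status, JSON_HEADERS + [("Content-Length", str(len(body)))])
    return [body]


def app(environ, start_response):
    """WSGI app answering GET .../partitions with the table's partitions."""
    match = ROUTE.match(environ.get("PATH_INFO", ""))
    if match is None:
        return respond(start_response, "404 Not Found", {"error": "unknown path"})
    if environ.get("REQUEST_METHOD") != "GET":
        return respond(start_response, "405 Method Not Allowed", {"error": "use GET"})
    key = (match.group(1), match.group(2))
    if key not in PARTITIONS:
        return respond(start_response, "404 Not Found", {"error": "no such table"})
    return respond(start_response, "200 OK", {"partitions": PARTITIONS[key]})


def call(path, method="GET"):
    """Invoke the app in-process (no server) and return (status, JSON body)."""
    environ = {"PATH_INFO": path, "REQUEST_METHOD": method}
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body)


status, data = call("/v1/namespaces/db/tables/events/partitions")
```

The `call` helper exercises the app directly, so the sketch can be tried without binding a port; the same `app` could be served with `wsgiref.simple_server.make_server` for manual testing.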
Additional context
No response