Open melin opened 2 years ago
With the new REST API design, we should be able to use an RDBMS or key-value store to replace the metadata.json files. The new APIs are WIP; the client will be provided in the Iceberg repo, but users need to implement the server side once the APIs are ready. However, I believe an open source server will be there eventually, probably as another project. Beyond the metadata.json file, putting manifest-list or manifest files into an RDBMS/key-value store would need a major overhaul. It is theoretically possible, but I'm not sure it is the way people want to go.
Bytedance has implemented a Hudi MetaStore Server: https://cwiki.apache.org/confluence/display/HUDI/RFC-36%3A+HUDI+Metastore+Server
Thanks for sharing. The Hudi metadata server makes sense generally. However, Iceberg doesn't have some of the issues in Hudi, for example, the file listing issue in Hudi metadata.
I list some benefits of an Iceberg metadata server.
There could be more benefits though.
References
"file listing issue in Hudi metadata" => RFC-15: HUDI Metadata Table and Cloud/DFS File Listing Improvements
I think it would make sense to consider, for example, FoundationDB as the storage layer for the metadata. That's what Snowflake and Firebolt currently use. Delta seems to be considering this too (https://github.com/delta-io/delta/issues/867). Of course, it could be another similar transactional, highly available, scalable, low-latency store (if one exists). Decoupling the metadata from the actual storage would open up a lot of possible new use cases, in particular evolving Iceberg into the storage layer for "modern cloud data warehouses".
Most Iceberg metadata is stored in the file system and is limited by NameNode performance. Storage engines such as an RDBMS, Cassandra, and MongoDB could be supported through pluggable storage.
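
To make the pluggable storage idea a bit more concrete, here is a minimal sketch in Python. It is not Iceberg's API: `MetadataStore`, `InMemoryStore`, `load`, and `commit` are hypothetical names made up for illustration. The assumption is that a backend only needs to hold the current metadata for each table and swap it atomically, so an RDBMS, Cassandra, or MongoDB implementation could sit behind the same interface.

```python
import threading
from abc import ABC, abstractmethod
from typing import Dict, Optional


class MetadataStore(ABC):
    """Hypothetical pluggable backend holding the current metadata per table."""

    @abstractmethod
    def load(self, table: str) -> Optional[str]:
        """Return the current metadata for `table`, or None if it is unknown."""

    @abstractmethod
    def commit(self, table: str, expected: Optional[str], new: str) -> bool:
        """Atomically replace `expected` with `new`; return False on conflict."""


class InMemoryStore(MetadataStore):
    """Toy backend for illustration; a real one would use a database transaction."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._lock = threading.Lock()

    def load(self, table: str) -> Optional[str]:
        with self._lock:
            return self._data.get(table)

    def commit(self, table: str, expected: Optional[str], new: str) -> bool:
        with self._lock:
            # Compare-and-swap: only succeed if nobody committed in between.
            if self._data.get(table) != expected:
                return False
            self._data[table] = new
            return True
```

A writer would call `load`, build the new metadata, then call `commit` with the value it read; a `False` result means another writer got there first and the operation has to be retried against the fresh state.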