apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0
2.27k stars 911 forks

[Feature] support paimon jdbc catalog #841

Open melin opened 1 year ago

melin commented 1 year ago

See Iceberg, where custom catalogs are supported. For example, users can build a custom catalog on top of JDBC, so that table metadata is stored in a relational database, independent of the Hive metastore.

https://iceberg.apache.org/docs/latest/jdbc/

JingsongLi commented 1 year ago

Hi @melin, thanks for reporting. Does this requirement come from your company's production environment?

melin commented 1 year ago

Yes. The goal is for metadata storage not to depend on HMS (Hive Metastore). Being able to customize the metadata storage mode suits more scenarios, such as cloud deployments.

s7monk commented 1 year ago

Are you going to implement this feature? If no one has done it yet, I am very interested and can try to complete it.

s7monk commented 1 year ago

I think we also need a JDBC catalog or a user-defined catalog. We currently rely on HMS, but we would like to move away from Hive.

melin commented 1 year ago

It is currently implemented based on Iceberg. The Iceberg catalog provides a FileIO interface that allows more customized acceleration. HMS is the metadata storage system of the Hadoop data warehouse, and it no longer meets the needs of data lake metadata storage. The Iceberg REST catalog is a more scalable approach, leaving the implementation to the user.

s7monk commented 1 year ago

Yes. For cloud deployments, I think Hive may not be necessary, so we need to give users more choices.

s7monk commented 1 year ago

Hi, what's the progress?

zhangjun0x01 commented 1 year ago

+1 for a custom catalog:

  1. Object storage cannot guarantee atomic rename.
  2. The Hive metastore approach relies on HMS and MySQL, which may become a bottleneck.
  3. Business requirement: one tenant may have many sub-accounts, and we hope to achieve resource isolation per catalog.
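
To illustrate point 1, here is a minimal sketch of how a database-backed catalog can replace atomic rename with a conditional UPDATE (compare-and-swap on the metadata pointer). It is only an assumption of how such a catalog might work, not Paimon's or Iceberg's actual schema or API: the table `catalog_tables`, its columns, and the helper functions are hypothetical, and SQLite stands in for a real JDBC database so the example is self-contained.

```python
import sqlite3


def create_catalog_table(conn):
    # Hypothetical schema: one row per table, keyed by catalog and table name.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS catalog_tables ("
        " catalog_name TEXT NOT NULL,"
        " table_name TEXT NOT NULL,"
        " metadata_location TEXT NOT NULL,"
        " PRIMARY KEY (catalog_name, table_name))"
    )


def register_table(conn, catalog, table, location):
    # The primary key makes a second registration of the same table fail.
    with conn:
        conn.execute(
            "INSERT INTO catalog_tables VALUES (?, ?, ?)",
            (catalog, table, location),
        )


def commit_metadata(conn, catalog, table, expected, new):
    # Compare-and-swap: the row is updated only if the pointer still holds
    # the value this writer read. A concurrent commit makes this a no-op.
    with conn:
        cur = conn.execute(
            "UPDATE catalog_tables SET metadata_location = ? "
            "WHERE catalog_name = ? AND table_name = ? "
            "AND metadata_location = ?",
            (new, catalog, table, expected),
        )
    return cur.rowcount == 1


def current_location(conn, catalog, table):
    row = conn.execute(
        "SELECT metadata_location FROM catalog_tables "
        "WHERE catalog_name = ? AND table_name = ?",
        (catalog, table),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_catalog_table(conn)
    register_table(conn, "tenant_a", "db.orders", "s3://bucket/orders/v1.json")
    # First writer wins; the second writer still expects v1 and is rejected.
    print(commit_metadata(conn, "tenant_a", "db.orders",
                          "s3://bucket/orders/v1.json",
                          "s3://bucket/orders/v2.json"))
    print(commit_metadata(conn, "tenant_a", "db.orders",
                          "s3://bucket/orders/v1.json",
                          "s3://bucket/orders/v3.json"))
    print(current_location(conn, "tenant_a", "db.orders"))
```

Because the commit is a single-row conditional update, its atomicity comes from the database rather than from the object store. The `catalog_name` column also hints at point 3: separate tenants or sub-accounts could each be given their own catalog name within the same database.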