apache / gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
https://gravitino.apache.org
Apache License 2.0

[EPIC] Add Paimon catalog support for Gravitino #1129

Open SteNicholas opened 11 months ago

SteNicholas commented 11 months ago

Describe the proposal

Gravitino currently supports an Apache Iceberg catalog. Apache Paimon is a streaming data lake platform that supports high-speed data ingestion, change data tracking, and efficient real-time analytics. We could build a Paimon catalog to manage Paimon metadata.

Paimon exposes a pluggable Catalog interface and ships several implementations of it, such as FileSystemCatalog and HiveCatalog. I recommend building a Gravitino catalog on top of these Paimon implementations. Meanwhile, I will propose a RESTCatalog interface in the Paimon community.
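To illustrate what a pluggable catalog interface with interchangeable implementations looks like, here is a minimal sketch. It is not Paimon's or Gravitino's actual API: the `Catalog` methods, the `InMemoryCatalog` class, the `"memory"` type key, and `create_catalog` are all hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod


class Catalog(ABC):
    """Hypothetical minimal catalog contract (not Paimon's real interface)."""

    @abstractmethod
    def list_databases(self) -> list[str]:
        """Return the names of all databases, sorted."""

    @abstractmethod
    def create_database(self, name: str) -> None:
        """Create a database, failing if it already exists."""


class InMemoryCatalog(Catalog):
    """Illustrative implementation that keeps database names in memory."""

    def __init__(self) -> None:
        self._databases: set[str] = set()

    def list_databases(self) -> list[str]:
        return sorted(self._databases)

    def create_database(self, name: str) -> None:
        if name in self._databases:
            raise ValueError(f"database already exists: {name}")
        self._databases.add(name)


# Registry mapping a configuration value to an implementation class.
# A filesystem- or Hive-backed class would be registered the same way.
_CATALOG_TYPES: dict[str, type[Catalog]] = {"memory": InMemoryCatalog}


def create_catalog(kind: str) -> Catalog:
    """Instantiate the catalog implementation registered under `kind`."""
    try:
        return _CATALOG_TYPES[kind]()
    except KeyError:
        raise ValueError(f"unknown catalog type: {kind}") from None
```

Callers depend only on `Catalog`, so swapping the backing store is a configuration change rather than a code change.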

Task list

JunpingDu commented 11 months ago

Sounds like a good idea. We don't have a Paimon expert right now. Would you like to work on it? @SteNicholas :)

SteNicholas commented 11 months ago

@JunpingDu, I would like to invite other Paimon contributors to work on Paimon catalog support together.

YxAc commented 9 months ago

@SteNicholas We are very interested in it and are waiting for the proposal and for the milestones to be broken down. We look forward to building the Paimon catalog together, thanks.

justinmclean commented 9 months ago

@YxAc Can I ask you to take a little more care with your words? I'm sure no ill will was intended, but it is often hard to read the tone in messages, and the way that was written could be taken the wrong way. Also, people are volunteers here; sometimes, things may take longer than they first intended.

YxAc commented 9 months ago

@justinmclean Sure, thanks for the reminder. I will phrase it differently.

Actually, we know each other and have talked about the Paimon catalog offline; my words above were just a little joke. That can indeed easily lead to misunderstanding. I will pay attention to it.

Thank you for reminding me.

justinmclean commented 9 months ago

Another reminder: as we are an open-source project, it is best if all communication is public; that way, all contributors can participate. Please try to have conversations about this feature in public.

YxAc commented 9 months ago

Sure

coolderli commented 8 months ago

@SteNicholas Hi, I did some investigation on Paimon. I found that Paimon does not need HMS to store a metadata.json the way Iceberg does. The most important thing is that we need an implementation of the lock. For now, I think we can implement the lock by another method, outside Gravitino. Then we can move this work forward faster.

We can use Gravitino to manage Paimon and store the metadata of databases and tables. And we may not need a REST catalog like Iceberg's. We can just use Gravitino. That makes things simpler.

What do you think?
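As one illustration of a lock kept outside the catalog service, here is a minimal sketch that uses atomic file creation. It is an assumption about what "another method" could look like, not the approach chosen in this thread: `catalog_lock`, the lock path, and the timeout parameters are hypothetical, and the atomicity of `O_CREAT | O_EXCL` holds on a local filesystem but is not guaranteed on every object store or network filesystem.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def catalog_lock(lock_path: str, timeout_s: float = 5.0, poll_s: float = 0.05):
    """Hold an exclusive lock represented by the existence of `lock_path`."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only
            # one caller at a time can create the lock file.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire lock: {lock_path}")
            time.sleep(poll_s)
    try:
        yield
    finally:
        # Release the lock by closing and deleting the lock file.
        os.close(fd)
        os.remove(lock_path)
```

A commit would then be wrapped as `with catalog_lock(path): ...`. A crashed holder leaves the lock file behind, so a production version would also need lease expiry or an external lock service.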

SteNicholas commented 8 months ago

@coolderli, the lock implementation is not designed to live in Gravitino. A Paimon REST catalog (nice to have) would let users access the catalog over REST, and it does not conflict with this work.

coolderli commented 8 months ago

@SteNicholas Yeah, I know what you mean. But Gravitino already has its own Open API, and we can use it to do the same work. Of course, a Paimon REST catalog is meaningful, and there is indeed no conflict between the two approaches. But using the Gravitino Open API is simpler for now, and we can finish this work faster.

SteNicholas commented 8 months ago

@YxAc, @coolderli, I have updated the proposal for Paimon catalog support. PTAL.

coolderli commented 8 months ago

@SteNicholas Hi, any update about this? Thanks.

jerryshao commented 3 months ago

@caican00 can you please leave a message here so I can assign the epic issue to you?

caican00 commented 3 months ago

@jerryshao sorry for the late reply. I have completed the database and table operations based on Paimon's FileSystemCatalog.
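For readers unfamiliar with filesystem-backed catalogs, the sketch below shows how database and table operations can map onto directories and schema files. It is not the code from this work and does not call Paimon: the class `FilesystemCatalogSketch`, the `<db>.db/<table>/schema/schema-0` layout, and the JSON schema shape are assumptions made for illustration.

```python
import json
import shutil
from pathlib import Path


class FilesystemCatalogSketch:
    """Hypothetical catalog where databases and tables are directories."""

    def __init__(self, warehouse: str) -> None:
        self._root = Path(warehouse)
        self._root.mkdir(parents=True, exist_ok=True)

    def _db_dir(self, db: str) -> Path:
        return self._root / f"{db}.db"

    def create_database(self, db: str) -> None:
        # Raises FileExistsError if the database already exists.
        self._db_dir(db).mkdir()

    def list_databases(self) -> list[str]:
        return sorted(
            p.name[: -len(".db")]
            for p in self._root.iterdir()
            if p.is_dir() and p.name.endswith(".db")
        )

    def create_table(self, db: str, table: str, columns: dict[str, str]) -> None:
        db_dir = self._db_dir(db)
        if not db_dir.is_dir():
            raise FileNotFoundError(f"database does not exist: {db}")
        table_dir = db_dir / table
        table_dir.mkdir()  # Raises FileExistsError if the table exists.
        schema_dir = table_dir / "schema"
        schema_dir.mkdir()
        fields = [{"name": n, "type": t} for n, t in columns.items()]
        (schema_dir / "schema-0").write_text(json.dumps({"fields": fields}))

    def list_tables(self, db: str) -> list[str]:
        return sorted(p.name for p in self._db_dir(db).iterdir() if p.is_dir())

    def load_table(self, db: str, table: str) -> dict:
        schema_file = self._db_dir(db) / table / "schema" / "schema-0"
        return json.loads(schema_file.read_text())

    def drop_table(self, db: str, table: str) -> None:
        shutil.rmtree(self._db_dir(db) / table)
```

Because all state lives under the warehouse path, no external metastore is needed for these operations, which matches the earlier observation in this thread that HMS is not required.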