delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

[Feature Request] Delta Sync for metadata sync to HMS/Glue #1478

Closed agrawalreetika closed 5 months ago

agrawalreetika commented 1 year ago

Hudi has Hudi Sync, which syncs table metadata from the transaction logs to HMS/Glue. Is there something similar for Delta tables?
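For context, here is a minimal sketch of the Delta-side half of such a sync: reading the latest `metaData` action from the JSON commit files under `_delta_log` and turning its `schemaString` into the column lists a catalog entry would need. The function names `latest_metadata_action` and `catalog_columns` are hypothetical, not part of Delta Lake. The sketch assumes plain JSON commits only (it ignores checkpoints) and makes no HMS or Glue calls.

```python
import json
from pathlib import Path


def latest_metadata_action(log_dir):
    """Return the most recent `metaData` action in the JSON commits, or None.

    Hypothetical helper: it replays only the plain JSON commit files and
    ignores checkpoints, so it is not a full log replay.
    """
    latest = None
    # Commit files are named with zero-padded versions, so a name sort is a version sort.
    commits = sorted(p for p in Path(log_dir).glob("*.json") if p.stem.isdigit())
    for commit in commits:
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)  # one action per line
            if "metaData" in action:
                latest = action["metaData"]
    return latest


def catalog_columns(metadata):
    """Split the table schema into (data columns, partition columns).

    Each column is a (name, type) pair; nested types are kept as JSON text.
    """
    schema = json.loads(metadata["schemaString"])
    partition_names = metadata.get("partitionColumns", [])

    def type_name(t):
        return t if isinstance(t, str) else json.dumps(t, sort_keys=True)

    fields = [(f["name"], type_name(f["type"])) for f in schema["fields"]]
    data_cols = [c for c in fields if c[0] not in partition_names]
    part_cols = [c for c in fields if c[0] in partition_names]
    return data_cols, part_cols
```

A sync tool would then map these Delta type names to catalog types and write them to HMS or Glue; that half is left out here because it depends on the catalog client.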

tdas commented 1 year ago

This will be a good feature to add. However, from my past experience, HMS interactions are often flaky and buggy. Do you know how well the Hudi sync works?

agrawalreetika commented 1 year ago

@tdas Thanks for your response. I have tried Hudi Sync with Glue and haven't seen any issues so far.

agrawalreetika commented 1 year ago

Hi @tdas, could you please help me understand what kind of flakiness and bugs you ran into with HMS sync? Has any work already been done on syncing table metadata from the transaction logs to HMS/Glue that I could follow?

dennyglee commented 1 year ago

Out of curiosity, would Glue Crawler reading Delta tables work in this scenario, or would you need to go beyond that? Shameless plug, by the way, for a recent session by @moomindami and myself on this topic: https://www.youtube.com/watch?v=GrqjZoVokNQ

agrawalreetika commented 1 year ago

Sorry for the delay. Thanks, @danny, for sharing the video link. I checked the Glue crawler, which reads the metadata from the transaction logs and updates it in Glue. However, it creates symlink tables, and this does not appear to be configurable when setting up the crawler. I also could not find any specific property in the metadata that identifies a table as a Delta table.

Please correct me if I am missing something here.

dennyglee commented 1 year ago

Oh, sorry, I had jumped too quickly ;-). Could you try AWS Glue 4.0 with Delta Lake 2.1?

agrawalreetika commented 1 year ago

@dennyglee I am using Glue Crawler to update and maintain metadata for the Delta table in the Glue catalog. As per the given document, it looks like that is for Glue jobs that read/write data in Delta tables?

agrawalreetika commented 1 year ago

@tdas @dennyglee What should the next steps be to get this feature? I didn't find any option for metadata sync to Glue/HMS.

dennyglee commented 1 year ago

@agrawalreetika Thanks for your patience - some quick questions:

agrawalreetika commented 1 year ago

Hi @dennyglee, Thanks for your response.

agrawalreetika commented 1 year ago

Hi @dennyglee, just checking in: do you need any other details from my side?

dhruvarya-db commented 7 months ago

Hi all, I have started working on this issue.

fuyun2024 commented 6 months ago

I am looking forward to the completion of this feature.

vkorukanti commented 5 months ago

This is resolved in #2409