Eventual-Inc / Daft

Distributed DataFrame for Python designed for the cloud, powered by Rust
https://getdaft.io
Apache License 2.0

Inquiry Regarding Point Lookup Support in Daft Python Library for Hudi Tables #2254

Closed soumilshah1995 closed 1 month ago

soumilshah1995 commented 1 month ago

Hello,

I hope this message finds you well. I wanted to reach out to inquire about the compatibility of Hudi Tables with Record Level Index (RLI) in conjunction with the Daft Python library. Specifically, I'm interested in knowing whether Daft supports point lookup functionalities for Hudi Tables with RLI.

I came across an informative blog post on LinkedIn detailing the enhanced performance benefits, particularly a 70% increase in speed, delivered by record-level indexing in Apache Hudi. The link to the blog post is provided below for reference:

LinkedIn Blog Post

The post also includes an image illustrating the concept:

image

Record Level Indexing

The purpose of this inquiry is to ascertain whether Daft currently supports point lookup for Hudi Tables with RLI. If not, I would like to explore the possibility of opening feature request tickets to address this functionality gap.

Looking forward to your insights on this matter.

jaychia commented 1 month ago

This currently depends on whether the underlying Hudi engine (PyHudi) supports it! Once it does though, I believe it should be easy to integrate with Daft.

cc @xushiyan

xushiyan commented 1 month ago

The current implementation does not integrate with the Hudi metadata table (MT), so RLI, which is part of the MT, is not applicable here.
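To illustrate what is at stake, here is a toy sketch of a point lookup with and without a record-level index. It is not Hudi's or Daft's implementation: the in-memory `FILES` dictionary, the file names, and all three helper functions are hypothetical stand-ins. The only idea carried over from Hudi is that the index maps a record key to the file that holds it, so a lookup can skip every other file.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]

# Hypothetical toy table: file name -> rows stored in that file.
FILES: Dict[str, List[Row]] = {
    "file_a.parquet": [{"id": "k1", "value": 10}, {"id": "k2", "value": 20}],
    "file_b.parquet": [{"id": "k3", "value": 30}],
    "file_c.parquet": [{"id": "k4", "value": 40}, {"id": "k5", "value": 50}],
}


def build_record_index(files: Dict[str, List[Row]]) -> Dict[str, str]:
    """Map each record key to the file that contains it (toy stand-in for an RLI)."""
    return {str(row["id"]): name for name, rows in files.items() for row in rows}


def lookup_full_scan(files: Dict[str, List[Row]], key: str) -> Tuple[Optional[Row], int]:
    """Without an index: open files one by one until the key is found."""
    files_read = 0
    for rows in files.values():
        files_read += 1
        for row in rows:
            if row["id"] == key:
                return row, files_read
    return None, files_read


def lookup_with_index(
    files: Dict[str, List[Row]], index: Dict[str, str], key: str
) -> Tuple[Optional[Row], int]:
    """With an index: jump straight to the one file that holds the key."""
    name = index.get(key)
    if name is None:
        return None, 0  # key is not in the table, so no file is opened
    for row in files[name]:
        if row["id"] == key:
            return row, 1
    return None, 1


index = build_record_index(FILES)
row_scan, read_scan = lookup_full_scan(FILES, "k4")
row_idx, read_idx = lookup_with_index(FILES, index, "k4")
print(f"full scan read {read_scan} file(s); indexed lookup read {read_idx} file(s)")
```

Both lookups return the same row. The difference is how many files are opened along the way, which is the cost a reader pays when it cannot consult the metadata table.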

soumilshah1995 commented 1 month ago

@xushiyan Do we plan to release support for this in the future? Should we keep this thread open, or close it if the feature is already planned?

xushiyan commented 1 month ago

We do have a plan to integrate the metadata table, and I'm tracking it internally. We may close this one.

soumilshah1995 commented 1 month ago

Thanks