delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

feat(rust, python): add HDFS support via hdfs-native package #2612

Closed Kimahriman closed 1 week ago

Kimahriman commented 1 week ago

Description

Add support for HDFS using hdfs-native, a pure* Rust client for interacting with HDFS. This creates a new hdfs sub-crate, adds it as a feature to the deltalake meta crate, and includes it in Python wheels by default. There is a Rust integration test that requires Hadoop and Java to be installed; it uses a small Maven program that I ship under the integration-test feature flag to run a MiniDFS server.

*Dynamically loads libgssapi_krb5 using libloading for Kerberos support
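
As a rough illustration of what this enables from the Python side, here is a minimal sketch. It assumes a wheel built with the HDFS feature and a reachable cluster. The helper names `hdfs_table_uri` and `open_hdfs_table`, the NameNode address, and the table path are hypothetical and only for illustration; `DeltaTable` is the existing deltalake class.

```python
def hdfs_table_uri(authority: str, table_path: str) -> str:
    """Build a table URI such as hdfs://namenode:9000/tables/events.

    `authority` is the NameNode host:port (hypothetical values below).
    """
    # Normalize the path so exactly one slash follows the authority.
    return f"hdfs://{authority}/{table_path.lstrip('/')}"


def open_hdfs_table(authority: str, table_path: str):
    """Open a Delta table stored on HDFS (needs a reachable cluster)."""
    # Imported lazily so the URI helper works without deltalake installed.
    from deltalake import DeltaTable

    return DeltaTable(hdfs_table_uri(authority, table_path))


# Usage (hypothetical NameNode address; needs a reachable cluster):
#   table = open_hdfs_table("namenode.example.com:9000", "/tables/events")
#   print(table.version())
```

The import is kept inside `open_hdfs_table` so the URI helper can be exercised on a machine without deltalake or a cluster available.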

Related Issue(s)

Resolves #2611

Documentation

ion-elgreco commented 1 week ago

@Kimahriman LGTM! Can you please also add some docs on the integration in the .MD files and, if possible, link to some docs of the possible configs that can be set

@rtyler can you also go over it?

Kimahriman commented 1 week ago

> @Kimahriman LGTM! Can you please also add some docs on the integration in the .MD files and, if possible, link to some docs of the possible configs that can be set

Yeah I agree there should be some, but I didn't see anything for other storage backends so I wasn't sure where to add it. Any recommendation? A new page under integrations/?

ion-elgreco commented 1 week ago

> > @Kimahriman LGTM! Can you please also add some docs on the integration in the .MD files and, if possible, link to some docs of the possible configs that can be set
>
> Yeah I agree there should be some, but I didn't see anything for other storage backends so I wasn't sure where to add it. Any recommendation? A new page under integrations/?

Yeah under integrations makes sense, perhaps called object storage and then a page for hdfs there

@avriiil any input on this? A short explanation per object store (S3, ADLS, GCS, and mounted storage) would make sense. Do you want to help with this?

Kimahriman commented 1 week ago

> Yeah under integrations makes sense, perhaps called object storage and then a page for hdfs there

I added a page. I couldn't get the docs to build locally to verify things though. Just kept getting

griffe.exceptions.AliasResolutionError: Could not resolve alias deltalake._internal.DeltaError pointing at _internal.DeltaError (in python/deltalake/_internal.abi3.so:None)

ion-elgreco commented 1 week ago

> > Yeah under integrations makes sense, perhaps called object storage and then a page for hdfs there
>
> I added a page. I couldn't get the docs to build locally to verify things though. Just kept getting
>
> griffe.exceptions.AliasResolutionError: Could not resolve alias deltalake._internal.DeltaError pointing at _internal.DeltaError (in python/deltalake/_internal.abi3.so:None)

Yeah, something has been broken with the docs for some time now

avriiil commented 4 days ago

Late to the party, I was OOO for a couple of days.

Happy to take a look at creating some more docs if that still makes sense @ion-elgreco?

Avril Aysha, DevRel & Content

On Fri, 21 Jun 2024 at 07:22, Ion Koutsouris wrote:

Merged #2612 https://github.com/delta-io/delta-rs/pull/2612 into main.

ion-elgreco commented 4 days ago

@avriiil yeah some docs for each object store would be great 😃

avriiil commented 2 days ago

sounds good, adding this to my list for next week @ion-elgreco