Closed santosh-d3vpl3x closed 2 months ago
Yeah, seems like we could add this since https://github.com/datafusion-contrib/hdfs-native-object-store just makes hdfs look like an object_store, thanks!
Not sure when I will have time to work on this, but if someone wants to make a PR I'd be open to it, and I will find time at some point.
@nicklan I would be interested in contributing to this, but I am completely new to this project.
Let me know if my understanding of the problem is correct. As I see it, there are two ways to go about it:
1. Add hdfs support to the upstream object_store crate, but looking at their issue (https://github.com/apache/arrow-rs/issues/5638), they seem to prefer to keep hdfs separate.
2. Allow the current implementation to create object stores from either the object_store or hdfs_native_object_store crates. As in delta-rs, you would match protocol schemes to the correct object store initializers: hdfs:// -> hdfs_native_object_store::HdfsObjectStore and s3:// -> object_store::ObjectStore.
The 2nd way would be the most straightforward and preferable, I believe.
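To make the second option concrete, here is a minimal, std-only sketch of the scheme matching. The names StoreBackend, scheme_of and backend_for are hypothetical and exist only for illustration. They are not part of the kernel, object_store, or hdfs_native_object_store, and a real change would construct the actual stores instead of returning a tag.

```rust
/// Hypothetical tag saying which crate should build the store for a URL.
#[derive(Debug, PartialEq, Eq)]
enum StoreBackend {
    /// Would be built via hdfs_native_object_store::HdfsObjectStore.
    HdfsNative,
    /// Would be handed to the object_store crate (s3://, file://, ...).
    ObjectStore,
}

/// Returns the lowercased scheme of `url`, or None when there is no
/// non-empty scheme followed by "://".
fn scheme_of(url: &str) -> Option<String> {
    let (scheme, _rest) = url.split_once("://")?;
    if scheme.is_empty() {
        return None;
    }
    Some(scheme.to_ascii_lowercase())
}

/// Picks a backend from the scheme: hdfs:// goes to the HDFS crate,
/// every other scheme falls through to object_store.
fn backend_for(url: &str) -> Option<StoreBackend> {
    match scheme_of(url)?.as_str() {
        "hdfs" => Some(StoreBackend::HdfsNative),
        _ => Some(StoreBackend::ObjectStore),
    }
}
```

With this shape, only the hdfs arm needs the extra dependency, so it could plausibly sit behind a cargo feature, though that is a design assumption rather than something decided in this thread.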
Sorry GitHub randomly decided to stop sending me email notifications. delta-rs just uses a dyn ObjectStore so it was fairly easy to integrate. Haven't looked much at this repo yet to see how it handles object store but hopefully is straightforward!
Ah, it looks like object_store::parse_url_opts is just used directly, so it might be a little more work to integrate. delta-rs already has custom handling of schemes, so it was a little more straightforward there. Have to do some upfront parsing of the scheme before forwarding to parse_url_opts.
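A rough sketch of that "parse the scheme up front, then forward" shape, using only the standard library. Everything here is a stand-in: the Store trait plays the role of dyn ObjectStore, HdfsStandIn marks where an hdfs_native_object_store::HdfsObjectStore would be built, and default_parse marks where the existing call to object_store::parse_url_opts would go. None of these names come from the real crates.

```rust
/// Stand-in for `dyn ObjectStore`; only reports which path built it.
trait Store {
    fn backend(&self) -> &'static str;
}

/// Placeholder for a store built from the HDFS crate (assumption).
struct HdfsStandIn;
/// Placeholder for whatever the default parser returns (assumption).
struct DefaultStandIn;

impl Store for HdfsStandIn {
    fn backend(&self) -> &'static str {
        "hdfs"
    }
}

impl Store for DefaultStandIn {
    fn backend(&self) -> &'static str {
        "default"
    }
}

/// Intercepts hdfs:// URLs and forwards everything else, unchanged,
/// to `fallback` (the role parse_url_opts plays today).
fn parse_url_with_hdfs<F>(url: &str, fallback: F) -> Result<Box<dyn Store>, String>
where
    F: FnOnce(&str) -> Result<Box<dyn Store>, String>,
{
    // Case-insensitive prefix check; `get` avoids panicking on short input.
    let is_hdfs = url
        .get(..7)
        .map_or(false, |prefix| prefix.eq_ignore_ascii_case("hdfs://"));
    if is_hdfs {
        Ok(Box::new(HdfsStandIn))
    } else {
        fallback(url)
    }
}

/// Toy default parser: accepts anything that has a scheme separator.
fn default_parse(url: &str) -> Result<Box<dyn Store>, String> {
    if url.contains("://") {
        Ok(Box::new(DefaultStandIn))
    } else {
        Err(format!("no scheme in {}", url))
    }
}
```

Because both paths return the same boxed trait object, callers would not need to know which crate produced the store.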
@nicklan I've marked my PR as ready, can you take a look?
delta-rs recently got initial support for hdfs via this PR.
It would be great if we could do the same for delta-rs-kernel.
DuckDB recently introduced support for Delta via the kernel implementation, but it can't be used with hdfs because of this missing integration.
Tagging @kimahriman to see if they can help out here!