datafusion-contrib / datafusion-objectstore-hdfs

HDFS based on Java implementation as a remote ObjectStore for DataFusion
Apache License 2.0

Implement based on the new object store abstraction #5

Closed: yahoNanJing closed this issue 1 year ago.

yahoNanJing commented 2 years ago

https://github.com/apache/arrow-datafusion/issues/2489

alamb commented 2 years ago

@yahoNanJing I may have some time to help with this project now that the object store has been incorporated into arrow-rs: https://github.com/apache/arrow-rs/issues/2030

My ulterior motive is that I am writing a blog post about object_store, and I would like to highlight the ability to plug in different implementations, using the HDFS support in this crate as an example.

Is anyone else planning to work on this ticket?

hrh007 commented 2 years ago

Is there any progress on this? I am very interested in this work, and I hope arrow-datafusion will be able to use HDFS storage. @alamb @yahoNanJing

alamb commented 2 years ago

I have not made any progress, @hrh007, but object_store 0.4.0 has been released: https://crates.io/crates/object_store/0.4.0

I would be happy to help if you want to start the work.

hrh007 commented 2 years ago

@alamb Thanks for the reply, but object_store 0.4.0 does not support HDFS either 🤣

alamb commented 2 years ago

> @alamb Thanks for the reply, but object_store 0.4.0 does not support HDFS either 🤣

Indeed, but I think the interface changed a little, so now that it is released it would be a good time to update the HDFS client.

alamb commented 2 years ago

@dmetasoul01 mentioned in https://github.com/apache/arrow-datafusion/issues/3177#issuecomment-1220218640 that there is an implementation in blaze-rs:

https://github.com/blaze-init/blaze/blob/master/native-engine/datafusion-ext/src/hdfs_object_store.rs

yahoNanJing commented 1 year ago

@hrh007, since the HDFS object store depends on a Java environment, I have not put it into the object_store crate for now.

The HDFS object store now implements the new object_store interface and is already used by Ballista. If you want DataFusion to use HDFS, you can refer to Ballista for an example of how to set it up.
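To illustrate the idea of plugging in different object store implementations, here is a minimal, std-only Rust sketch of dispatching a URL such as `hdfs://...` to whichever store is registered for its scheme. Everything in it is hypothetical: `SimpleObjectStore`, `InMemoryStore`, `FakeHdfsStore` and `StoreRegistry` are illustrative names, not part of object_store, DataFusion, Ballista or this crate. The real `object_store::ObjectStore` trait is async and much richer, and DataFusion's own registry keys stores by more than the bare scheme. For the actual wiring, refer to Ballista as suggested above.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical, heavily simplified stand-in for an object store interface.
/// (The real object_store trait is async and has many more methods.)
trait SimpleObjectStore: Send + Sync {
    fn name(&self) -> &str;
    fn get(&self, path: &str) -> Option<Vec<u8>>;
}

/// Hypothetical in-memory backend: objects live in a HashMap keyed by path.
struct InMemoryStore {
    objects: HashMap<String, Vec<u8>>,
}

impl SimpleObjectStore for InMemoryStore {
    fn name(&self) -> &str {
        "memory"
    }
    fn get(&self, path: &str) -> Option<Vec<u8>> {
        self.objects.get(path).cloned()
    }
}

/// Hypothetical placeholder for an HDFS-backed store. A real one would talk to
/// HDFS (e.g. through the Java client); this one just returns nothing.
struct FakeHdfsStore;

impl SimpleObjectStore for FakeHdfsStore {
    fn name(&self) -> &str {
        "hdfs"
    }
    fn get(&self, _path: &str) -> Option<Vec<u8>> {
        None
    }
}

/// Hypothetical registry mapping a URL scheme ("memory", "hdfs", ...) to a store.
#[derive(Default)]
struct StoreRegistry {
    stores: HashMap<String, Arc<dyn SimpleObjectStore>>,
}

impl StoreRegistry {
    /// Register (or replace) the store that handles `scheme`.
    fn register(&mut self, scheme: &str, store: Arc<dyn SimpleObjectStore>) {
        self.stores.insert(scheme.to_string(), store);
    }

    /// Split "scheme://rest" and return the store registered for `scheme`
    /// together with the remainder of the URL; None if unparsable or unknown.
    fn resolve<'a>(&self, url: &'a str) -> Option<(Arc<dyn SimpleObjectStore>, &'a str)> {
        let (scheme, rest) = url.split_once("://")?;
        let store = self.stores.get(scheme)?;
        Some((Arc::clone(store), rest))
    }
}

fn main() {
    let mut objects = HashMap::new();
    objects.insert("bucket/data.csv".to_string(), b"a,b\n1,2\n".to_vec());

    let mut registry = StoreRegistry::default();
    registry.register("memory", Arc::new(InMemoryStore { objects }));
    registry.register("hdfs", Arc::new(FakeHdfsStore));

    // The caller only sees the trait; the scheme decides which backend serves the read.
    let (store, path) = registry.resolve("memory://bucket/data.csv").expect("scheme registered");
    println!("{} -> {:?}", store.name(), store.get(path).map(|bytes| bytes.len()));

    let (store, path) = registry.resolve("hdfs://namenode:9000/tmp/x").expect("scheme registered");
    println!("{} resolved path {}", store.name(), path);
}
```

The point of the design is that the query engine only depends on the trait, so an HDFS implementation can live in a separate crate (as it does here, given its Java dependency) and still be registered at runtime.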