apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[R] Expose Azure Blob Storage filesystem #32123

Status: Open. Opened by asfimport 2 years ago.

asfimport commented 2 years ago

I'd like the R arrow package to be able to interface with the Azure Blob Storage file system from the AzureStor package.

In Python, pyarrow and adlfs work together, so I'd like AzureStor and arrow to work together in R as well.

Reporter: Dean MacGregor

Note: This issue was originally created as ARROW-16791. Please see the migration documentation for further details.

asfimport commented 2 years ago

Will Jones / @wjones127: PyArrow and adlfs work because we implement a compatibility layer between Python fsspec filesystems and PyArrow filesystems.

IIUC, you are suggesting we implement a similar integration in R? Is there a standard protocol in R analogous to fsspec?

asfimport commented 1 year ago

Dean MacGregor: I don't think so, but I don't know enough about how pyarrow or the R arrow package work behind the scenes to answer your question.

I just know that Azure Blob Storage works in pyarrow and that R arrow works with S3 (or at least the documentation is there), and I'm assuming the difference between S3 and Azure isn't too big.

asfimport commented 1 year ago

Dean MacGregor: I found that in arrow/r/src/filesystem.cpp, on lines 38-40, there's a comment block that says to uncomment those lines for AzureBlobFileSystem once the R6 classes are made. Looking further, though, the file contains much more for S3 and GCS than that one reference, and nothing else for Azure. So I guess that comment should be more of a TODO since, unless I'm missing a lot (which is all too possible), Azure needs much more than just an R6 class to work, no?

To that end, Microsoft has https://github.com/Azure/azure-sdk-for-cpp for using Azure from C++.

I don't know C++ well enough (really, at all) to extend Microsoft's code into something usable for arrow, but I wanted to put the link here in case it's helpful.

asfimport commented 1 year ago

Neal Richardson / @nealrichardson: Thanks, yeah, that's being added to the C++ library in ARROW-2034. Once that is merged, we can add R bindings.

drdrjacobs commented 1 month ago

Now that https://github.com/apache/arrow/issues/18014 is closed, can this be implemented?