apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

[Python] Enable pyarrow AzureFileSystem for Windows #44655

Open davlee1972 opened 1 week ago

davlee1972 commented 1 week ago

Describe the enhancement requested

There are several resolved issues around enabling and packaging pyarrow.fs.AzureFileSystem for Linux, but as of the latest pyarrow 18.0 release it is still not packaged for Windows.

https://github.com/apache/arrow/issues/44347

AzureFileSystem is not available on Windows:

ImportError: The pyarrow installation is not built with support for 'AzureFileSystem'

C:\>python
Python 3.9.18 (main, Sep 11 2023, 14:09:26) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow as pa
>>> import pyarrow.fs as fs
>>> pa.__version__
'18.0.0'
>>> dir(fs)
['AwsDefaultS3RetryStrategy', 'AwsStandardS3RetryStrategy', 'FSSpecHandler', 'FileInfo', 'FileSelector', 'FileStats', 'FileSystem', 'FileSystemHandler', 'FileType', 'GcsFileSystem', 'HadoopFileSystem', 'LocalFileSystem', 'PyFileSystem', 'S3FileSystem', 'S3LogLevel', 'S3RetryStrategy', 'SubTreeFileSystem', '_MockFileSystem', '__builtins__', '__cached__', '__doc__', '__file__', '__getattr__', '__loader__', '__name__', '__package__', '__spec__', '_copy_files', '_copy_files_selector', '_ensure_filesystem', '_filesystem_from_str', '_is_path_like', '_not_imported', '_resolve_filesystem_and_path', '_stringify_path', 'atexit', 'copy_files', 'ensure_s3_finalized', 'ensure_s3_initialized', 'finalize_s3', 'initialize_s3', 'resolve_s3_region']
>>> hasattr(fs, "AzureFileSystem")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\xxxxx\Anaconda3\lib\site-packages\pyarrow\fs.py", line 75, in __getattr__
    raise ImportError(
ImportError: The pyarrow installation is not built with support for 'AzureFileSystem'

This works fine on Linux:

(base) bash-4.2$ python
Python 3.9.19 (main, May  6 2024, 19:43:03)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow as pa
>>> import pyarrow.fs as fs
>>> pa.__version__
'18.0.0'
>>> dir(fs)
['AwsDefaultS3RetryStrategy', 'AwsStandardS3RetryStrategy', 'AzureFileSystem', 'FSSpecHandler', 'FileInfo', 'FileSelector', 'FileStats', 'FileSystem', 'FileSystemHandler', 'FileType', 'GcsFileSystem', 'HadoopFileSystem', 'LocalFileSystem', 'PyFileSystem', 'S3FileSystem', 'S3LogLevel', 'S3RetryStrategy', 'SubTreeFileSystem', '_MockFileSystem', '__builtins__', '__cached__', '__doc__', '__file__', '__getattr__', '__loader__', '__name__', '__package__', '__spec__', '_copy_files', '_copy_files_selector', '_ensure_filesystem', '_filesystem_from_str', '_is_path_like', '_not_imported', '_resolve_filesystem_and_path', '_stringify_path', 'atexit', 'copy_files', 'ensure_s3_finalized', 'ensure_s3_initialized', 'finalize_s3', 'initialize_s3', 'resolve_s3_region']
>>> hasattr(fs, "AzureFileSystem")
True
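
As the Windows traceback above shows, `hasattr(fs, "AzureFileSystem")` does not return False on a build without Azure support. It propagates the ImportError raised by the module-level `__getattr__` in pyarrow/fs.py, because `hasattr` only swallows AttributeError. A minimal sketch of a check that behaves the same on both platforms follows; the helper name `has_filesystem` is hypothetical and not part of pyarrow:

```python
import importlib


def has_filesystem(name: str) -> bool:
    """Return True if this pyarrow build exposes pyarrow.fs.<name>.

    Hypothetical helper, not a pyarrow API. It also returns False when
    pyarrow itself is not installed.
    """
    try:
        fs = importlib.import_module("pyarrow.fs")
        # On builds without the backend, pyarrow.fs.__getattr__ raises
        # ImportError; unknown names raise AttributeError.
        getattr(fs, name)
    except (ImportError, AttributeError):
        return False
    return True


if __name__ == "__main__":
    print("AzureFileSystem available:", has_filesystem("AzureFileSystem"))
```

Based on the sessions above, this should report False on the Windows 18.0.0 install and True on the Linux one.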

Component(s)

Packaging, Python

raulcd commented 4 days ago

This is the PR that enabled AzureFileSystem on the macOS and Linux wheels; it briefly mentions the Windows failure. We should investigate and create a PR:

raulcd commented 4 days ago

It seems that building Azure FS for Windows also fails for conda:

This probably needs to be fixed before we try to enable it on the wheels.