apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[C++] Refactor arrow/io/hdfs.h to use common FileSystem API #22457

Open asfimport opened 5 years ago

asfimport commented 5 years ago

As part of this refactor, the FileSystem-related classes in https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/interfaces.h#L51 should be removed. The files should probably also be moved to arrow/filesystem.

Reporter: Wes McKinney / @wesm

Note: This issue was originally created as ARROW-6055. Please see the migration documentation for further details.

asfimport commented 5 years ago

Ben Kietzman / @bkietz: @wesm should io::FileSystem be deprecated or just deleted?

asfimport commented 5 years ago

Ben Kietzman / @bkietz: In addition to removing io::FileSystem and io::FileStatistics, should HdfsPathInfo be replaced with fs::FileStats? HdfsPathInfo carries more information than fs::FileStats: last access time in addition to last modified time (though in seconds rather than nanoseconds since the epoch), block size, replication, and permissions. io-hdfs.pxi passes all of this through for some_hdfs.ls(some_path, detail=True), but the docstring does not specify which metadata is guaranteed, and test_hdfs.py doesn't even call ls(... detail=True).
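
To make the trade-off concrete, here is a rough sketch of what such a replacement would drop. Both structs are hypothetical, simplified stand-ins (the names `HdfsPathInfoSketch`, `FileStatsSketch`, and `ToGenericStats` are made up for illustration and are not the actual Arrow definitions); the field list simply follows the description above.

```cpp
#include <chrono>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the HDFS-specific path info described above.
struct HdfsPathInfoSketch {
  std::string name;
  int64_t size;
  int64_t block_size;
  int32_t last_modified_time;  // seconds since the epoch
  int32_t last_access_time;    // seconds since the epoch
  int16_t replication;
  int16_t permissions;
};

// Hypothetical stand-in for the generic stats: assumed here to hold only a
// path, a size, and a modification time in nanoseconds since the epoch.
struct FileStatsSketch {
  std::string path;
  int64_t size;
  std::chrono::nanoseconds mtime;
};

// Converts the HDFS-specific record to the generic one. Access time, block
// size, replication, and permissions have no destination field in this
// sketch and are dropped.
inline FileStatsSketch ToGenericStats(const HdfsPathInfoSketch& info) {
  FileStatsSketch out;
  out.path = info.name;
  out.size = info.size;
  // Seconds widen to nanoseconds without loss of precision.
  out.mtime = std::chrono::nanoseconds(
      std::chrono::seconds(info.last_modified_time));
  return out;
}
```

If ls(some_path, detail=True) is expected to keep returning the HDFS-only fields, a conversion like this would not be enough on its own, which is why the question of what metadata is guaranteed matters.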

asfimport commented 4 years ago

Wes McKinney / @wesm: Where do things stand on this?

asfimport commented 3 years ago

Wes McKinney / @wesm: What is the latest here?

asfimport commented 3 years ago

Antoine Pitrou / @pitrou: The old filesystem interfaces are not exposed in arrow/io/interfaces.h anymore. The new HDFS implementation still calls into the old one. It would be a welcome cleanup job to reintegrate all the HDFS filesystem code into arrow/filesystem/hdfs.cc, but rather low-priority.
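
For readers unfamiliar with the layering, the situation resembles the following delegation pattern. This is only an illustrative sketch under assumed names: `LegacyHdfsClient`, `InMemoryLegacyClient`, `FileInfoSketch`, and `HdfsFileSystemSketch` are hypothetical and do not correspond to the real Arrow classes or signatures.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical interface standing in for the old HDFS client code.
class LegacyHdfsClient {
 public:
  virtual ~LegacyHdfsClient() = default;
  virtual bool Exists(const std::string& path) = 0;
  virtual int64_t GetSize(const std::string& path) = 0;
};

// Hypothetical in-memory fake so the sketch can run without a cluster.
class InMemoryLegacyClient : public LegacyHdfsClient {
 public:
  void AddFile(const std::string& path, int64_t size) { files_[path] = size; }
  bool Exists(const std::string& path) override {
    return files_.count(path) > 0;
  }
  int64_t GetSize(const std::string& path) override { return files_.at(path); }

 private:
  std::map<std::string, int64_t> files_;
};

// Hypothetical result type for the new-style API.
struct FileInfoSketch {
  std::string path;
  bool exists;
  int64_t size;  // -1 when the path does not exist
};

// New-style filesystem that owns a legacy client and forwards every call.
class HdfsFileSystemSketch {
 public:
  explicit HdfsFileSystemSketch(std::unique_ptr<LegacyHdfsClient> client)
      : client_(std::move(client)) {}

  FileInfoSketch GetInfo(const std::string& path) {
    FileInfoSketch info{path, client_->Exists(path), -1};
    if (info.exists) info.size = client_->GetSize(path);
    return info;
  }

 private:
  std::unique_ptr<LegacyHdfsClient> client_;
};
```

The cleanup described above would amount to folding the legacy client's logic directly into the new class so that the intermediate layer disappears.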