apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

[Python] Add filesystem capabilities to other IO formats (feather, csv, json, ..) #25967

Open asfimport opened 4 years ago

asfimport commented 4 years ago

In the parquet IO functions, we support reading/writing files from non-local filesystems directly (in addition to passing a buffer) by:

Reporter: Joris Van den Bossche / @jorisvandenbossche
Assignee: Miles Granger / @milesgranger

Note: This issue was originally created as ARROW-9938. Please see the migration documentation for further details.

asfimport commented 4 years ago

Antoine Pitrou / @pitrou: On the C++ side they will definitely stay more low-level. On the Python side, I have no preference. I suppose it could be useful to write open_csv("s3://...").

asfimport commented 4 years ago

Krisztian Szucs / @kszucs: Supporting remote URIs sounds like a nice feature.

asfimport commented 4 years ago

Neal Richardson / @nealrichardson: FTR I'm doing this in R in ARROW-9854, in case you want to see what this looks like in practice (https://github.com/apache/arrow/pull/8058)

asfimport commented 3 years ago

Hendrik Makait: Unless someone is already working on this, I'd love to get started on putting together a PR for this. Since it will be my first contribution, I might ask for guidance in the process. As a first question: Should I split this into multiple PRs per format (i.e. one PR for csv, feather, json, respectively) or combine them into one larger PR?

asfimport commented 3 years ago

Joris Van den Bossche / @jorisvandenbossche: Hi [~hendrik.makait], I don't think anyone has started on this, so a contribution would be very welcome! I'm happy to give some guidance where needed.

> As a first question: Should I split this into multiple PRs per format (i.e. one PR for csv, feather, json, respectively) or combine them into one larger PR?

In any case, I would start with a single format and open a PR for that, so we can check whether the approach is good. We can then decide whether to add the other formats in the same PR or in separate PRs.

asfimport commented 2 years ago

Antoine Pitrou / @pitrou: cc @milesgranger, @AlenkaF. Nothing earth-shattering but perhaps a nice usability feature?