apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Add support to write DataFrames to an object store #3668

Open wperron opened 2 years ago

wperron commented 2 years ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Hey folks :wave: we're evaluating DataFusion/Ballista to run queries as part of a pipeline over some pretty large datasets (on the order of a few terabytes at minimum). We would like to read Parquet files from GCS and write the results as Parquet to a different GCS bucket. Right now it's possible to get a DataFrame from GCS using the object_store crate, but it's not possible to write the resulting DataFrame back to GCS.

Additional context

This feature was already discussed in #2185, but that issue is out of date because the new object_store crate has since been merged in.

stormasm commented 2 years ago

https://github.com/minio/minio

MinIO would be a nice open source example of an object store to use for testing...

It's also one I have been using for IOx testing.