apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0
13.88k stars 3.38k forks

[C++] Add write support for ORC in the Datasets API #29422

Open asfimport opened 2 years ago

asfimport commented 2 years ago

ARROW-13572 (https://github.com/apache/arrow/pull/10991) added basic support for the ORC file format in the Datasets API, but did not add support for writing datasets in the ORC format.

Reporter: Joris Van den Bossche / @jorisvandenbossche

Note: This issue was originally created as ARROW-13796. Please see the migration documentation for further details.

asfimport commented 2 years ago

Ian Alexander Joiner / @iajoiner: @jorisvandenbossche As we agreed back in Feb, I will take this one haha.

asfimport commented 1 year ago

Todd Farmer / @toddfarmer: This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned per project policy. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon.

ddrinka commented 9 months ago

@jorisvandenbossche The current failure mode caused by the missing implementation is confusing. Calling OrcFileFormat.make_write_options() kills the Python process without reporting an exception. Calling ds.write_dataset with format='orc' goes through OrcFileFormat.make_write_options() and dies silently as well. Attempting to write to the 'orc' format should instead raise an exception that is visible from Python.
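
Until the library raises on its own, a user-side guard along the following lines might turn the silent crash into an ordinary Python exception. This is only a sketch: check_writable_format, guarded_write_dataset and UNSUPPORTED_WRITE_FORMATS are hypothetical names, not part of pyarrow, and the check only covers the case where the format is passed as a string such as 'orc' (a FileFormat object would need a separate check).

```python
# Hypothetical user-side guard; none of these names exist in pyarrow.
# Assumption: the format is passed as a string (e.g. 'orc'), not a FileFormat object.
UNSUPPORTED_WRITE_FORMATS = frozenset({"orc"})


def check_writable_format(format_name):
    """Raise a regular Python exception for formats the dataset writer cannot handle."""
    if not isinstance(format_name, str):
        raise TypeError("this sketch only checks string format names")
    if format_name.lower() in UNSUPPORTED_WRITE_FORMATS:
        raise NotImplementedError(
            f"writing datasets in {format_name!r} format is not supported yet"
        )
    return format_name


def guarded_write_dataset(data, base_dir, format, **kwargs):
    """Validate the format first, then delegate to pyarrow.dataset.write_dataset."""
    check_writable_format(format)
    # Imported lazily so the check above runs before any pyarrow code is touched.
    import pyarrow.dataset as ds

    return ds.write_dataset(data, base_dir, format=format, **kwargs)
```

With this in place, guarded_write_dataset(table, 'out', format='orc') raises NotImplementedError instead of terminating the interpreter, while other string formats are forwarded unchanged. For a single table, pyarrow.orc.write_table may be an alternative if the installed pyarrow build includes ORC support.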