asfimport opened this issue 5 years ago (status: Open)
Wes McKinney / @wesm: As something to keep in mind, we will need to implement a "Sink" node type to be the flip side of "Scan" in a query engine context. The user may wish to output the results of a query directly to CSV, JSON, Parquet, or some other dataset format, so we need to develop a common API that this can hook into for that purpose.
Nicola Crane / @thisisnic: User request on StackOverflow for this feature to be implemented: https://stackoverflow.com/questions/71047976/fast-ldjson-writing-with-arrow
Weston Pace / @westonpace: I came across a helpful GitHub issue today that explains that there are actually several standards for line-delimited JSON and goes over the differences a bit. This might be a helpful reference when this gets implemented: https://github.com/ndjson/ndjson.github.io/issues/1
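For reference, the core of the format the variants above agree on is simple: one complete JSON value per line, separated by `\n` (the variants differ on details such as empty lines, a trailing newline, or alternative separators). A minimal stdlib-only sketch:

```python
import json

# Newline-delimited JSON (NDJSON): one complete JSON value per line.
# Note: json.dumps never emits raw newlines, so each record stays on
# one line regardless of its content.
records = [{"a": 1}, {"a": 2, "b": "x"}]

ndjson = "\n".join(json.dumps(r) for r in records) + "\n"

# Round-trip: parse each non-empty line independently.
parsed = [json.loads(line) for line in ndjson.splitlines() if line]
assert parsed == records
```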
Todd Farmer / @toddfarmer: This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon.
Steve M. Kim: As part of this feature request, do we contemplate generating a JSON Schema from an Arrow table schema? Given an Arrow schema and record batches, it would be useful to get a JSON Schema and a sequence of JSON objects that conform to that schema. This would also facilitate testing the correctness of the Arrow JSON writer.
David Li / @lidavidm: That's a new can of worms :) There's been some discussion about a way to represent Arrow schemas in JSON. See https://github.com/apache/arrow/issues/13803 and https://github.com/apache/arrow/pull/7110 and ARROW-8952.
Users who need to emit JSON in line-delimited format currently cannot do so using Arrow. It should be straightforward to implement this efficiently, and it will be very helpful for testing and benchmarking.
Reporter: Ben Kietzman / @bkietz
Related issues:
Note: This issue was originally created as ARROW-5033. Please see the migration documentation for further details.