apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[C++] shared conversion framework for JSON/CSV parsers #21232

Open asfimport opened 5 years ago

asfimport commented 5 years ago

The CSV and JSON readers both convert strings to values in an Array, but they share little code beyond arrow::util::StringConverter.

It would be advantageous if a single interface could be shared between CSV and JSON to do the heavy lifting of conversion consistently. This would simplify adding new parsers and would let all parsers immediately take advantage of a new conversion strategy.
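
To make the idea concrete, here is a minimal, self-contained sketch of what such a shared interface might look like. It is not Arrow code: `FieldConverter`, `Int64Converter`, `ConvertCsvRow` and `ConvertJsonArray` are hypothetical names invented for illustration, the "column" is a plain `std::vector` rather than an Arrow builder, and the two front ends are toy tokenizers (no quoting, no nesting). The only point is that both formats hand already-tokenized text to the same converter.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>
#include <vector>

// Hypothetical shared interface (not an Arrow API): converts one
// already-tokenized field and appends the result to a column.
class FieldConverter {
 public:
  virtual ~FieldConverter() = default;
  // Returns false if `text` cannot be converted.
  virtual bool Append(std::string_view text) = 0;
  virtual void AppendNull() = 0;
};

// One conversion strategy, written once and usable by every parser.
class Int64Converter : public FieldConverter {
 public:
  bool Append(std::string_view text) override {
    std::int64_t value = 0;
    const char* end = text.data() + text.size();
    auto [ptr, ec] = std::from_chars(text.data(), end, value);
    if (ec != std::errc() || ptr != end) return false;  // reject partial parses
    values_.push_back(value);
    return true;
  }
  void AppendNull() override { values_.push_back(std::nullopt); }
  const std::vector<std::optional<std::int64_t>>& values() const { return values_; }

 private:
  std::vector<std::optional<std::int64_t>> values_;
};

// Toy CSV front end: splits on commas; an empty field is treated as null.
bool ConvertCsvRow(std::string_view row, FieldConverter* out) {
  assert(out != nullptr);
  while (true) {
    const std::size_t comma = row.find(',');
    const std::string_view field = row.substr(0, comma);
    if (field.empty()) {
      out->AppendNull();
    } else if (!out->Append(field)) {
      return false;
    }
    if (comma == std::string_view::npos) return true;
    row.remove_prefix(comma + 1);
  }
}

// Strips leading and trailing spaces.
std::string_view Trim(std::string_view s) {
  while (!s.empty() && s.front() == ' ') s.remove_prefix(1);
  while (!s.empty() && s.back() == ' ') s.remove_suffix(1);
  return s;
}

// Toy JSON front end: accepts only a flat array of integers and `null`.
bool ConvertJsonArray(std::string_view text, FieldConverter* out) {
  assert(out != nullptr);
  if (text.size() < 2 || text.front() != '[' || text.back() != ']') return false;
  text = text.substr(1, text.size() - 2);
  while (!text.empty()) {
    const std::size_t comma = text.find(',');
    const std::string_view item = Trim(text.substr(0, comma));
    if (item == "null") {
      out->AppendNull();
    } else if (!out->Append(item)) {
      return false;
    }
    if (comma == std::string_view::npos) break;
    text.remove_prefix(comma + 1);
  }
  return true;
}
```

In this sketch, feeding the CSV row `1,,3` and the JSON text `[1, null, 3]` through two `Int64Converter` instances yields the same column. What counts as null stays in each front end, while the string-to-value logic lives in one place, which is the kind of split this issue is asking about.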

Reporter: Ben Kietzman / @bkietz

Note: This issue was originally created as ARROW-4706. Please see the migration documentation for further details.

asfimport commented 5 years ago

Antoine Pitrou / @pitrou: I wonder how much can be shared. Obviously the parsing will be different, but the conversion layer will be different too. So perhaps the higher-level orchestration layer can be shared.

Note that ARROW-3410 will imply changing the CSV orchestration layer.