apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0
13.87k stars 3.38k forks

[C++][Parquet] Read string columns directly into STRING_VIEW arrays and cast to LARGE_STRING_VIEW if necessary #43068

Open felipecrv opened 5 days ago

felipecrv commented 5 days ago

Describe the enhancement requested

This would fix two issues for the price of one:

  1. Reading from Parquet into schemas that use the new STRING_VIEW type
  2. Reading LARGE_STRING arrays from Parquet (#39682)

This issue also depends on:

Component(s)

C++, Parquet

mapleFU commented 5 days ago

Related: https://github.com/apache/arrow-rs/issues/5530

This could also apply "zero-copy" here for non-delta string encodings.
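To make the "zero-copy" idea concrete, here is a minimal, self-contained C++ sketch. It is not Arrow's or parquet-cpp's implementation. It assumes a decompressed data page of PLAIN-encoded BYTE_ARRAY values (a 4-byte little-endian length followed by the bytes) and a 16-byte view modeled on Arrow's string-view layout (values of up to 12 bytes are stored inline, longer values keep a 4-byte prefix plus a buffer index and offset). The names `View`, `EncodePlain`, `MakeViews`, `OffsetOf` and `ToString` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical 16-byte view, modeled on Arrow's string-view layout.
struct View {
  std::int32_t length;
  // length <= 12: the value itself, zero padded.
  // length  > 12: prefix (4 bytes) + buffer index (4 bytes) + offset (4 bytes).
  unsigned char payload[12];
};
static_assert(sizeof(View) == 16, "a view must be 16 bytes");

// Test helper: PLAIN-encode values as 4-byte little-endian length + bytes.
std::vector<unsigned char> EncodePlain(const std::vector<std::string>& values) {
  std::vector<unsigned char> page;
  for (const std::string& v : values) {
    const std::uint32_t len = static_cast<std::uint32_t>(v.size());
    for (int shift = 0; shift < 32; shift += 8) {
      page.push_back(static_cast<unsigned char>((len >> shift) & 0xFF));
    }
    page.insert(page.end(), v.begin(), v.end());
  }
  return page;
}

// Build views over the page. Long values are referenced by offset into the
// page buffer; only short values (<= 12 bytes) are copied into the view.
std::vector<View> MakeViews(const std::vector<unsigned char>& page,
                            std::int32_t buffer_index) {
  std::vector<View> views;
  std::size_t pos = 0;
  while (pos + 4 <= page.size()) {
    const std::uint32_t len = static_cast<std::uint32_t>(page[pos]) |
                              static_cast<std::uint32_t>(page[pos + 1]) << 8 |
                              static_cast<std::uint32_t>(page[pos + 2]) << 16 |
                              static_cast<std::uint32_t>(page[pos + 3]) << 24;
    pos += 4;
    if (len > page.size() - pos) break;  // truncated page: stop decoding
    View v{};                            // zero-initialized, so padding is 0
    v.length = static_cast<std::int32_t>(len);
    if (len <= 12) {
      std::memcpy(v.payload, page.data() + pos, len);
    } else {
      const std::int32_t offset = static_cast<std::int32_t>(pos);
      std::memcpy(v.payload, page.data() + pos, 4);   // prefix
      std::memcpy(v.payload + 4, &buffer_index, 4);   // which data buffer
      std::memcpy(v.payload + 8, &offset, 4);         // where the value starts
    }
    views.push_back(v);
    pos += len;
  }
  return views;
}

// Offset stored in an out-of-line view (only meaningful when length > 12).
std::int32_t OffsetOf(const View& v) {
  std::int32_t offset = 0;
  std::memcpy(&offset, v.payload + 8, 4);
  return offset;
}

// Materialize a value for inspection (this copies; the view itself does not).
std::string ToString(const View& v, const std::vector<unsigned char>& page) {
  const std::size_t len = static_cast<std::size_t>(v.length);
  if (v.length <= 12) {
    return std::string(reinterpret_cast<const char*>(v.payload), len);
  }
  const std::size_t offset = static_cast<std::size_t>(OffsetOf(v));
  assert(offset + len <= page.size());
  return std::string(reinterpret_cast<const char*>(page.data()) + offset, len);
}
```

In this sketch the page buffer would itself serve as the array's data buffer, so long strings are never copied. Delta encodings such as DELTA_BYTE_ARRAY do not store each value as one contiguous run of bytes, which is presumably why the comment limits this to non-delta encodings.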