Support DELTA_BINARY_PACKED and DELTA_BYTE_ARRAY

apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator

https://datafusion.apache.org/comet

Apache License 2.0

653 stars 119 forks source link

Open kazuyukitanimura opened 2 weeks ago

kazuyukitanimura commented 2 weeks ago

There are some tests in Spark 4.0 that uses parquet.writer.version=v2 (ParquetTypeWideningSuite).

The V2 write writes with delta encoding. Comet currently cannot read such files

No response

No response

parthchandra commented 2 weeks ago

IIRC, the vectorized versions of these encodings in Spark did not improve performance much over the row based implementation in the parquet library