apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0

Support DELTA_BINARY_PACKED and DELTA_BYTE_ARRAY #574

Open kazuyukitanimura opened 2 weeks ago

kazuyukitanimura commented 2 weeks ago

What is the problem the feature request solves?

There are some tests in Spark 4.0 that use parquet.writer.version=v2 (ParquetTypeWideningSuite).

The V2 writer writes with delta encodings (DELTA_BINARY_PACKED and DELTA_BYTE_ARRAY). Comet currently cannot read such files.
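
For readers unfamiliar with delta encoding, here is a minimal Python sketch of the idea behind DELTA_BINARY_PACKED. It is a simplified illustration only, not the Parquet on-disk format: the real encoding also splits values into blocks and miniblocks, bit-packs the adjusted deltas, and uses variable-length integers in its header, all of which are omitted here. The function names `delta_encode` and `delta_decode` are hypothetical and do not come from Comet, Spark, or the Parquet library.

```python
def delta_encode(values):
    """Encode a list of ints as (first value, minimum delta, adjusted deltas).

    Subtracting the minimum delta makes every stored delta non-negative,
    which is what lets a real encoder bit-pack them compactly.
    """
    if not values:
        return None, 0, []
    # Differences between consecutive values.
    deltas = [b - a for a, b in zip(values, values[1:])]
    min_delta = min(deltas) if deltas else 0
    adjusted = [d - min_delta for d in deltas]
    return values[0], min_delta, adjusted


def delta_decode(first, min_delta, adjusted):
    """Rebuild the original list from the output of delta_encode."""
    if first is None:
        return []
    out = [first]
    for d in adjusted:
        # Undo the min-delta adjustment, then accumulate.
        out.append(out[-1] + d + min_delta)
    return out


if __name__ == "__main__":
    data = [100, 101, 103, 106, 110]
    encoded = delta_encode(data)
    print(encoded)
    print(delta_decode(*encoded) == data)
```

A reader for such files has to do the reverse accumulation step for every value, which is why the encoding needs dedicated decoding support rather than a plain copy of the page bytes.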

Describe the potential solution

No response

Additional context

No response

parthchandra commented 2 weeks ago

IIRC, the vectorized versions of these encodings in Spark did not improve performance much over the row-based implementation in the Parquet library.