apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0
1.08k stars 390 forks

[VL] Support customize options for parquet native write #5751

Open gaoyangxiaozhu opened 2 months ago

gaoyangxiaozhu commented 2 months ago

Description

Currently, Parquet native write doesn't support customized options such as block_size and page_size when writing data.

There is an ongoing PR on the Velox side, https://github.com/facebookincubator/velox/pull/8864/files#diff-5a2dd3766d9a74bbef58d62d96f0abfb111e8e507ce9bcecd35f69d2c8669ed7, that adds support for passing a flushPolicy. The flushPolicy allows customizing block_size, but not page_size.

Let's use this issue to track support for customizing all Parquet options when writing data.
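
To make the request concrete, here is a minimal sketch of the kind of option forwarding this issue asks for. It is not Gluten's or Velox's actual code. The keys `parquet.block.size` and `parquet.page.size` are the standard parquet-hadoop writer settings; the native-side names `block_size` and `page_size` simply reuse the names from the description, and the function `collect_native_write_options` is hypothetical.

```python
# Mapping from parquet-hadoop writer keys to native writer option names.
# The left-hand keys are real parquet-hadoop settings; the right-hand names
# are assumptions taken from the wording of this issue.
HADOOP_TO_NATIVE = {
    "parquet.block.size": "block_size",
    "parquet.page.size": "page_size",
}


def collect_native_write_options(write_options):
    """Split user write options into forwarded and ignored Parquet settings.

    Returns a tuple (native, ignored): `native` holds the options that would
    be passed to the native writer, `ignored` lists the `parquet.*` keys that
    have no native counterpart in the mapping above.
    """
    native = {}
    ignored = []
    for key, value in write_options.items():
        if not key.startswith("parquet."):
            continue  # not a Parquet writer setting, leave it to other layers
        target = HADOOP_TO_NATIVE.get(key)
        if target is None:
            ignored.append(key)  # currently has nowhere to go on the native side
        else:
            native[target] = int(value)  # sizes arrive as strings, in bytes
    return native, sorted(ignored)


if __name__ == "__main__":
    user_options = {
        "parquet.block.size": "134217728",
        "parquet.page.size": "1048576",
        "parquet.enable.dictionary": "true",
    }
    print(collect_native_write_options(user_options))
```

Reporting the ignored keys instead of dropping them silently would make it easier to see which Parquet options are still unsupported by the native write path.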

gaoyangxiaozhu commented 2 months ago

@JkSelf let's use this issue to track the problem that customized block_size and page_size are not supported.

FelixYBW commented 2 months ago

@gaoyangxiaozhu can you list all the Parquet write parameters that Spark supports and that Velox/Arrow support? Let's pass all supported params to Velox.