apache / parquet-java

Apache Parquet Java
https://parquet.apache.org/
Apache License 2.0

Is it possible to apply specific encodings to specific columns with ParquetWriter? #3051

Open: Selfeer opened this issue 1 week ago

Selfeer commented 1 week ago

I’m working on a tool that generates Parquet files from a file definition provided in JSON. I use the parquet-java library for this, and I’m wondering whether it’s possible to specify a particular encoding for individual columns when generating the file.

wgtmac commented 2 days ago

It seems that we can only control dictionary encoding and byte stream split encoding via ParquetProperties: https://github.com/apache/parquet-java/blob/master/parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java.
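Here is a minimal sketch of how a JSON-driven generator might sort per-column encoding requests by the setting that could influence them, assuming the split described in this comment. EncodingKnobs, Knob, knobFor, controllableViaProperties and plan are hypothetical names, not parquet-java API. Note that turning dictionary encoding on only enables it: the writer can still fall back to another encoding if the dictionary grows too large, and byte stream split only applies to certain physical types such as FLOAT and DOUBLE.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of parquet-java): classifies a requested encoding by the
// writer setting that, per the discussion in this issue, can influence it.
final class EncodingKnobs {

  enum Knob {
    DICTIONARY,        // dictionary on/off, e.g. ParquetProperties.Builder#withDictionaryEncoding(String, boolean)
    BYTE_STREAM_SPLIT, // byte stream split switch, e.g. ParquetProperties.Builder#withByteStreamSplitEncoding
    WRITER_VERSION     // everything else: picked by the values-writer factory for the WriterVersion
  }

  static Knob knobFor(String requestedEncoding) {
    switch (requestedEncoding) {
      case "PLAIN_DICTIONARY":
      case "RLE_DICTIONARY":
        return Knob.DICTIONARY;
      case "BYTE_STREAM_SPLIT":
        return Knob.BYTE_STREAM_SPLIT;
      default:
        return Knob.WRITER_VERSION;
    }
  }

  /** True if ParquetProperties exposes a dedicated switch for this encoding. */
  static boolean controllableViaProperties(String requestedEncoding) {
    return knobFor(requestedEncoding) != Knob.WRITER_VERSION;
  }

  /** Maps column path -> requested encoding (e.g. parsed from the JSON definition) to column -> knob. */
  static Map<String, Knob> plan(Map<String, String> requested) {
    Map<String, Knob> result = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : requested.entrySet()) {
      result.put(e.getKey(), knobFor(e.getValue()));
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> requested = new LinkedHashMap<>();
    requested.put("user_id", "RLE_DICTIONARY");
    requested.put("price", "BYTE_STREAM_SPLIT");
    requested.put("ts", "DELTA_BINARY_PACKED");
    plan(requested).forEach((column, knob) -> System.out.println(column + " -> " + knob));
  }
}
```

Columns that land in WRITER_VERSION are the ones a tool like this would need to flag as not directly selectable through those two properties.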

Other encodings are chosen based on the WriterVersion (PARQUET_1_0 or PARQUET_2_0) configured on the writer: https://github.com/apache/parquet-java/blob/master/parquet-column/src/main/java/org/apache/parquet/column/values/factory/DefaultValuesWriterFactory.java
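As a rough illustration, the lookup table below summarizes which encoding a column falls back to (when dictionary encoding is off or gets abandoned) for a few common physical types under each writer version. This is my reading of DefaultV1ValuesWriterFactory and DefaultV2ValuesWriterFactory, so please treat it as an assumption and check it against the linked source for your parquet-java version. WriterVersionDefaults and its methods are hypothetical, and the nested WriterVersion enum only mirrors ParquetProperties.WriterVersion.

```java
// Hypothetical lookup table (not a parquet-java API). The entries are an assumption based on
// DefaultV1ValuesWriterFactory / DefaultV2ValuesWriterFactory and should be verified there.
final class WriterVersionDefaults {

  enum WriterVersion { PARQUET_1_0, PARQUET_2_0 } // mirrors ParquetProperties.WriterVersion

  enum PrimitiveType { BOOLEAN, INT32, INT64, FLOAT, DOUBLE, BINARY }

  /** Encoding used when the dictionary is disabled or abandoned for this column. */
  static String fallbackEncoding(WriterVersion version, PrimitiveType type, boolean byteStreamSplit) {
    // Byte stream split is a separate switch that applies to floating-point columns.
    if (byteStreamSplit && (type == PrimitiveType.FLOAT || type == PrimitiveType.DOUBLE)) {
      return "BYTE_STREAM_SPLIT";
    }
    if (version == WriterVersion.PARQUET_1_0) {
      return "PLAIN"; // v1 falls back to PLAIN for every type listed here
    }
    switch (type) {
      case BOOLEAN:
        return "RLE";
      case INT32:
      case INT64:
        return "DELTA_BINARY_PACKED";
      case BINARY:
        return "DELTA_BYTE_ARRAY";
      default:
        return "PLAIN"; // FLOAT and DOUBLE without byte stream split
    }
  }

  /** Encoding reported for dictionary-encoded data pages under each writer version. */
  static String dictionaryDataPageEncoding(WriterVersion version) {
    return version == WriterVersion.PARQUET_1_0 ? "PLAIN_DICTIONARY" : "RLE_DICTIONARY";
  }

  public static void main(String[] args) {
    for (WriterVersion v : WriterVersion.values()) {
      for (PrimitiveType t : PrimitiveType.values()) {
        System.out.printf("%s %-7s -> %s%n", v, t, fallbackEncoding(v, t, false));
      }
    }
  }
}
```

The point of the table is that the writer version is a file-wide choice, so switching it changes the fallback encoding for every column of a given type at once rather than for one column.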