Describe the bug, including details regarding any error messages, version, and platform.
I am facing an issue while splitting a parquet file into multiple files using the ParquetFileWriter.appendRowGroups API. It is failing to set the dictionary page offsets correctly in the new files. When investigated further, I observed that the API ParquetMetadataConverter.addRowGroup has an assumption on the availability of EncodingStats always. As per the format specification, it is not mandatory to have the encoding_stats. Is it possible to remove this requirement?
Describe the bug, including details regarding any error messages, version, and platform.
I am facing an issue while splitting a parquet file into multiple files using the ParquetFileWriter.appendRowGroups API. It is failing to set the dictionary page offsets correctly in the new files. When investigated further, I observed that the API ParquetMetadataConverter.addRowGroup has an assumption on the availability of EncodingStats always. As per the format specification, it is not mandatory to have the encoding_stats. Is it possible to remove this requirement?
https://github.com/apache/parquet-java/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L559
https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L826
Component(s)
No response