Closed mothukur closed 2 months ago
Thanks for reporting the issue! I think there is a similar effort to resolve this issue but it looks more complicated than it appears: https://github.com/apache/parquet-java/pull/1340
I've submitted a PR with the fix. Could you please review it?
Describe the bug, including details regarding any error messages, version, and platform.
I am facing an issue while splitting a Parquet file into multiple files using the ParquetFileWriter.appendRowGroups API: the dictionary page offsets are not set correctly in the new files. On further investigation, I found that ParquetMetadataConverter.addRowGroup assumes EncodingStats is always available. According to the format specification, encoding_stats is optional. Is it possible to remove this requirement?
https://github.com/apache/parquet-java/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L559
https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L826
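To make the request concrete, here is a minimal, self-contained sketch of the kind of fallback I have in mind. It is not the parquet-java code: `DictionaryOffsetSketch`, its `Encoding` enum, its `EncodingStats` class, and `shouldWriteDictionaryPageOffset` are all hypothetical stand-ins. The fallback rule (a dictionary encoding is listed and the offset is positive) is my assumption and may not be what the maintainers choose.

```java
import java.util.EnumSet;
import java.util.Set;

public class DictionaryOffsetSketch {

    // Hypothetical stand-in for the encodings listed on a column chunk.
    enum Encoding { PLAIN, RLE, PLAIN_DICTIONARY, RLE_DICTIONARY }

    // Hypothetical stand-in for the optional encoding_stats field.
    // A null reference models a file that was written without it.
    static final class EncodingStats {
        final boolean hasDictionaryPages;

        EncodingStats(boolean hasDictionaryPages) {
            this.hasDictionaryPages = hasDictionaryPages;
        }
    }

    // Decide whether dictionary_page_offset should be written to the footer.
    // Assumption: when encoding_stats is missing, fall back to the list of
    // encodings plus a positive offset instead of silently dropping the offset.
    static boolean shouldWriteDictionaryPageOffset(
            EncodingStats stats, Set<Encoding> encodings, long dictionaryPageOffset) {
        if (stats != null) {
            // Stats are present, so trust them.
            return stats.hasDictionaryPages;
        }
        boolean usesDictionaryEncoding =
                encodings.contains(Encoding.PLAIN_DICTIONARY)
                        || encodings.contains(Encoding.RLE_DICTIONARY);
        return usesDictionaryEncoding && dictionaryPageOffset > 0;
    }

    public static void main(String[] args) {
        // A column chunk without encoding_stats that still has a dictionary page.
        Set<Encoding> encodings = EnumSet.of(Encoding.PLAIN, Encoding.RLE_DICTIONARY);
        System.out.println(shouldWriteDictionaryPageOffset(null, encodings, 4L));
    }
}
```

With a check like this, a row group copied from a file that lacks encoding_stats would still keep its dictionary page offset, while files that do carry the stats behave as before.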