ParquetWriter.close() invokes InternalParquetRecordWriter.close(), which first calls parquetFileWriter.end(finalMetadata) and then calls AutoCloseables.uncheckedClose(columnStore, pageStore, bloomFilterWriteStore, parquetFileWriter).
As a result, parquetFileWriter is apparently closed twice. The first close happens in parquetFileWriter.end(finalMetadata), which eventually calls parquetFileWriter.close(). The second close happens in the subsequent AutoCloseables.uncheckedClose call.
This causes the underlying PositionOutputStream in ParquetFileWriter to be flushed again after it has already been closed. Depending on the underlying stream implementation, that flush may raise an exception.
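The same failure pattern can be reproduced without Parquet in a small stand-alone sketch. All class names below (DoubleCloseDemo, StrictStream, FileWriterLike) are hypothetical and are not Parquet APIs. StrictStream is meant to resemble a stream implementation that rejects flush() after close(). FileWriterLike is meant to resemble a writer whose end() closes itself and whose close() flushes the stream before closing it, which matches what the stack trace shows ParquetFileWriter.close() doing. The idempotent flag shows one possible mitigation: a closed-guard that turns the second close() into a no-op. This is an assumption made for illustration and is not necessarily how the issue is resolved upstream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical reproduction of the double-close pattern; none of these classes are Parquet APIs.
public class DoubleCloseDemo {

    // Stand-in for a stream implementation that rejects any write/flush after close().
    static final class StrictStream extends OutputStream {
        private final ByteArrayOutputStream sink = new ByteArrayOutputStream();
        private boolean closed;

        @Override public void write(int b) throws IOException {
            ensureOpen();
            sink.write(b);
        }

        @Override public void flush() throws IOException {
            ensureOpen(); // flushing after close() is an error for this stream
        }

        @Override public void close() {
            closed = true;
        }

        private void ensureOpen() throws IOException {
            if (closed) {
                throw new IOException("stream is already closed");
            }
        }
    }

    // Stand-in for a writer whose end() closes itself and whose close() flushes first.
    static final class FileWriterLike implements AutoCloseable {
        private final OutputStream out;
        private final boolean idempotent; // hypothetical switch: guard against a second close
        private boolean closed;

        FileWriterLike(OutputStream out, boolean idempotent) {
            this.out = out;
            this.idempotent = idempotent;
        }

        void end() throws IOException {
            out.write(0); // placeholder for writing the file footer
            close();      // end() closes the writer itself (first close)
        }

        @Override public void close() throws IOException {
            if (idempotent && closed) {
                return; // guarded: the second close() does nothing
            }
            closed = true;
            out.flush(); // on an already-closed StrictStream this throws
            out.close();
        }
    }

    // Calls end() and then close() again, mimicking the caller's cleanup step.
    static String endThenClose(boolean idempotent) {
        FileWriterLike writer = new FileWriterLike(new StrictStream(), idempotent);
        try {
            writer.end();
            writer.close(); // second close
            return "ok";
        } catch (IOException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(endThenClose(false)); // failed: stream is already closed
        System.out.println(endThenClose(true));  // ok
    }
}
```

Without the guard, the second close() flushes a stream that end() has already closed, and the demo reports the same "stream is already closed" message as the trace. With the guard, the cleanup step is harmless. Removing the already-ended writer from the cleanup list would be an alternative way to avoid the second flush.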
Sample exception:
Caused by: org.apache.parquet.util.AutoCloseables$ParquetCloseResourceException: Unable to close resource
at org.apache.parquet.util.AutoCloseables.uncheckedClose(AutoCloseables.java:85)
at org.apache.parquet.util.AutoCloseables.uncheckedClose(AutoCloseables.java:94)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:144)
at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:437)
... 70 more
Caused by: java.io.IOException: stream is already closed
(-------- specific stream implementation ----------------)
at java.io.FilterOutputStream.flush(FilterOutputStream.java:140)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at org.apache.parquet.hadoop.util.HadoopPositionOutputStream.flush(HadoopPositionOutputStream.java:59)
at org.apache.parquet.hadoop.ParquetFileWriter.close(ParquetFileWriter.java:1659)
at org.apache.parquet.util.AutoCloseables.close(AutoCloseables.java:49)
at org.apache.parquet.util.AutoCloseables.uncheckedClose(AutoCloseables.java:83)
This issue has been observed since 1.14.0, and I suspect that PARQUET-2496 is caused by the same problem.