apache / parquet-java

Apache Parquet Java
https://parquet.apache.org/
Apache License 2.0

Double close of ParquetFileWriter in ParquetWriter #2935

Open hellishfire opened 4 days ago

hellishfire commented 4 days ago

ParquetWriter.close() invokes InternalParquetRecordWriter.close(), which contains the following logic:

```java
  public void close() throws IOException, InterruptedException {
    if (!closed) {
      try {
        if (aborted) {
          return;
        }
        flushRowGroupToStore();
        FinalizedWriteContext finalWriteContext = writeSupport.finalizeWrite();
        Map<String, String> finalMetadata = new HashMap<String, String>(extraMetaData);
        String modelName = writeSupport.getName();
        if (modelName != null) {
          finalMetadata.put(ParquetWriter.OBJECT_MODEL_NAME_PROP, modelName);
        }
        finalMetadata.putAll(finalWriteContext.getExtraMetaData());
        // end() eventually calls parquetFileWriter.close() (first close)
        parquetFileWriter.end(finalMetadata);
      } finally {
        // parquetFileWriter is closed again here (second close)
        AutoCloseables.uncheckedClose(columnStore, pageStore, bloomFilterWriteStore, parquetFileWriter);
        closed = true;
      }
    }
  }
```

It appears that parquetFileWriter is closed twice here. The first close comes from parquetFileWriter.end(finalMetadata), which eventually calls parquetFileWriter.close(). The second comes from AutoCloseables.uncheckedClose(columnStore, pageStore, bloomFilterWriteStore, parquetFileWriter) in the finally block.
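The call sequence can be modeled without Parquet. The sketch below is a simplification under stated assumptions. FakeFileWriter is a hypothetical stand-in for ParquetFileWriter whose end() calls close(), as described above. closeAll is a hypothetical helper that only loosely approximates AutoCloseables.uncheckedClose and is not its real implementation. The sketch counts how many times close() runs on the same writer.

```java
public class DoubleCloseModel {

  // Hypothetical stand-in for ParquetFileWriter: end() finishes the file and then closes itself.
  static class FakeFileWriter implements AutoCloseable {
    int closeCount = 0;

    void end() {
      // ... finishing the file is omitted in this model ...
      close(); // first close, triggered from inside end()
    }

    @Override
    public void close() {
      closeCount++;
    }
  }

  // Hypothetical helper loosely modeled on AutoCloseables.uncheckedClose:
  // closes every non-null resource and rethrows failures as unchecked exceptions.
  static void closeAll(AutoCloseable... resources) {
    for (AutoCloseable r : resources) {
      if (r == null) {
        continue;
      }
      try {
        r.close();
      } catch (Exception e) {
        throw new IllegalStateException("Unable to close resource", e);
      }
    }
  }

  // Mirrors the try/finally shape of InternalParquetRecordWriter.close().
  static int runCloseSequence() {
    FakeFileWriter writer = new FakeFileWriter();
    try {
      writer.end(); // first close()
    } finally {
      closeAll(writer); // second close() of the same writer
    }
    return writer.closeCount;
  }

  public static void main(String[] args) {
    System.out.println("close() calls: " + runCloseSequence()); // prints "close() calls: 2"
  }
}
```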

The second close flushes the underlying PositionOutputStream in ParquetFileWriter after that stream has already been closed. Depending on the stream implementation, this may raise an exception. ParquetFileWriter.close() flushes the stream before closing it:

```java
  public void close() throws IOException {
    try (PositionOutputStream temp = out) {
      temp.flush(); // on a second close(), this flushes an already-closed stream
      if (crcAllocator != null) {
        crcAllocator.close();
      }
    }
  }
```
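The failure mode can be reproduced with plain java.io. This is a minimal sketch, and all class names in it are hypothetical. StrictOutputStream is a stream that rejects flush() once closed. A stream whose flush() silently does nothing after close would not fail, which is why the outcome depends on the stream implementation. NaiveWriter.close() copies the shape of ParquetFileWriter.close() shown above. GuardedWriter adds a closed flag so that a second close() is a no-op. This is one possible mitigation, assumed here rather than taken from the project, and it is in line with the java.io.Closeable contract that closing an already-closed stream has no effect.

```java
import java.io.IOException;
import java.io.OutputStream;

public class FlushAfterCloseDemo {

  // Hypothetical stream that, like some real implementations, rejects flush() after close().
  static class StrictOutputStream extends OutputStream {
    private boolean closed = false;

    @Override
    public void write(int b) throws IOException {
      ensureOpen();
    }

    @Override
    public void flush() throws IOException {
      ensureOpen();
    }

    @Override
    public void close() {
      closed = true;
    }

    private void ensureOpen() throws IOException {
      if (closed) {
        throw new IOException("stream is already closed");
      }
    }
  }

  // Same shape as ParquetFileWriter.close(): flush, then let try-with-resources close the stream.
  static class NaiveWriter implements AutoCloseable {
    final OutputStream out = new StrictOutputStream();

    @Override
    public void close() throws IOException {
      try (OutputStream temp = out) {
        temp.flush(); // throws on the second call because the stream is already closed
      }
    }
  }

  // Hypothetical mitigation: remember that close() already ran and make later calls no-ops.
  static class GuardedWriter implements AutoCloseable {
    final OutputStream out = new StrictOutputStream();
    private boolean closed = false;

    @Override
    public void close() throws IOException {
      if (closed) {
        return;
      }
      closed = true;
      try (OutputStream temp = out) {
        temp.flush();
      }
    }
  }

  // Returns null if closing twice succeeds, otherwise the message of the exception raised.
  static String closeTwice(AutoCloseable writer) {
    try {
      writer.close();
      writer.close();
      return null;
    } catch (Exception e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println("naive:   " + closeTwice(new NaiveWriter()));   // naive:   stream is already closed
    System.out.println("guarded: " + closeTwice(new GuardedWriter())); // guarded: null
  }
}
```

The flag is set before flushing. If the first flush fails, try-with-resources still closes the stream, and any later close() call returns immediately instead of touching the closed stream.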

Sample exception:

```
Caused by: org.apache.parquet.util.AutoCloseables$ParquetCloseResourceException: Unable to close resource
    at org.apache.parquet.util.AutoCloseables.uncheckedClose(AutoCloseables.java:85)
    at org.apache.parquet.util.AutoCloseables.uncheckedClose(AutoCloseables.java:94)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:144)
    at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:437)
    ... 70 more
Caused by: java.io.IOException: stream is already closed (-------- specific stream implementation ----------------)
    at java.io.FilterOutputStream.flush(FilterOutputStream.java:140)
    at java.io.DataOutputStream.flush(DataOutputStream.java:123)
    at org.apache.parquet.hadoop.util.HadoopPositionOutputStream.flush(HadoopPositionOutputStream.java:59)
    at org.apache.parquet.hadoop.ParquetFileWriter.close(ParquetFileWriter.java:1659)
    at org.apache.parquet.util.AutoCloseables.close(AutoCloseables.java:49)
    at org.apache.parquet.util.AutoCloseables.uncheckedClose(AutoCloseables.java:83)
```

This issue has been observed since 1.14.0, and I suspect PARQUET-2496 is caused by a similar problem.