apache / parquet-java

Apache Parquet Java
https://parquet.apache.org/
Apache License 2.0

File Lock Issue in ParquetRewriter class in Apache Parquet Hadoop 1.14.1 on Windows 11 22H2 #3000

Closed cetindogu closed 1 month ago

cetindogu commented 2 months ago

Describe the bug, including details regarding any error messages, version, and platform.

Java 21, Apache Parquet Hadoop 1.14.1, Windows 11 22H2

After the try block completes successfully, my input .parquet files are still locked (Windows reports them as being used by another process), so I cannot delete them:

java.nio.file.FileSystemException: test\sample\dayhour-150-0-6a57c328-8610-46f8-bf7e-311f8605def0.parquet: This process cannot access the file because the file is being used by another process

    RewriteOptions rewriteOptions = new RewriteOptions.Builder(confForLocal, paths, outputPath).build();

    try (ParquetRewriter rewriter = new ParquetRewriter(rewriteOptions)) {
        rewriter.processBlocks();
    } catch (IOException e) {
        logger.error("Cannot merge parquet files", e);
        return Optional.empty();
    }

    if (config.isDeleteOriginalParquetsAfterMerge()) {
        deleteMergedParquets(inputs);
    }

    private void deleteMergedParquets(List<String> inputs) {
        for (String input : inputs) {
            try {
                Files.delete(Paths.get(input));
                logger.info("Deleting {} file since this file is merged with another one", input);
            } catch (IOException e) {
                logger.error("Cannot remove files for {}", input, e);
            }
        }
    }

Component(s)

No response

cetindogu commented 2 months ago

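In ParquetRewriter, at line 256, there is: reader = inputFiles.poll();

The current reader is never closed before the next one is taken from the queue, so its file handle stays open. This line should be: reader.close(); reader = inputFiles.poll();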
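To illustrate the idea, here is a minimal, self-contained sketch of the close-before-poll pattern. It is not the actual ParquetRewriter code: ReaderQueueSketch, TrackedReader, advance() and processAll() are hypothetical names made up for this example, and TrackedReader only stands in for a real file reader so that the closed state can be observed.

```java
import java.io.Closeable;
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: a queue of readers consumed one at a time,
// where each reader is closed before the next one is polled.
class ReaderQueueSketch {

    // Stand-in for a real file reader; it only records whether close() was called.
    static final class TrackedReader implements Closeable {
        final String name;
        boolean closed = false;

        TrackedReader(String name) {
            this.name = name;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    private final Queue<TrackedReader> inputFiles = new ArrayDeque<>();
    private TrackedReader reader; // reader currently being processed, or null

    ReaderQueueSketch(List<TrackedReader> readers) {
        inputFiles.addAll(readers);
    }

    // Moves to the next reader. The null check covers the first call,
    // when no reader has been polled yet.
    boolean advance() {
        if (reader != null) {
            reader.close(); // release the handle of the file we just finished
        }
        reader = inputFiles.poll();
        return reader != null;
    }

    // Walks the whole queue and returns how many readers were visited.
    static int processAll(List<TrackedReader> readers) {
        ReaderQueueSketch sketch = new ReaderQueueSketch(readers);
        int visited = 0;
        while (sketch.advance()) {
            visited++; // a real implementation would process the current reader here
        }
        return visited;
    }
}
```

With this shape, every reader has been closed by the time the loop ends, because the final advance() call closes the last reader before finding the queue empty. Without the close() call, each polled reader would be dropped while still holding its file open.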

cetindogu commented 2 months ago

Could someone review my pull request? https://github.com/apache/parquet-java/pull/3002

cetindogu commented 2 months ago

I also added a null check.

cetindogu commented 2 months ago

I moved the "reader.close();" call up one line (onto the same line as the "if" clause).

https://github.com/apache/parquet-java/pull/3002

cetindogu commented 2 months ago

https://github.com/apache/parquet-java/pull/3002

mvn spotless:apply has been run.

cetindogu commented 1 month ago

The bug is fixed by pull request https://github.com/apache/parquet-java/pull/3002, so the fix will be available in the next release.