LibrePDF / OpenPDF

OpenPDF is a free Java library for creating and editing PDF files, with a LGPL and MPL open source license. OpenPDF is based on a fork of iText. We welcome contributions from other developers. Please feel free to submit pull-requests and bugreports to this GitHub repository.
Other
3.54k stars 584 forks source link

Merge pdf files. Merged files are not released and cannot be deleted #1112

Open lako12 opened 6 months ago

lako12 commented 6 months ago

I have a Java method that needs to read the PDF files inside a folder and create a file corresponding to the merge of all those inside the folder. Finally I have to delete the files and leave only the result in the folder. The problem is that once the final file is created, I can't delete all the other files used for the merge, as if it doesn't release them after use. I also tried to delete them manually from the file explorer, but I can't (as long as the application remains running). I'm using this version:(Java 11)

        <dependency>
            <groupId>com.github.librepdf</groupId>
            <artifactId>openpdf</artifactId>
            <version>1.4.1</version>
        </dependency>

My Code:

        File[] files = directory.listFiles();
        if (files != null && files.length > 0) {
            try (Document document = new Document()) {
                PdfCopy copy = new PdfCopy(document, new FileOutputStream(fileName));
                document.open();

                for (File file : files) {
                    try (PdfReader reader = new PdfReader(file.getAbsolutePath())) {
                        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
                            copy.addPage(copy.getImportedPage(reader, i));
                        }
                        copy.freeReader(reader);
                    }
                }
            }
        }

if after this piece of code I try to delete all the files in the "files" list by calling the .delete() method, it doesn't work

I also tried removing the try wth resources and manually closing reader, document copy etc... I tried inserting the PdfCopy in the try with resources together with document, but nothing, I can't find a solution...

asturio commented 6 months ago

This really looks like the files are still open by the process. I assume you are also using Windows, as this OS is known for keeping resources locked.

Important is that the documents you want to delete are all closed.

You may check if the "closeStream" flag of the Objects are really "true". Maybe it is being set somewhere to "false".

Please check if the

currentPdfReaderInstance.getReader().close();
currentPdfReaderInstance.getReaderFile().close();

are really being executed in freeReader and/or getImportedPage. I think if they are called in freeReader, they shouldn't be called in getImportedPage

asturio commented 6 months ago

It seems, that the Files are only closed if the PdfReader.partial = true. Using the constructor PdfReader(String) don't set partial.

PdfReader.close() is still as from 2010.

I think tokens.close() should always be called. Any opinions on that?

lako12 commented 6 months ago

yes, i am usign widnows. this methods are called:

  currentPdfReaderInstance.getReader().close();
currentPdfReaderInstance.getReaderFile().close();

but they don't do anything because they don't respect the conditions

sa-sh commented 6 months ago

Hello guys, I want to share my experience with PdfReader. i noticed that issue happens only when it uses MappedRandomAccessFile (which is by default (Document.plainRandomAccess = false). loos like issue caused by FileChannel (used in MappedRandomAccessFile) which in java 11+ may be released/closed latter (by GC).

as workaround you can control how file opened for PdfReader i prefer to use RandomAccessFileOrArray directly, to control how file is loaded, small files better to load into memory

for (File file : files) {
   boolean forceRead = file.length() <= 10_000_000; // limit memory usage, set greater value if your app can use large heap
   boolean plainRandomAccess = true; // do not use MappedRandomAccessFile, Note: this flag ignored when forceRead=true
   try (RandomAccessFileOrArray raf = new RandomAccessFileOrArray(file.getPath(), forceRead, plainRandomAccess);
      PdfReader reader = new PdfReader(raf, null)) {
      //  do copy
  }
}
christianGen commented 3 months ago

Here is a discussion on how to force the liberation of mapped files:

https://stackoverflow.com/questions/2972986/how-to-unmap-a-file-from-memory-mapped-using-filechannel-in-java

TL;DR: pretty much impossible for newer jvms.

andreasrosdal commented 3 weeks ago

Pull requests to improve this is welcome. @rzam

rzam commented 3 weeks ago

@andreasrosdal Since memory mapping is "bugged" at the OS level I'm not sure there is a way other than forgoing memory mapping itself.

Though I was checking the code and I noticed that the Document.plainRandomAccess field mentioned in this thread is a public non-final field and that it seems to affect memory mapping throughout the whole library (font loading included), so I guess it could be used as a workaround when working on Windows machines? I'm not sure if it was intentional since it doesn't seem to be documented.

I've done some tests and a simple Document.plainRandomAccess = true; before the first rendering seems to fix the issue I was experiencing when loading fonts.