dragon66 / icafe

Java library for reading, writing, converting and manipulating images and metadata
Eclipse Public License 1.0

OutOfMemoryError when reading in large tiff and writing it out as PDF #91

Closed: lqlau closed this issue 3 years ago

lqlau commented 4 years ago

This is my code:

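import java.awt.Dimension;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileInputStream;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.graphics.image.LosslessFactory;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

// PageReader and getScaledDimension() are not shown in this issue
public static void generatePdfFromTiff(File file) throws Exception {
    // try-with-resources closes the TIFF stream and the PDF document even if an exception is thrown
    try (FileInputStream fin = new FileInputStream(file.getAbsoluteFile());
         PDDocument doc = new PDDocument()) {
        PageReader reader = new PageReader();
        BufferedImage pageBuffer = reader.getNextPage(fin);
        while (pageBuffer != null) {
            PDPage page = new PDPage();
            doc.addPage(page);
            // Encode the decoded page as a lossless (Flate) image and draw it scaled to fit the page
            try (PDPageContentStream contentStream = new PDPageContentStream(doc, page)) {
                PDImageXObject image = LosslessFactory.createFromImage(doc, pageBuffer);
                Dimension scaledDim = getScaledDimension(new Dimension(image.getWidth(), image.getHeight()),
                        new Dimension((int) page.getMediaBox().getWidth(), (int) page.getMediaBox().getHeight()));
                contentStream.drawImage(image, 1, 1, scaledDim.width, scaledDim.height);
            }
            pageBuffer = reader.getNextPage(fin);
        }

        // Replace the last extension (e.g. .tif) with .pdf, next to the source file
        String[] tokens = file.getName().split("\\.(?=[^\\.]+$)");
        File targetFile = new File(file.getParent(), tokens[0] + ".pdf");
        targetFile.delete();
        doc.save(targetFile);
    }
}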
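The code calls getScaledDimension(), which the issue does not show. A common version shrinks the image to fit inside the page's media box while keeping its aspect ratio, roughly as sketched below; the class name ScaleUtil and the rounding choice are assumptions, not the reporter's actual helper.

```java
import java.awt.Dimension;

final class ScaleUtil {
    // Hypothetical stand-in for the reporter's getScaledDimension():
    // scales imgSize down (never up) so it fits inside boundary, preserving aspect ratio.
    static Dimension getScaledDimension(Dimension imgSize, Dimension boundary) {
        int w = imgSize.width;
        int h = imgSize.height;
        // First fit the width, adjusting the height proportionally
        if (w > boundary.width) {
            h = (int) Math.round((double) h * boundary.width / w);
            w = boundary.width;
        }
        // Then fit the height if the result is still too tall
        if (h > boundary.height) {
            w = (int) Math.round((double) w * boundary.height / h);
            h = boundary.height;
        }
        return new Dimension(w, h);
    }
}
```

For example, a 2480 x 3508 pixel page scaled into a 612 x 792 point Letter media box comes out as 560 x 792.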

This is my error:

java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Unknown Source)
    at java.io.ByteArrayOutputStream.toByteArray(Unknown Source)
    at org.apache.pdfbox.pdmodel.graphics.image.LosslessFactory.createFromGrayImage(LosslessFactory.java:161)
    at org.apache.pdfbox.pdmodel.graphics.image.LosslessFactory.createFromImage(LosslessFactory.java:84)

Also, is there any way to make the TIFF-to-PDF conversion faster?

dragon66 commented 4 years ago

@lqlau I see you are already reading the TIFF page by page, and the OutOfMemoryError comes from PDFBox itself while it creates the PDF, not from ICAFE.

I don't know how big the original TIFF is, or whether PDFBox can assemble separately created PDF pages into one document afterwards (I bet it can). If it can, you could convert the TIFF to PDF page by page first and then concatenate the results; alternatively, you could split the TIFF into single-page TIFFs first and then create the PDF from them.

I don't know which approach will work.
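For the second option, a minimal sketch that splits a multi-page TIFF into single-page TIFFs could use only the JDK's javax.imageio (not ICAFE). It assumes Java 9 or later, which bundles a TIFF reader and writer; the class name TiffSplitter and the `name-N.tif` output naming are hypothetical. ImageReader.read(i) decodes one page at a time, so only the current page's pixels need to be held while it is written out.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

final class TiffSplitter {
    // Writes each page of a multi-page TIFF to its own single-page TIFF in outDir.
    // Assumes Java 9+, whose javax.imageio includes a TIFF plugin.
    static List<File> split(File tiff, File outDir) throws IOException {
        List<File> pages = new ArrayList<>();
        try (ImageInputStream in = ImageIO.createImageInputStream(tiff)) {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) throw new IOException("No reader for " + tiff);
            ImageReader reader = readers.next();
            try {
                reader.setInput(in);
                int count = reader.getNumImages(true); // scans the file to count pages
                String base = tiff.getName().replaceFirst("\\.[^.]+$", "");
                for (int i = 0; i < count; i++) {
                    // Decode only page i; it becomes unreachable after this iteration
                    BufferedImage page = reader.read(i);
                    File out = new File(outDir, base + "-" + (i + 1) + ".tif");
                    if (!ImageIO.write(page, "tiff", out)) throw new IOException("No TIFF writer available");
                    pages.add(out);
                }
            } finally {
                reader.dispose();
            }
        }
        return pages;
    }
}
```

Each resulting file could then go through generatePdfFromTiff() on its own, and the per-page PDFs would still need to be merged, for example with PDFBox's PDFMergerUtility.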

You could also simply increase the heap size a bit (the JVM's -Xmx option); hopefully that will give it enough room to finish the job.
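As a rough sanity check before raising the heap limit, the memory one page needs can be estimated from its dimensions. The stack trace fails inside ByteArrayOutputStream.toByteArray(), which copies the stream's whole buffer, so at that moment the decoded page, the stream's buffer and the copy are all alive. The sketch below assumes each of those is about the size of the packed pixel data; the factor of 3 is a guess, not a measured figure, and HeapEstimate is a made-up name.

```java
final class HeapEstimate {
    // Bytes needed for one page's pixels packed at bitsPerPixel,
    // with each row padded to a whole byte.
    static long rawBytes(int width, int height, int bitsPerPixel) {
        long bitsPerRow = (long) width * bitsPerPixel;
        return ((bitsPerRow + 7) / 8) * height;
    }

    // Assumption: `copies` buffers of that size exist at once
    // (decoded image, encoder's stream buffer, toByteArray() copy, ...).
    static long estimatedPeak(int width, int height, int bitsPerPixel, int copies) {
        return rawBytes(width, height, bitsPerPixel) * copies;
    }

    // Compares the estimate with the heap still available under the current max heap (-Xmx).
    static boolean probablyFits(long estimatedBytes) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return estimatedBytes < rt.maxMemory() - used;
    }
}
```

For a hypothetical 10000 x 14000 pixel 8-bit gray page, rawBytes is 140,000,000 bytes (about 134 MiB), so three such buffers come to roughly 400 MiB.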

As for speeding up the conversion, I can't tell which step is your bottleneck without profiling.
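Short of running a full profiler (such as JDK Flight Recorder or VisualVM), one low-tech way to find the bottleneck is to time each stage separately: decoding a page, LosslessFactory.createFromImage(), and doc.save(). StageTimer is a hypothetical helper; wall-clock timings like these are only indicative, since JIT warm-up and GC pauses skew them.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

final class StageTimer {
    // Accumulated wall-clock nanoseconds per named stage, in first-seen order
    private final Map<String, Long> totals = new LinkedHashMap<>();

    // Runs the task and adds its elapsed time to the named stage, even if it throws.
    <T> T time(String stage, Callable<T> task) throws Exception {
        long start = System.nanoTime();
        try {
            return task.call();
        } finally {
            totals.merge(stage, System.nanoTime() - start, Long::sum);
        }
    }

    long totalNanos(String stage) {
        return totals.getOrDefault(stage, 0L);
    }

    void print() {
        totals.forEach((stage, nanos) -> System.out.printf("%-10s %10.1f ms%n", stage, nanos / 1e6));
    }
}
```

In the loop, each call would be wrapped, e.g. `timer.time("encode", () -> LosslessFactory.createFromImage(doc, current))`, where `current` is an effectively final copy of `pageBuffer` (a lambda cannot capture the reassigned loop variable), and `timer.print()` is called at the end.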

lqlau commented 4 years ago

Thanks @dragon66! I will try splitting each page into smaller chunks/tiles with getSubimage() and then drawing them piece by piece with drawImage() to reassemble the whole page. I'll let you know if that works.
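The strip idea could be sketched as below, assuming LosslessFactory accepts a sub-image like any other BufferedImage. PageTiler and Strip are made-up names, and the sketch uses full-width horizontal strips rather than square tiles. Two details matter: BufferedImage.getSubimage() shares the parent's pixels instead of copying them, so only the encoder's per-call buffers shrink (the full decoded page is still in memory), and PDF's origin is at the bottom-left, so the top strip must be drawn at the highest y.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

final class PageTiler {
    // One horizontal strip of the page and where (in PDF points) it should be drawn.
    static final class Strip {
        final BufferedImage image;
        final float x, y, width, height;

        Strip(BufferedImage image, float x, float y, float width, float height) {
            this.image = image;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    // Splits `page` into full-width strips of at most `rows` pixel rows and maps each
    // onto the target rectangle (x0, y0, drawW, drawH) on the PDF page.
    static List<Strip> strips(BufferedImage page, int rows, float x0, float y0, float drawW, float drawH) {
        if (rows <= 0) throw new IllegalArgumentException("rows must be positive");
        int w = page.getWidth();
        int h = page.getHeight();
        float scaleY = drawH / h; // PDF points per pixel row
        List<Strip> out = new ArrayList<>();
        for (int top = 0; top < h; top += rows) {
            int n = Math.min(rows, h - top);
            // getSubimage() shares the parent's pixel data; no pixels are copied here
            BufferedImage part = page.getSubimage(0, top, w, n);
            // Rows further down the image get a smaller PDF y coordinate
            float y = y0 + drawH - (top + n) * scaleY;
            out.add(new Strip(part, x0, y, drawW, n * scaleY));
        }
        return out;
    }
}
```

Each strip would then be encoded and drawn separately, e.g. `contentStream.drawImage(LosslessFactory.createFromImage(doc, s.image), s.x, s.y, s.width, s.height)`, using the same target rectangle (1, 1, scaledDim.width, scaledDim.height) as the single drawImage() call in the original code. Depending on the viewer, thin seams may be visible between adjacent strips.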