dragon66 / icafe

Java library for reading, writing, converting and manipulating images and metadata
Eclipse Public License 1.0

java.lang.OutOfMemoryError processing large multi-page PDF to TIFF #25

Closed: reecefenwick closed this issue 8 years ago

reecefenwick commented 8 years ago

I am attempting to convert a 10 MB PDF (40 pages) to a multi-page TIFF.

I'm using org.apache.pdfbox (PDFBox) to process the PDF, with the following code:

    public void savePdfAsTiff(PDDocument pdf, OutputStream outputStream) throws IOException {
        // PDFBox 1.8.x API: getAllPages() returns a raw List and convertToImage()
        // renders a page; both were removed in PDFBox 2.x.
        List<?> pages = pdf.getDocumentCatalog().getAllPages();

        // Every page is rendered up front and kept in this array until the TIFF
        // is written, so heap usage grows with page count and resolution.
        BufferedImage[] images = new BufferedImage[pages.size()];
        for (int i = 0; i < images.length; i++) {
            PDPage page = (PDPage) pages.get(i);
            images[i] = page.convertToImage(BufferedImage.TYPE_INT_RGB, 288); // works
        }

        // In-memory cache for the encoded output, which also lives on the heap
        RandomAccessOutputStream rout = new MemoryCacheRandomAccessOutputStream(outputStream);

        ImageParam.ImageParamBuilder builder = ImageParam.getBuilder();
        ImageParam[] param = new ImageParam[1];
        TIFFOptions tiffOptions = new TIFFOptions();
        // CCITT Group 4 (T.6) compression is defined for bilevel images only
        tiffOptions.setTiffCompression(TiffFieldEnum.Compression.CCITTFAX4);
        builder.imageOptions(tiffOptions);
        builder.colorType(ImageColorType.FULL_COLOR)
                .ditherMatrix(DitherMatrix.getBayer8x8Diag())
                .applyDither(true)
                .ditherMethod(DitherMethod.BAYER);
        param[0] = builder.build();

        try {
            TIFFTweaker.writeMultipageTIFF(rout, param, images);
        } finally {
            rout.close();
        }
    }

This works quite well for smaller files.

But buffering everything in memory only gets you so far: with this document I run out of heap space.

Do you have any examples of creating the multi-page TIFF more efficiently?
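To see why the array-based version runs out of heap, here is a rough back-of-the-envelope sketch. It assumes US Letter pages (8.5 x 11 in; the actual page size is not stated in the thread) rendered at the 288 DPI used above, and that a TYPE_INT_RGB BufferedImage stores one 4-byte int per pixel. It ignores object overhead and any extra copies made while dithering or encoding, so the real peak is likely higher. The class and method names are illustrative only.

```java
import java.util.Locale;

public class PageMemoryEstimate {
    // Heap bytes for one TYPE_INT_RGB BufferedImage: width * height pixels,
    // one 4-byte int per pixel (array and object overhead ignored).
    static long bytesPerPage(double widthInches, double heightInches, int dpi) {
        long widthPx = Math.round(widthInches * dpi);
        long heightPx = Math.round(heightInches * dpi);
        return widthPx * heightPx * Integer.BYTES;
    }

    public static void main(String[] args) {
        // Assumption: US Letter pages at 288 DPI, as in savePdfAsTiff above
        long perPage = bytesPerPage(8.5, 11.0, 288);
        // The array-based code keeps all 40 rendered pages alive at once
        long allPages = perPage * 40;
        System.out.printf(Locale.ROOT, "per page: %.1f MiB, 40 pages: %.2f GiB%n",
                perPage / (1024.0 * 1024.0), allPages / (1024.0 * 1024.0 * 1024.0));
    }
}
```

Under these assumptions each page is 2448 x 3168 pixels, and the program prints `per page: 29.6 MiB, 40 pages: 1.16 GiB`, all held on the heap before a single TIFF byte is written.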

dragon66 commented 8 years ago

@reecefenwick: yes, have a look at issue https://github.com/dragon66/icafe/issues/23. That one is about animated GIFs, but TIFFTweaker has similar functions for writing a multi-page TIFF one page at a time:

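    TIFFWriter writer = new TIFFWriter();
    List<IFD> ifds = new ArrayList<IFD>();
    FileOutputStream fout = new FileOutputStream("NEW.tif");
    // File-backed cache instead of the in-memory MemoryCacheRandomAccessOutputStream
    RandomAccessOutputStream rout = new FileCacheRandomAccessOutputStream(fout);
    int writeOffset = TIFFTweaker.prepareForWrite(rout);
    // Grab the BufferedImages one by one and write each as a new page;
    // only the current page has to be in memory at any time
    for (int i = 0; i < pageCount; i++) {
        // pageCount and renderPage() are hypothetical placeholders for whatever
        // produces page i, e.g. PDPage.convertToImage(BufferedImage.TYPE_INT_RGB, 288)
        BufferedImage bi = renderPage(i);
        writeOffset = TIFFTweaker.writePage(bi, rout, ifds, writeOffset, writer);
    }
    // All pages written: finish up
    TIFFTweaker.finishWrite(rout, ifds);
    rout.close();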

With this approach, you can set image parameters on the TIFFWriter instance to control what kind of TIFF image is written: full color, indexed, black and white, etc. You can even use different color types for different pages of the output multi-page TIFF.
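The render-one-page, write-one-page pattern is not specific to icafe. Purely as a point of comparison, here is a sketch of the same idea using the JDK's built-in javax.imageio TIFF plugin (bundled since Java 9), which is a different library from the one discussed in this thread. The class name, `writeMultipageTiff`, `countPages`, and the `IntFunction` page renderer (standing in for something like PDFBox's `convertToImage`) are assumptions for illustration; compression and color settings are left at the writer's defaults.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Iterator;
import java.util.function.IntFunction;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.ImageOutputStream;

public class StreamingTiffSketch {
    // Writes pageCount pages to a multi-page TIFF, asking renderPage for one
    // page at a time so only the current page has to be on the heap.
    static void writeMultipageTiff(File out, int pageCount,
                                   IntFunction<BufferedImage> renderPage) throws IOException {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("tiff");
        if (!writers.hasNext()) {
            throw new IOException("No TIFF ImageWriter found (requires Java 9+)");
        }
        ImageWriter writer = writers.next();
        // Start from an empty file so no stale bytes from an earlier run remain
        Files.deleteIfExists(out.toPath());
        try (ImageOutputStream ios = ImageIO.createImageOutputStream(out)) {
            writer.setOutput(ios);
            writer.prepareWriteSequence(null);
            for (int i = 0; i < pageCount; i++) {
                BufferedImage page = renderPage.apply(i);
                // Each call appends one page to the file; page becomes garbage afterwards
                writer.writeToSequence(new IIOImage(page, null, null), null);
            }
            writer.endWriteSequence();
        } finally {
            writer.dispose();
        }
    }

    // Counts the pages (images) in a TIFF file, e.g. to check the output.
    static int countPages(File in) throws IOException {
        ImageReader reader = ImageIO.getImageReadersByFormatName("tiff").next();
        try (ImageInputStream iis = ImageIO.createImageInputStream(in)) {
            reader.setInput(iis);
            return reader.getNumImages(true);
        } finally {
            reader.dispose();
        }
    }
}
```

For example, `writeMultipageTiff(new File("NEW.tif"), 40, i -> renderPdfPage(i))` would write 40 pages while holding only one rendered page at a time (`renderPdfPage` being a hypothetical helper).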

reecefenwick commented 8 years ago

Thanks @dragon66! Your suggestion solved my memory problems, and it's very stable now.