LibrePDF / OpenPDF

OpenPDF is a free Java library for creating and editing PDF files, with an LGPL and MPL open source license. OpenPDF is based on a fork of iText. We welcome contributions from other developers. Please feel free to submit pull requests and bug reports to this GitHub repository.

Image.getInstance on PNG performs worse compared to original code #596

Open dstucki opened 3 years ago

dstucki commented 3 years ago

Describe the bug: Performance when handling PNGs (e.g. inserting them into a PDF document) has become noticeably slower going from the original iText implementation to the latest OpenPDF implementation.

To Reproduce: see the example project at https://github.com/dstucki/openpdf-png-performance

OR

Image.getInstance( URL ); // with a PNG

Expected behavior: The performance of the original iText code should be more or less restored.


ApionXD commented 3 years ago

We are using Java's ImageIO class to read PNGs, and that seems to be our bottleneck here. I don't see a way to improve performance unless we either go back to writing our own codecs or find a library that can read images (specifically PNGs) faster.

dstucki commented 3 years ago

Thanks for your analysis.

What are your thoughts on bringing back com.lowagie.text.pdf.codec.PngImage and using it in the ImageLoader class just for loading PNGs?

Lonzak commented 3 years ago

a) Can you check whether you use ImageIO.setUseCache(false) for in-memory caching instead of disk-based caching?

b) ImageIO#read does a bunch of stuff under the hood. If you know, for instance, that your images are all PNGs then you could use ImageIO.getImageReadersByFormatName("PNG"); to get the reader once and then use that reader for all your images without having to search again.

Two other things (might not be related) also come to mind: c) What JDK are you using? If you just changed your JDK you might suffer from the switch from KCMS to Little-CMS.

d) Are you sure it is reading? Up to Java 8 there was a hardcoded maximum compression level for writing PNG files, which resulted in quite slow performance. To drastically increase writing performance we use the backported PNGWriter.
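Suggestions a) and b) could be combined roughly as follows. This is a minimal sketch, not OpenPDF code: the class name `ReusablePngReader` and its methods are hypothetical, and only standard `javax.imageio` calls are used. Whether it helps noticeably would need to be measured.

```java
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Iterator;

/** Hypothetical helper (not part of OpenPDF): looks up the PNG reader once and reuses it. */
final class ReusablePngReader {
    private final ImageReader reader;

    ReusablePngReader() {
        // a) Keep ImageIO's stream cache in memory instead of in temporary files.
        ImageIO.setUseCache(false);
        // b) Look up the PNG reader a single time instead of on every read.
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName("png");
        if (!readers.hasNext()) {
            throw new IllegalStateException("No PNG ImageReader registered");
        }
        reader = readers.next();
    }

    /** Decodes one PNG. Not thread-safe, because the single reader instance is shared. */
    BufferedImage read(byte[] pngBytes) throws IOException {
        try (ImageInputStream in =
                     ImageIO.createImageInputStream(new ByteArrayInputStream(pngBytes))) {
            // seekForwardOnly = true, ignoreMetadata = true
            reader.setInput(in, true, true);
            try {
                return reader.read(0);
            } finally {
                reader.reset(); // detach the input so the reader can be reused
            }
        }
    }

    /** Releases resources held by the reader once it is no longer needed. */
    void dispose() {
        reader.dispose();
    }
}
```

A caller would create one `ReusablePngReader`, call `read` for each image, and call `dispose` at the end.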

ApionXD commented 3 years ago

I can say that I looked at using the PNG ImageReader and it did not offer a significant increase in performance.

asturio commented 3 years ago

Has anybody tried bringing back com.lowagie.text.pdf.codec.PngImage and comparing its performance directly with ImageIO?

dstucki commented 3 years ago

I updated my example repo to help answer some of your questions: 4 PNGs read 10 times.

ImageIO.read with disabled cache: 3351ms
com.lowagie.text.pdf.codec.PngImage.getImage: 65ms

a) Can you check whether you use ImageIO.setUseCache(false) for in-memory caching instead of disk-based caching?

ImageIO.setUseCache(false) gives some improvement, but is still far from the original iText implementation.

b) ImageIO#read does a bunch of stuff under the hood. If you know, for instance, that your images are all PNGs then you could use ImageIO.getImageReadersByFormatName("PNG"); to get the reader once and then use that reader for all your images without having to search again.

I haven't tried that yet.

Two other things (might not be related) also come to mind: c) What JDK are you using? If you just changed your JDK you might suffer from the switch from KCMS to Little-CMS.

I noticed the performance drop while using the same JDK.

d) Are you sure it is reading? Up to Java 8 there was a hardcoded maximum compression level for writing PNG files, which resulted in quite slow performance. To drastically increase writing performance we use the backported PNGWriter.

Yes

Lonzak commented 3 years ago

ImageIO.read with disabled cache: 3351ms
com.lowagie.text.pdf.codec.PngImage.getImage: 65ms

Interesting - I didn't realize that PngImage had been thrown out. Does anybody know the reason?

dstucki commented 3 years ago

Issue #89 with Pull Request #90 threw PngImage out.

Maybe @andreasrosdal can share some additional info?

ApionXD commented 3 years ago

I believe I read somewhere that we wanted to use standard libs instead of maintaining our own codecs to read images; I will try to find the source. EDIT: Found this in com.lowagie.text.ImageLoader [image]. Is there a difference in file size between the two?

asturio commented 3 years ago

Yes, that was the case. Instead of reinventing the wheel every time, it is better to use libraries to handle the non-PDF work for us. Maybe ImageIO can be used in a more optimized way.

andreasrosdal commented 2 years ago

The idea was to use a standard library to handle images, instead of maintaining image codecs in OpenPDF. This has several advantages related to security, maintainability, supporting the PNG format much better, and was also a part of the license review process of the OpenPDF source code. The iText people were very license-militaristic-hostile-evil at one point in the long distant past, so using libraries for things like image codecs was a defense against that.

Pull requests to improve this are welcome. ImageIO.setUseCache() could improve performance. Maybe you'll just have to deal with it? :) Could another image manipulation library be used as an optional dependency?

asturio commented 1 year ago

I made some performance measurements: loading a PNG (~40 ms) is about 4 times slower than loading a GIF (~10 ms), and loading a GIF is in turn much slower than loading a JPG, which measures as 0 ms after the first iteration.

Load ~time after N iterations:

Iterations   GIF     JPG    PNG
1            39 ms   1 ms   104 ms
10           16 ms   0 ms   54 ms
100          12 ms   0 ms   38 ms
1000         10 ms   0 ms   38 ms


If you know of any other library for faster PNG processing, let us know.
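A comparison like the one above could be reproduced with a small timing harness. The sketch below is an assumption about how such numbers might be gathered, not the code that produced them: `ImageLoadTimer`, its methods, and the synthetic 512x512 test image are hypothetical, and it times plain `ImageIO.read` rather than OpenPDF's `Image.getInstance`.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical timing harness: average ImageIO.read time per format. */
final class ImageLoadTimer {

    /** Encodes a synthetic RGB test image in the given ImageIO format ("png", "gif", "jpg"). */
    static byte[] encode(String format, int width, int height) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                img.setRGB(x, y, (x * 31 + y * 17) & 0xFFFFFF);
            }
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            if (!ImageIO.write(img, format, out)) {
                throw new IllegalArgumentException("No ImageIO writer for " + format);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Decodes the same bytes repeatedly and returns the average time per read in ms. */
    static double averageMillis(byte[] encoded, int iterations) {
        long start = System.nanoTime();
        try {
            for (int i = 0; i < iterations; i++) {
                if (ImageIO.read(new ByteArrayInputStream(encoded)) == null) {
                    throw new IllegalStateException("No ImageIO reader for this data");
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return (System.nanoTime() - start) / 1_000_000.0 / iterations;
    }

    public static void main(String[] args) {
        ImageIO.setUseCache(false); // avoid disk-based stream caching during the measurement
        for (String format : new String[] {"gif", "jpg", "png"}) {
            byte[] data = encode(format, 512, 512);
            for (int n : new int[] {1, 10, 100, 1000}) {
                System.out.printf("Load %s ~time after %d iterations %.1f ms%n",
                        format.toUpperCase(), n, averageMillis(data, n));
            }
        }
    }
}
```

Absolute numbers depend on the image content, the JDK, and JIT warm-up, so only the relative ordering of the formats is meaningful in such a run.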

Lonzak commented 1 year ago

https://github.com/leonbloy/pngj: I tried it once, but it wasn't maintained and its approach was somewhat different. I just found https://github.com/nayuki/PNG-library but I don't know this one...

andreasrosdal commented 1 year ago

"The reason that PNG is so fast with iText is that there's an optimization to directly copy the compressed image data from the PNG file into the PDF document when some conditions are met (grayscale color type, etc). That's much faster than decoding the PNG image data and recompressing it as a PDF image,"

https://github.com/TIBCOSoftware/jasperreports/issues/362

So, when those conditions are met, we just need to copy the compressed image data directly into the PDF file.
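The first half of that idea, deciding whether a PNG qualifies and extracting its still-compressed data, might look like the sketch below. It is not iText or OpenPDF code: the class `PngPassThrough` is hypothetical, and the conditions in `isCopyCandidate` (no interlacing, no alpha channel, at most 8 bits per sample) are assumptions, since the thread only mentions "grayscale color type, etc". Writing the bytes into a PDF image stream, which would presumably use the FlateDecode filter with a PNG predictor, is left out.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: reads the PNG header and collects the compressed IDAT data. */
final class PngPassThrough {
    private static final long PNG_SIGNATURE = 0x89504E470D0A1A0AL;

    final int width, height, bitDepth, colorType, interlace;
    final byte[] idat; // concatenated IDAT payloads, still zlib-compressed

    private PngPassThrough(int width, int height, int bitDepth, int colorType,
                           int interlace, byte[] idat) {
        this.width = width;
        this.height = height;
        this.bitDepth = bitDepth;
        this.colorType = colorType;
        this.interlace = interlace;
        this.idat = idat;
    }

    /** Walks the chunk list without decoding pixels. CRCs are skipped, not verified. */
    static PngPassThrough parse(byte[] png) {
        ByteBuffer buf = ByteBuffer.wrap(png); // big-endian by default, as PNG requires
        if (buf.remaining() < 8 || buf.getLong() != PNG_SIGNATURE) {
            throw new IllegalArgumentException("Not a PNG file");
        }
        int width = 0, height = 0, bitDepth = 0, colorType = 0, interlace = 0;
        boolean sawHeader = false;
        ByteArrayOutputStream idat = new ByteArrayOutputStream();
        while (buf.remaining() >= 12) {
            int length = buf.getInt();
            byte[] typeBytes = new byte[4];
            buf.get(typeBytes);
            String type = new String(typeBytes, StandardCharsets.US_ASCII);
            if (length < 0 || length > buf.remaining() - 4) {
                throw new IllegalArgumentException("Truncated chunk " + type);
            }
            byte[] data = new byte[length];
            buf.get(data);
            buf.getInt(); // skip the CRC
            if (type.equals("IHDR") && length == 13) {
                ByteBuffer h = ByteBuffer.wrap(data);
                width = h.getInt();
                height = h.getInt();
                bitDepth = h.get();
                colorType = h.get();
                h.get(); // compression method
                h.get(); // filter method
                interlace = h.get();
                sawHeader = true;
            } else if (type.equals("IDAT")) {
                idat.write(data, 0, data.length);
            } else if (type.equals("IEND")) {
                break;
            }
        }
        if (!sawHeader) {
            throw new IllegalArgumentException("Missing IHDR chunk");
        }
        return new PngPassThrough(width, height, bitDepth, colorType, interlace,
                idat.toByteArray());
    }

    /** Assumed conditions for copying the data as-is; the real rules may differ. */
    boolean isCopyCandidate() {
        boolean noAlpha = colorType == 0 || colorType == 2 || colorType == 3;
        return interlace == 0 && bitDepth <= 8 && noAlpha;
    }
}
```

Images that fail `isCopyCandidate()` would still go through the regular decode path, so a check like this could sit in front of the existing ImageIO-based loading rather than replace it.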

As stated earlier, custom image codecs were removed from OpenPDF as part of the source code licence review we did, and to ease maintenance by using libraries to read images.