Open dstucki opened 3 years ago
We are using Java's ImageIO class to read PNGs, and that seems to be our bottleneck here. I don't see a way to improve performance unless we either go back to writing our own codecs or find a library that can read images (specifically PNGs) faster.
Thanks for your analysis.
What are your thoughts on bringing back com.lowagie.text.pdf.codec.PngImage and using it in the ImageLoader class just for loading PNGs?
a) Can you check whether you use ImageIO.setUseCache(false) for in memory-based caching instead of disk-based caching?
b) ImageIO#read does a bunch of stuff under the hood. If you know, for instance, that your images are all PNGs then you could use ImageIO.getImageReadersByFormatName("PNG"); to get the reader once and then use that reader for all your images without having to search again.
Two other things (might not be related) also come to mind: c) What JDK are you using? If you just changed your JDK you might suffer from the switch from KCMS to Little-CMS.
d) Are you sure the bottleneck is reading? Up to Java 8 there was a hardcoded maximum compression for writing PNG files, which resulted in quite slow performance. To drastically increase writing performance we use the backported PNGWriter.
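Suggestions a) and b) can be combined in a small helper. The following is only a minimal sketch, not OpenPDF code: the class name `ReusedPngReader` is hypothetical, and it assumes the PNG data is already in memory as a byte array. It uses only standard `javax.imageio` calls.

```java
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Iterator;

/** Hypothetical helper: looks up the PNG ImageReader once and reuses it. */
public class ReusedPngReader {
    private final ImageReader reader;

    public ReusedPngReader() {
        // a) Use memory-based caching instead of temporary cache files on disk.
        ImageIO.setUseCache(false);
        // b) Look up the PNG reader once instead of on every ImageIO.read call.
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName("PNG");
        if (!readers.hasNext()) {
            throw new IllegalStateException("No PNG ImageReader registered");
        }
        reader = readers.next();
    }

    /** Decodes one PNG from memory with the shared reader. */
    public BufferedImage read(byte[] pngBytes) throws IOException {
        try (ImageInputStream in =
                 ImageIO.createImageInputStream(new ByteArrayInputStream(pngBytes))) {
            // seekForwardOnly = true, ignoreMetadata = true: we only need pixels.
            reader.setInput(in, true, true);
            try {
                return reader.read(0);
            } finally {
                reader.reset(); // drop the input so the reader can be reused
            }
        }
    }

    /** Releases resources held by the reader when it is no longer needed. */
    public void close() {
        reader.dispose();
    }
}
```

A caller would create one `ReusedPngReader`, call `read` for each image, and call `close` at the end. An `ImageReader` is not thread-safe, so this sketch assumes single-threaded use. Whether it helps measurably depends on the images.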
I can say that I looked at using the PNG ImageReader and it did not offer a significant increase in performance.
Has anybody tried bringing back com.lowagie.text.pdf.codec.PngImage and comparing its performance directly with ImageIO?
I updated my example repo to help answer some of your questions, 4 pngs read 10 times.
ImageIO.read with disabled cache: 3351ms
com.lowagie.text.pdf.codec.PngImage.getImage: 65ms
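For readers who want to reproduce the ImageIO side of this comparison without the example repo, a rough timing harness might look like the sketch below. It is not the code from the repo: the class and method names are hypothetical, it uses a synthetic in-memory image instead of the four real PNGs, and it does not time `PngImage.getImage`, since that class is no longer part of OpenPDF. Absolute numbers will differ from those above.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

/** Hypothetical micro-benchmark for ImageIO.read on an in-memory PNG. */
public class PngReadTiming {

    /** Encodes a synthetic image as PNG so the sketch needs no files on disk. */
    static byte[] samplePng(int width, int height) throws IOException {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                img.setRGB(x, y, 0xFF000000 | (x * 31 + y * 17)); // opaque pattern
            }
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(img, "png", out);
        return out.toByteArray();
    }

    /** Returns the elapsed wall-clock milliseconds for decoding the PNG repeatedly. */
    static long timeReadsMillis(byte[] png, int iterations) throws IOException {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            BufferedImage img = ImageIO.read(new ByteArrayInputStream(png));
            if (img == null) {
                throw new IOException("No reader could decode the data");
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws IOException {
        ImageIO.setUseCache(false); // same setting as in the measurement above
        byte[] png = samplePng(512, 512);
        timeReadsMillis(png, 5); // warm-up so JIT compilation skews the result less
        System.out.println("ImageIO.read x10: " + timeReadsMillis(png, 10) + " ms");
    }
}
```

A dedicated harness such as JMH would give more trustworthy numbers; this only shows the shape of the measurement.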
a) Can you check whether you use ImageIO.setUseCache(false) for in memory-based caching instead of disk-based caching?
ImageIO.setUseCache(false) gives some improvement, but it is still far from the original iText implementation.
b) ImageIO#read does a bunch of stuff under the hood. If you know, for instance, that your images are all PNGs then you could use ImageIO.getImageReadersByFormatName("PNG"); to get the reader once and then use that reader for all your images without having to search again.
Didn't try that yet
Two other things (might not be related) also come to mind: c) What JDK are you using? If you just changed your JDK you might suffer from the switch from KCMS to Little-CMS.
I noticed the performance drop while using the same JDK.
d) Are you sure the bottleneck is reading? Up to Java 8 there was a hardcoded maximum compression for writing PNG files, which resulted in quite slow performance. To drastically increase writing performance we use the backported PNGWriter.
Yes
ImageIO.read with disabled cache: 3351ms
com.lowagie.text.pdf.codec.PngImage.getImage: 65ms
Interesting - didn't realize that the PNGImage was thrown out. Anybody know the reason?
Issue #89 with Pull Request #90 threw PngImage out.
Maybe @andreasrosdal can share some additional info?
I believe I read somewhere that we wanted to use standard libs instead of maintaining our own codecs to read images; I will try to find the source. EDIT: Found this in com.lowagie.text.ImageLoader
Is there a difference in file size between the two?
Yes, that was the case. Instead of reinventing the wheel every time, it is better to use libraries to make non-PDF stuff for us. Maybe ImageIO can be used in a more optimized way.
The idea was to use a standard library to handle images, instead of maintaining image codecs in OpenPDF. This has several advantages related to security, maintainability, supporting the PNG format much better, and was also a part of the license review process of the OpenPDF source code. The iText people were very license-militaristic-hostile-evil at one point in the long distant past, so using libraries for things like image codecs was a defense against that.
Pull requests to improve this are welcome. ImageIO.setUseCache() could improve performance. Maybe you'll just have to deal with it? :) Could another image manipulation library be used as an optional dependency?
I made some performance measurements: loading a PNG (~40 ms) is about 4 times slower than loading a GIF (~10 ms), and loading a GIF is in turn far slower than loading a JPG.
| Iterations | Load GIF | Load JPG | Load PNG |
|---|---|---|---|
| 1 | ~39 ms | ~1 ms | ~104 ms |
| 10 | ~16 ms | ~0 ms | ~54 ms |
| 100 | ~12 ms | ~0 ms | ~38 ms |
| 1000 | ~10 ms | ~0 ms | ~38 ms |
If you know any other library, for faster PNG-Processing, let us know.
https://github.com/leonbloy/pngj I tried it once, but it wasn't maintained and its approach was somewhat different. I just found https://github.com/nayuki/PNG-library but don't know that one...
"The reason that PNG is so fast with iText is that there's an optimization to directly copy the compressed image data from the PNG file into the PDF document when some conditions are met (grayscale color type, etc). That's much faster than decoding the PNG image data and recompressing it as a PDF image,"
https://github.com/TIBCOSoftware/jasperreports/issues/362
So we just need to copy the compressed image data directly into the PDF file.
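To illustrate what "copying the compressed data" involves on the PNG side, here is a minimal sketch that walks the PNG chunk structure and concatenates the IDAT payloads without inflating them. It is not the old iText `PngImage` code. The names `PngIdatExtractor` and `RawPng` are hypothetical, and CRCs are not verified. The result is a zlib stream of filtered scanlines. As far as I know, a PDF image stream can carry such data using FlateDecode with a PNG predictor in its decode parameters, but only under conditions like the ones mentioned in the quote (for example non-interlaced images without an alpha channel); that part is an assumption and is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: pull the still-compressed pixel data out of a PNG. */
public class PngIdatExtractor {

    /** IHDR fields plus the concatenated, still zlib-compressed IDAT data. */
    public static final class RawPng {
        public final int width, height, bitDepth, colorType, interlace;
        public final byte[] zlibData;

        RawPng(int width, int height, int bitDepth, int colorType,
               int interlace, byte[] zlibData) {
            this.width = width;
            this.height = height;
            this.bitDepth = bitDepth;
            this.colorType = colorType;
            this.interlace = interlace;
            this.zlibData = zlibData;
        }
    }

    // The fixed 8-byte PNG signature, read as one big-endian long.
    private static final long PNG_SIGNATURE = 0x89504E470D0A1A0AL;

    public static RawPng extract(byte[] png) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(png));
        if (in.readLong() != PNG_SIGNATURE) {
            throw new IOException("Not a PNG file");
        }
        int width = 0, height = 0, bitDepth = 0, colorType = 0, interlace = 0;
        ByteArrayOutputStream idat = new ByteArrayOutputStream();
        while (true) {
            // Each chunk: 4-byte length, 4-byte type, data, 4-byte CRC.
            int length = in.readInt();
            if (length < 0) {
                throw new IOException("Invalid chunk length");
            }
            byte[] typeBytes = new byte[4];
            in.readFully(typeBytes);
            String type = new String(typeBytes, StandardCharsets.US_ASCII);
            byte[] data = new byte[length];
            in.readFully(data);
            in.readInt(); // CRC, skipped in this sketch
            if (type.equals("IHDR")) {
                DataInputStream h = new DataInputStream(new ByteArrayInputStream(data));
                width = h.readInt();
                height = h.readInt();
                bitDepth = h.readUnsignedByte();
                colorType = h.readUnsignedByte();
                h.readUnsignedByte(); // compression method
                h.readUnsignedByte(); // filter method
                interlace = h.readUnsignedByte();
            } else if (type.equals("IDAT")) {
                idat.write(data, 0, data.length); // copy as-is, no inflate
            } else if (type.equals("IEND")) {
                break;
            }
        }
        return new RawPng(width, height, bitDepth, colorType, interlace,
                          idat.toByteArray());
    }
}
```

A caller would check `colorType` and `interlace` to decide whether the direct copy is possible at all, and fall back to full decoding otherwise.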
As stated earlier, custom image codecs were removed from OpenPDF as part of the source code licence review we did, and to ease maintenance by using libraries to read images.
Describe the bug: Performance when handling PNGs (e.g. inserting them into a PDF document) has become noticeably slower going from the original iText implementation to the latest OpenPDF implementation.
To Reproduce: For an example project, see https://github.com/dstucki/openpdf-png-performance
Expected behavior: The performance of the original iText code should be more or less restored.
System (please complete the following information):