Closed RussellMcOrmond closed 6 years ago
Hi @RussellMcOrmond,
I believe there are a couple of different bugs in the ImageIO TIFF reader that are rearing their heads here. The reader is supposed to return a BufferedImage with a type property set to some useful value like TYPE_INT_RGB, but in some cases it returns TYPE_CUSTOM, which is very inefficient to work with for whatever internal implementation reasons.
So there are two problems: the TIFF reader sets certain returned images to the wrong type, and that type is slow. There are bugs for these in the Java bug tracker:
(Note that Cantaloupe isn't using the default/official ImageIO TIFF reader, but a fork that adds some other features, and the fork inherits this TYPE_CUSTOM bug.)
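To illustrate the TYPE_CUSTOM problem described above, here is a minimal sketch of a common workaround: redrawing the decoded image into a TYPE_INT_RGB buffer before doing any further processing. This is not what Cantaloupe does internally; the class and method names are hypothetical, and the approach assumes an opaque image (any alpha channel would be discarded).

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of Cantaloupe or ImageIO.
final class ImageTypeNormalizer {
    private ImageTypeNormalizer() {}

    /**
     * Returns the image unchanged if it is already TYPE_INT_RGB;
     * otherwise copies it into a new TYPE_INT_RGB image.
     * Assumption: the source is opaque, so dropping alpha is acceptable.
     */
    static BufferedImage toIntRgb(BufferedImage src) {
        if (src.getType() == BufferedImage.TYPE_INT_RGB) {
            return src;
        }
        BufferedImage dst = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            // One full-image copy; later operations then run on a standard type.
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

The copy costs one pass over the pixels and a second full-size buffer, so it only pays off when the subsequent operations on the TYPE_CUSTOM image would be slower than that.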
I will create an SSCCE and an issue in the TIFF reader's repository. But I don't think there is much else I can do for now.
If it helps, JaiProcessor uses the same TIFF reader, but it doesn't use BufferedImages, so it can bypass the above issues.
Thank you @adolski for the JAI suggestion. The commit to our Docker configuration referenced above includes that change, and the images now come up quite quickly.
We are currently using OpenJpegProcessor for .jp2, JaiProcessor for .tif, Java2dProcessor for .jpg, and PdfBoxProcessor for .pdf files, which covers all the extensions in our current repository that we generate image derivatives for. I will continue to check different types and report anything interesting, but I believe the only images currently not working are the two CMYK JPEGs.
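For readers following along, the processor assignments described above might look roughly like this in cantaloupe.properties. The processor.tif and processor.jpg keys are the ones named in this thread; processor.jp2 and processor.pdf are assumed to follow the same pattern and should be checked against the sample configuration shipped with your Cantaloupe version.

```properties
# Sketch only: per-format processor assignments.
# processor.jp2 and processor.pdf are assumed key names.
processor.jp2 = OpenJpegProcessor
processor.tif = JaiProcessor
processor.jpg = Java2dProcessor
processor.pdf = PdfBoxProcessor
```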
Hi @RussellMcOrmond,
All of our TIFFs are also encoded with JPEG (about 25k files overall, the biggest about 800M in size), and serving the biggest files was indeed quite slow, no matter which processor we used. Those TIFFs were pyramidal only, and our solution was to use ImageMagick to add tiles to all of them. That sped up the process a lot and solved our problem.
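The exact ImageMagick invocation is not given in this thread. As a rough sketch of what such a conversion could look like when driven from Java, the helper below builds a command that writes a tiled, JPEG-compressed pyramidal TIFF. The class name, file names, and tile size are hypothetical; the command uses ImageMagick's tiff:tile-geometry define and ptif: output prefix, and assumes the ImageMagick 6 `convert` binary is on the PATH (ImageMagick 7 installs it as `magick`).

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper for illustration; not the poster's actual script.
final class TiledTiffCommand {
    private TiledTiffCommand() {}

    /** Builds the argument list for converting a source image to a tiled pyramidal TIFF. */
    static List<String> build(Path source, Path target, int tileSize) {
        if (tileSize <= 0) {
            throw new IllegalArgumentException("tileSize must be positive");
        }
        return List.of(
                "convert",
                source.toString(),
                "-define", "tiff:tile-geometry=" + tileSize + "x" + tileSize,
                "-compress", "jpeg",
                "ptif:" + target);
    }

    /** Runs the command and returns the process exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }
}
```

Building the argument list separately from running it makes it easy to log or dry-run the command across a large batch before touching any files.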
I hope it can help.
@FrancoisPaquette
Thank you for the note. In the short term we will use JAI, but I'm wondering if you can offer some advice for the future. We will be using Archivematica to create the images we use for access. Is there an ideal encoding for efficiency with Cantaloupe? Should I be generating tiled TIFF files rather than JPEG? Normally one of the benefits of JPEG2000 is the ability to generate tiles, but since we are using the external OpenJPEG filter, I suspect that is not optimised with Cantaloupe.
We are currently not using a regular IIIF client when accessing the images, and our existing website asks for larger single derivatives from Cantaloupe (we only replaced our image server at this phase, not our page viewer). When we move to an IIIF client that requests tiles, we may need to revisit the encoding question, as performance will be an even larger issue.
No matter what kind of output images you are producing, the current recommendation for source image formats is either multi-resolution a.k.a. pyramidal tiled TIFF or JPEG2000. These will enable the expense of cropping and scaling steps (i.e. generating tiles) to remain closer to constant than any others as source image size increases. The tradeoff in a nutshell is space consumption (PTIFF) vs CPU consumption (JP2). Also, with JP2, you are limited by the decoder's performance and the fastest decoder (Kakadu) costs money. But, OpenJpegProcessor will indeed wring all it can out of OpenJPEG.
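To make the "closer to constant" point concrete, here is a small sketch of the arithmetic behind a pyramidal tiled source. It assumes each pyramid level halves the resolution of the previous one and that tiles are square; the class and method names are hypothetical and this is not Cantaloupe's actual code.

```java
// Hypothetical illustration of pyramid and tile arithmetic.
final class PyramidMath {
    private PyramidMath() {}

    /**
     * Picks the smallest pyramid level (0 = full size, each level half the
     * previous) whose resolution is still at least the requested scale.
     */
    static int levelForScale(double scale, int maxLevel) {
        int level = 0;
        double remaining = scale;
        while (level < maxLevel && remaining <= 0.5) {
            remaining *= 2;
            level++;
        }
        return level;
    }

    /**
     * Counts the tiles that intersect a region at the given level.
     * The region is given in full-size pixel coordinates.
     */
    static long tilesTouched(int x, int y, int width, int height,
                             int level, int tileSize) {
        int factor = 1 << level;
        int firstCol = (x / factor) / tileSize;
        int lastCol = ((x + width - 1) / factor) / tileSize;
        int firstRow = (y / factor) / tileSize;
        int lastRow = ((y + height - 1) / factor) / tileSize;
        return (long) (lastCol - firstCol + 1) * (lastRow - firstRow + 1);
    }
}
```

For a 16384 x 16384 source with 256-pixel tiles, a whole-image request served from level 0 touches 4096 tiles, while the same request at 1/64 scale can be served from level 6, which is a single tile. Without the pyramid and tiles, the decoder would have to read the full-size image in both cases.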
Thanks for the help! I will watch for a fix in the regular libraries that would let us move away from JAI.
I believe it is appropriate for me to mark this issue resolved.
Regarding JPEG2000, there is a concern with browsers, as it's not fully supported. But again, maybe it's not an issue for your users or in your environment.
https://caniuse.com/#search=jpeg2000
For your information, I got a quote from Kakadu and it was a few thousand dollars annually.
@FrancoisPaquette The request for information was about file formats as input to Cantaloupe, not that Cantaloupe would output. I expect most clients will request JPEG tiles from Cantaloupe, but that is up to the client to determine what they can support.
Thanks.
We have switched to Cantaloupe from a custom image server we authored, a Perl program that used the PerlMagick interface to GraphicsMagick.
Since migrating we have noticed that access to some of our more 'unusual' files has become slow. The internal proposal so far is to author a tool to create access-optimised images which Cantaloupe would use instead, a sort of pre-cache. In the future we will be using Archivematica https://github.com/artefactual/archivematica which can create optimised images for access, but we aren't ready to make that transition yet. As well as addressing the performance issues, generating access-optimised derivative files could also solve #177 for us.
I'm looking for advice on whether there is something else we could be looking into instead.
Most of our problems are with TIFF files. While we have a few uncompressed files which are one type of problem, the more odd ones use JPEG compression within a TIFF file format. At first we thought the problem was file size and IO speed, but comparably sized JPEG compressed images within a regular JFIF file render quickly.
https://github.com/c7a/cihm-cantaloupe/blob/master/cantaloupe.properties indicates that both processor.tif and processor.jpg are using the default Java2dProcessor, so there may be a specific setting we should be looking into that would speed up the JPEG compressed TIFF files.
Examples:
JPEG compressed TIFF - slow http://numeris.canadiana.ca/view/numeris.RF_1993_BY00_003_1_3/15 File sizes range from 185K to 1.2M
JPEG compressed JFIF - fast http://numeris.canadiana.ca/view/numeris.TV_1976_FA00_091/9 Has some files smaller, but most are comparable size.
Uncompressed TIFF - fast http://heritage.canadiana.ca/view/oocihm.lac_reel_t11146/310 Files are all above 6M (the page I'm pointing to), and all come up quickly.