TAMULib / cantaloupe

High-performance dynamic IIIF image server in Java
https://medusa-project.github.io/cantaloupe/

Optimize Processor Strategy, Caching, and Tuning #2

Open kaladay opened 1 week ago

kaladay commented 1 week ago

Performance in Cantaloupe is influenced by many factors, including the processor strategy and caching setup. In Cantaloupe, processors read images from sources, decode them, transform them according to request arguments, and encode and write derivative images back to the client. Processors can be selected in different ways, including on a per-request basis. Processors rely on different underlying processing engines, which may directly affect quality and performance.

While we're looking at performance, I think it might be a good time to review processor strategy and caching setup. I don't think it's worth the time to compare processors and formats, but it would be good to know whether our setup could be causing issues.

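Because we have a tendency to use JPGs in more recent collections:

- Are we using progressive JPGs?
- What is our JPG quality for derivatives? (I think it should be around 80/100)
- Are we using TurboJpegProcessor or something else for reading and writing JPEGs?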

We also serve TIFs, JPEG2000s, and PDFs for some collections. Looking at similar settings for these is also probably a good idea. Specifically, are we using LZW on TIFFs or no compression at all?

Looking at stream and retrieval strategy is worth it. The optimal setup here really depends on what we're doing, but DownloadStrategy for either option would not be ideal. At my previous institution, we primarily relied on CacheStrategy and FilesystemCache, but I think the more modern approach is StreamStrategy with CacheStrategy as a fallback. More info is here.
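For reference, the StreamStrategy-with-CacheStrategy-fallback idea might look roughly like the fragment below. This is only a sketch: the key names are as I recall them from the Cantaloupe sample config and should be checked against the `cantaloupe.properties.sample` for the version we actually deploy, and the cache path is a hypothetical placeholder.

```properties
# Sketch only: verify key names against the sample config for the deployed version.

# Try to stream source images directly to the processor when possible.
processor.stream_retrieval_strategy = StreamStrategy

# If the source/processor pair cannot stream, fall back to the source cache.
processor.fallback_retrieval_strategy = CacheStrategy

# CacheStrategy relies on a source cache being configured.
cache.server.source = FilesystemCache

# Hypothetical path; substitute the real cache location.
FilesystemCache.pathname = /var/cache/cantaloupe
```

Whether this is better than our current setup depends on where the source images live and which processors end up handling them, so it would need to be compared against what PROD is doing today.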

Beyond processor strategy, we should look at deployment & tuning and caching.

Acceptance Criteria

kaladay commented 3 days ago

To answer the questions asked:

Are we using progressive JPGs? Yes, the current setting is:

# Progressive JPEGs are usually more compact.
processor.jpg.progressive = true

What is our JPG quality for derivatives? (I think it should be around 80/100) The current setting is:

# JPEG output quality (1-100).
processor.jpg.quality = 80

Are we using TurboJpegProcessor or something else for reading and writing JPEGs? (Edit) Because `processor.selection_strategy` is set to AutomaticSelectionStrategy, we may or may not be using TurboJpegProcessor. The ManualSelectionStrategy assignments below are not in effect.


processor.selection_strategy = AutomaticSelectionStrategy
...
processor.ManualSelectionStrategy.avi = FfmpegProcessor
processor.ManualSelectionStrategy.bmp =
processor.ManualSelectionStrategy.flv = FfmpegProcessor
processor.ManualSelectionStrategy.gif =
processor.ManualSelectionStrategy.jp2 = KakaduNativeProcessor
processor.ManualSelectionStrategy.jpg =
processor.ManualSelectionStrategy.mov = FfmpegProcessor
processor.ManualSelectionStrategy.mp4 = FfmpegProcessor
processor.ManualSelectionStrategy.mpg = FfmpegProcessor
processor.ManualSelectionStrategy.pdf = PdfBoxProcessor
processor.ManualSelectionStrategy.png =
processor.ManualSelectionStrategy.tif =
processor.ManualSelectionStrategy.webm = FfmpegProcessor
processor.ManualSelectionStrategy.xpm =

# Fall back to this processor for any formats not assigned above.
processor.ManualSelectionStrategy.fallback = Java2dProcessor
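If we wanted to remove the uncertainty about which processor handles JPEGs, one option would be to switch to manual selection and assign the processor explicitly. The fragment below is a hedged sketch, not a tested change: it reuses the keys shown above and assumes the libjpeg-turbo native library that TurboJpegProcessor depends on is installed and loadable by the JVM on the server.

```properties
# Sketch only: pin JPEG handling instead of relying on automatic selection.
processor.selection_strategy = ManualSelectionStrategy

# Assumes the libjpeg-turbo native library is installed and loadable by the JVM.
processor.ManualSelectionStrategy.jpg = TurboJpegProcessor

# Formats left blank would use the fallback processor.
processor.ManualSelectionStrategy.fallback = Java2dProcessor
```

With manual selection, every format we serve (TIF, JP2, PDF) would also need a sensible assignment or an acceptable fallback, so this would be worth trying outside PROD first.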


~The `jp2`, I believe is JPEG2000,  appears to use `KakaduNativeProcessor`.~
~The `jpg` is empty, which suggests that it falls back to `Java2dProcessor`.~

The documentation shows that several compile-time or system-level setup steps are necessary to actually use things like JPEG2000.
The documentation also recommends against JPEG2000, suggesting it only as a last resort.

OpenJPEG is considered a better alternative.

I am seeing the following in the logs:

415 Unsupported Media Type Unsupported output format: JPEG2000

edu.illinois.library.cantaloupe.processor.OutputFormatException: Unsupported output format: JPEG2000
    at edu.illinois.library.cantaloupe.processor.Processor.validate(Processor.java:204)
    at edu.illinois.library.cantaloupe.resource.ImageRequestHandler.handle(ImageRequestHandler.java:399)
    at edu.illinois.library.cantaloupe.resource.iiif.v2.ImageResource.doGET(ImageResource.java:128)
    at edu.illinois.library.cantaloupe.resource.HandlerServlet.handle(HandlerServlet.java:97)
    at edu.illinois.library.cantaloupe.resource.HandlerServlet.doGet(HandlerServlet.java:35)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:791)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.eclipse.jetty.server.Server.handle(Server.java:516)
    at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
    at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:556)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
    at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:773)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:905)
    at java.base/java.lang.Thread.run(Thread.java:840)


> Specifically, are we using LZW on TIFFs or no compression at all?

We are using LZW:

# TIFF output compression type. Available values are Deflate, JPEG,
# LZW, and RLE. Leave blank for no compression.
processor.tif.compression = LZW


> ...but I think more modern strategies are StreamStrategy with CacheStrategy as a fallback.

We appear to be doing the following on PROD: