cantaloupe-project / cantaloupe

High-performance dynamic image server in Java
https://cantaloupe-project.github.io/
Other
268 stars 108 forks source link

Incorrect upscaling for some files with size=max #680

Open mmatela opened 2 months ago

mmatela commented 2 months ago

Using cantaloupe 5.0.6, default settings except for FilesystemSource.BasicLookupStrategy.path_prefix. Java 21.0.1 on Windows, but experienced the same on other versions of Java on Linux.

Attaching sample file 2.jpg 2

When trying to get a fragment of size 100x100 with this url: http://localhost:8182/iiif/3/2.jpg/100,100,100,100/max/0/default.jpg i get the fragment upscaled to size 2604x2604 instead.

It works correctly when using size pct:100 instead of max (should there ever be a difference for these two?): http://localhost:8182/iiif/3/2.jpg/100,100,100,100/pct:100/0/default.jpg It works correctly with image api v2: http://localhost:8182/iiif/2/2.jpg/100,100,100,100/full/0/default.jpg Somehow it accepts size=max for v2, and also returns upscaled to 2604x2604: http://localhost:8182/iiif/2/2.jpg/100,100,100,100/max/0/default.jpg

I tried some other files in both jpg and other formats and for most of them it correctly returns 100x100, so it must be something inside the content file, but I'm not aware of any properties that should lead to this result.

DiegoPino commented 2 months ago

Hi @mmatela. Could you share the info.json output for that image? If this was a general issue for all the images I would say there is a problem/order of operation with the Specs implementation for /max but since you reported it happens just in some images it might be something specific maybe to A) wrongly processed by cantaloupe resolution (your image has 400dpi) ? wrongly parsed Exif? or even stale caches?. Still the specs are clear. Region THEN Size THEN Rotation THEN Quality THEN Format For the stale caches part (this is just an idea, not stating really a solution but worth trying) you can call the tile URL with

Overriding Cache Behavior
The behavior of the [caches](https://cantaloupe-project.github.io/manual/5.0/caching.html) can be overridden on a per-request basis by supplying a cache URL query argument with one of the following values:

nocache or false
Bypasses the source, derivative, and info caches, and also omits any Cache-Control response header that may be configured.
recache
Bypasses the derivative and info caches for reads but not writes.

I will check the code that sets up the operations and see what I can find out

mmatela commented 2 months ago

Thanks for looking into this @DiegoPino

Attaching info from http://localhost:8182/iiif/3/2.jpg/info.json : info.json

IrfanView shows the file indeed has resolution of 400x400 dpi.

I think we can rule out cache, this is the first time I've started Cantaloupe on this machine and this 2.jpg file has never changed.

DiegoPino commented 2 months ago

hi @mmatela. I tested locally on my self built Cantaloupe (Docker) and I can't reproduce it. I tested java2D, turbojpeg and Jai processors. Both version 2 and 3 Calling (in my case) https://localhost:8183/iiif/3/d66%2Fimage-354200823-eca73bfc-7194-4a50-9830-beee020d9e1a-ebce0b81-b3ab-4200-b124-0a95d5202959.jpg/100,100,100,100/max/0/default.jpg

Always generated

image

Not sure if the act of uploading the image to GitHub/me downloading it might have re-encoded the JPEG?

I used this docker container: esmero/cantaloupe-s3:6.0.1-multiarch. It is a year old based on development, so probably closer to v5.0.5 with JDK 15

If you could give 5.0.5 a try while I figure out what has changed/is different?

I can't try a vanilla 5.0.6 today but will put in my todo list.

mmatela commented 2 months ago

I see the same problematic behavior with 5.0.4 and 5.0.5. With the file downloaded from github through this link: https://github.com/user-attachments/assets/eca73bfc-7194-4a50-9830-beee020d9e1a (redirected the browser to amazonaws, then I pressed ctrl+s)

DiegoPino commented 2 months ago

Wonder if java 21 has something to do with this. Do you see any errors/warnings in your cantaloupe log?

mmatela commented 2 months ago

I first noticed this problem on a Linux machine with Java 11.

console output during request handling (changed the file name to make sure it's loaded from scratch):

e.i.l.c.r.i.v.ImageResource - Handling GET /iiif/3/354200823-eca73bfc-7194-4a50-9830-beee020.jpg/100,100,100,100/max/0/default.jpg
e.i.l.c.r.i.v.ImageResource - Request headers: Cookie: [...................]
e.i.l.c.i.MetaIdentifier - [Raw path component: 354200823-eca73bfc-7194-4a50-9830-beee020.jpg] -> [decoded: 354200823-eca73bfc-7194-4a50-9830-beee020.jpg] -> [slashes substituted: 354200823-eca73bfc-7194-4a50-9830-beee020.jpg]
e.i.l.c.i.Identifier - [Raw path component: 354200823-eca73bfc-7194-4a50-9830-beee020.jpg] -> [decoded: 354200823-eca73bfc-7194-4a50-9830-beee020.jpg] -> [slashes substituted: 354200823-eca73bfc-7194-4a50-9830-beee020.jpg]
e.i.l.c.c.FilesystemCache - getSourceImageFile(): \var\cache\cantaloupe\source\d5\f7\39\d5f73936f23408fbea6c2d37603e5305
e.i.l.c.s.FilesystemSource - Resolved 354200823-eca73bfc-7194-4a50-9830-beee020.jpg to T:\354200823-eca73bfc-7194-4a50-9830-beee020.jpg
e.i.l.c.p.ProcessorFactory - Failed to initialize TurboJpegProcessor (error: Expecting an absolute path of the library: /opt/libjpeg-turbo/lib/libturbojpeg.so)
e.i.l.c.p.ProcessorFactory - Java2dProcessor selected for format JPEG (AutomaticSelectionStrategy offered TurboJpegProcessor, Java2dProcessor)
e.i.l.c.p.ProcessorConnector - File -> FileProcessor connection between FilesystemSource and Java2dProcessor
e.i.l.c.p.c.j.JPEGImageReader - Using com.sun.imageio.plugins.jpeg.JPEGImageReader
e.i.l.c.c.InfoService - readInfo(): read 354200823-eca73bfc-7194-4a50-9830-beee020.jpg from Java2dProcessor in 12 msec
e.i.l.c.c.InfoService - putInObjectCache(): adding info: 354200823-eca73bfc-7194-4a50-9830-beee020.jpg (new size: 3)
e.i.l.c.c.InfoCache - putInObjectCache(): adding info: 354200823-eca73bfc-7194-4a50-9830-beee020.jpg (new size: 3)
e.i.l.c.r.i.v.ImageResource - Base URI assembled from X-Forwarded headers: http://localhost:8182
e.i.l.c.r.ImageRepresentation - Derivative cache not available; writing directly to the response
e.i.l.c.p.c.j.JPEGImageWriter - ImageIO plugin preferences: com.sun.imageio.plugins.jpeg.JPEGImageWriter
e.i.l.c.p.c.j.JPEGImageWriter - Using com.sun.imageio.plugins.jpeg.JPEGImageWriter
e.i.l.c.p.c.j.JPEGImageReader - Acquiring region 100,100/100x100 from 2781x4100 image
e.i.l.c.p.c.j.JPEGImageWriter - Quality: 80; progressive: true
e.i.l.c.r.ImageRepresentation - Java2dProcessor processed in 242 msec: 354200823-eca73bfc-7194-4a50-9830-beee020.jpg_1:1_cropbypixels:100,100,100,100_scalebypixels:!2604,3839,bicubic_encode:jpg_JPEG_80_interlace_#FFFFFF_8_e95198c48292fa4c8fc0617e850c1f6d
e.i.l.c.r.HandlerServlet - Responded to GET /iiif/3/354200823-eca73bfc-7194-4a50-9830-beee020.jpg/100,100,100,100/max/0/default.jpg with HTTP 200 in 261 msec

There's the suspicious number 2604 in the next-to-last line!

DiegoPino commented 2 months ago

Yes, that is strange. According to the code, max will call scaleByPercent see https://github.com/cantaloupe-project/cantaloupe/blob/60c3e2bae4fa3458e1df29113172319fcd6102a3/src/main/java/edu/illinois/library/cantaloupe/resource/iiif/v2/Size.java#L178

Could you please try enabling your derivative cache? and also force Java2D (manual source selection) to be JPEG processor? I can't find so far in the code a reason why _scalebypixels:!2604,3839 is called there. Only way that could be is if the cropbypixels failed and the resulting value ends being the original one ....

Need to check more

DiegoPino commented 2 months ago

@mmatela any updates on this? Checking so we keep debugging this or mark as resolved. Thanks

mmatela commented 2 months ago

@DiegoPino Oh, I thought you'd be able to run version 5 locally and reproduce the problem, should be more efficient than having this back and forth. Like you said:

I can't try a vanilla 5.0.6 today but will put in my todo list.

DiegoPino commented 2 months ago

@mmatela I have not been successful replicating the issue, maybe because but I don't have a Windows/Java 21 combo to test? Maybe you could share with me your cantaloupe properties file? (config). If you do so I can at least have the same configs as you and try. I can't also find any reason in the code why max would trigger this. I prefer to test this actually (which I have now here) with a develop build. Thanks

mmatela commented 2 months ago

@mmatela I have not been successful replicating the issue, maybe because but I don't have a Windows/Java 21 combo to test?

Like I said, I've seen the same behavior on Linux with Java 11 and Windows with Java 21, so it doesn't seem to be OS/Java dependent. So far you've reported trying a docker with version 6.0.1. Did you try to simply download the latest release and run it with java -Dcantaloupe.config=cantaloupe.properties -Xmx2g -jar cantaloupe-5.0.6.jar?

Maybe you could share with me your cantaloupe properties file? (config). If you do so I can at least have the same configs as you and try.

I just took the sample properties file from the release package and only changed FilesystemSource.BasicLookupStrategy.path_prefix, like I said in the first post.

I can't also find any reason in the code why max would trigger this. I prefer to test this actually (which I have now here) with a develop build. Thanks

OK, but I don't see how trying to run it on my side with different configs will help with that.

DiegoPino commented 2 months ago

@mmatela the issue is we can not fix backwards an existing release. 5.x is done. We can only fix our ongoing/live code base (6.x from the develop branch) and only if the error can be reproduced and we know where/why it is happening. I will give reproducing it another try just for clarity on 5.0.6 but if I can't, then I must assume there is something in your environment/or my own that is different. The fact that this is dependent on a specific file and not a general issue makes all even more complex.

DiegoPino commented 2 months ago

@mmatela update: spent the last 3 hours debugging this. @glenrobson (ping just so you know this is happening)

I ran a vanilla 5.0.6 JAR file on JAVA 22 with your file changing just the filesystem path and I can replicate the issue. Then transformed your file into 72dpi, TIFF, forced the processor to manual to use JAVA2D and the same. Still scaling by pixel when it should not, and according the code, should be scaling by Percent.

Then retested on a fully functional docker container (will all the dependencies using a fresh develop build) and no issues at all. Returns and processes correctly.

I have so far really no idea, I compared the source classes for size and there are no changes between releases.

The code

https://github.com/cantaloupe-project/cantaloupe/blob/60c3e2bae4fa3458e1df29113172319fcd6102a3/src/main/java/edu/illinois/library/cantaloupe/resource/iiif/v3/Size.java#L199-L236

Should return scale by percent because (maxScale being the max_scale property == 1.0) and DELTA == 0.000001

So other maxScale in not being processed or there is a floating point comparison issue? Clueless.

                 return new ScaleByPercent(isUpscalingAllowed() ? maxScale : 1); 
             } else { 
      ... fail and upscale by pixel...

What I will do (tomorrow, 3 hours is my limit) is to do the same test with a develop jar using filesystem and no changes to properties files. This is not a processor issue, it is an argument parsing/constant assignment/comparison issue and might? *have no proof why, but might be even an issue with the actual built. Then if all works I will assume that JAR has issues, if not and develop on such a simplistic setup fails I will add debug statements all over the place to see where parsing MAX, setting max pixels and deciding that scale by percent is no good is happening.

DiegoPino commented 2 months ago

@mmatela @glenrobson ok, I got this. This is actually a bug (or an interpretation/math issue). And it still exists in develop. The reason I could not replicate is because none of my cantaloupes (so many) have the default configurations.

Let me explain what I found out:

@mmatela can you please delete your derivative caches if you enabled them , clear your browser cache and modify max_pixels to that other number I shared? And see if 100x100 delivers correctly? Thanks

Debugging where the error(s) is/are and making a pull will take me some time. Some help might be great!

DiegoPino commented 2 months ago

Further debugging. Seems that the issue is happening because of: (for my own reference)

https://github.com/cantaloupe-project/cantaloupe/blob/773ddeeb0fbb11abf899b8779d105fa3c508e9cb/src/main/java/edu/illinois/library/cantaloupe/resource/iiif/IIIFResource.java#L32-L56

DiegoPino commented 2 months ago

And might be also the culprit for https://github.com/cantaloupe-project/cantaloupe/issues/635

DiegoPino commented 2 months ago

So. I know where this is happening (woho!). There is a chain of calls that happen just after the operation list is built

https://github.com/cantaloupe-project/cantaloupe/blob/e025db6482754dd543b09db2a1f5afde75d9a19f/src/main/java/edu/illinois/library/cantaloupe/resource/ImageRequestHandler.java#L395 and invoking then

https://github.com/cantaloupe-project/cantaloupe/blob/e025db6482754dd543b09db2a1f5afde75d9a19f/src/main/java/edu/illinois/library/cantaloupe/resource/iiif/v2/ImageResource.java#L96-L105

But the size passed there is unaware of cropping (info.size is the full size of the source). So max is interpreted as a request being done against the full image and any already chained size operation is removed and a new one is appended breaking the contract.

There are two options here:

@jcoyne @ksclarke @glenrobson am I understanding the IIIF Image API specs correctly? max, when in the presence of any cropping (that is what I read so asking again) will deliver the max size allowed for the cropped region, based on the server settings? Right? wrong?

glenrobson commented 2 months ago

Congrats @DiegoPino on finding the issue!

max, when in the presence of any cropping (that is what I read so asking again) will deliver the max size allowed for the cropped region, based on the server settings? Right? wrong?

I think you are correct. max should deliver the full sized image of the crop unless it is large than the server configuered max size (either maxWidth, maxHeight or maxArea). See https://iiif.io/api/image/3.0/#42-size:

"max : The extracted region is returned at the maximum size available, but will not be upscaled. The resulting image will have the pixel dimensions of the extracted region, unless it is constrained to a smaller size by maxWidth, maxHeight, or maxArea as defined in the Technical Properties section."