cantaloupe-project / cantaloupe

High-performance dynamic image server in Java
https://cantaloupe-project.github.io/

Sometimes S3 Derivative cache ends with a corrupted image #459

Closed DiegoPino closed 3 years ago

DiegoPino commented 3 years ago

Hi,

This is a question:

With 4.1.7, we have recently found a few times (not the norm) that some derivatives cached in S3 end up corrupted. Sometimes the image is completely wrong, sometimes only some tiles are visually corrupted. Since the cache entry exists, it is served consistently wrong until it expires or we clear it manually.

This is an example: https://dome.archipelago.nyc/cantaloupe/iiif/2/0a9%2Fimage-4e45e5d2b7ffec65a901778abdb124bb-kl-000094-165d1f00-7d78-4469-af24-b21f016d613f.tif/full/240,/0/default.jpg

The same image is delivered correctly by adding ?nocache (or really any query argument):

https://dome.archipelago.nyc/cantaloupe/iiif/2/0a9%2Fimage-4e45e5d2b7ffec65a901778abdb124bb-kl-000094-165d1f00-7d78-4469-af24-b21f016d613f.tif/full/240,/0/default.jpg?nocache

I noticed that the main difference is that the latter response does not include a Content-Length: 24576 header.

This is mediated (proxied) by NGINX.

The question is how the S3 upload is validated (if at all). I am also asking for advice, or to hear others' experience, with S3 as derivative cache storage.

Our current plan is to read the code to understand this better, and also to give our users (admins) a "Button" on each Object that calls the Cantaloupe API's item cache purge endpoint for that item, but I am wondering if anyone has suggestions on how to make S3 caching safer.
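For reference, the kind of call that "Button" would make looks roughly like this (a sketch only: the tasks endpoint and the PurgeItemFromCache verb are from my reading of the HTTP API docs, and the host, credentials, and identifier below are placeholders):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PurgeItemButton {

    public static void main(String[] args) throws Exception {
        // All of these are placeholders for our own deployment/config.
        String tasksEndpoint = "https://dome.archipelago.nyc/cantaloupe/tasks";
        String apiUser       = "admin";
        String apiSecret     = "changeme";
        String identifier    = "image.tif"; // the un-escaped IIIF identifier

        // The HTTP API takes a JSON task description; PurgeItemFromCache should
        // remove the cached derivatives and info for a single identifier.
        String body = "{\"verb\": \"PurgeItemFromCache\", \"identifier\": \"" + identifier + "\"}";

        String basicAuth = Base64.getEncoder().encodeToString(
                (apiUser + ":" + apiSecret).getBytes(StandardCharsets.UTF_8));

        HttpRequest request = HttpRequest.newBuilder(URI.create(tasksEndpoint))
                .header("Content-Type", "application/json")
                .header("Authorization", "Basic " + basicAuth)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // A 202 Accepted response means the purge task was queued.
        System.out.println(response.statusCode());
    }
}
```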

Thanks a lot!

DiegoPino commented 3 years ago

I did some code reading to better understand what is happening, starting here: https://github.com/cantaloupe-project/cantaloupe/blob/792f17575b42f711eb99d99875066134fa2c406e/src/main/java/edu/illinois/library/cantaloupe/cache/S3Cache.java#L154

It is not clear to me whether the issues I see occur because the stream was already corrupted before it reached the cache, so this is not on the S3 side at all (the cache did exactly what was expected and simply uploaded what it got once the Content-Length was available; I would assume that is the case, since S3 would, or at least should, not accept/write bytes if the Content-Length did not match what was sent), or whether the stream (buffer) failed and closed before it finished (an overflow?) and was partially uploaded to S3. The latter seems less likely, but it could also be that the buffer was "compensated" with the missing bytes during the byte-to-byte copy, in which case the length would be "ok" but the data itself would not be a valid image. Given https://github.com/cantaloupe-project/cantaloupe/blob/792f17575b42f711eb99d99875066134fa2c406e/src/main/java/edu/illinois/library/cantaloupe/cache/S3Cache.java#L186

and that closing the stream is when the actual upload happens, here:

https://github.com/cantaloupe-project/cantaloupe/blob/792f17575b42f711eb99d99875066134fa2c406e/src/main/java/edu/illinois/library/cantaloupe/cache/S3Cache.java#L123

If that were the case (this is based on just a single reading of the class and some of https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/PutObjectRequest.html), I wonder if a safety measure would be to also calculate the MD5 checksum of the stream (buffer) and pass it along as part of the metadata option? Sorry, I may not be making any sense.
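Something like this is what I mean (just a sketch against the AWS SDK v1 javadoc above; the bucket, key, and already-buffered byte[] stand in for whatever S3Cache holds internally):

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.PutObjectRequest;
import java.io.ByteArrayInputStream;
import java.security.MessageDigest;
import java.util.Base64;

class ChecksummedUpload {

    /**
     * Uploads a fully buffered derivative with a Content-MD5 header, so that
     * S3 rejects the PUT (BadDigest) if the bytes it receives differ from
     * what was hashed locally.
     */
    static void put(AmazonS3 s3, String bucket, String key, byte[] data) throws Exception {
        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setContentLength(data.length);

        // Base64-encoded MD5 of the payload, as required by the Content-MD5 header.
        byte[] md5 = MessageDigest.getInstance("MD5").digest(data);
        metadata.setContentMD5(Base64.getEncoder().encodeToString(md5));

        s3.putObject(new PutObjectRequest(
                bucket, key, new ByteArrayInputStream(data), metadata));
    }
}
```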

I see the possible performance overhead. Maybe the better option would be to abort the upload entirely in case of an exception/data mismatch; or should I suspect that, in case of failure, S3 may still accept the data and write it?

I will do more testing, but since this is not consistent behavior it will take me some time to catch the edge case where the derivatives fail. Thanks

adolski commented 3 years ago

Hi @DiegoPino , sorry for the wait on this. I see a problem with the way failed cache writes are handled here:

https://github.com/cantaloupe-project/cantaloupe/blob/16e90feef6a848bbdbf9238f38fa8f28c4a4e532/src/main/java/edu/illinois/library/cantaloupe/resource/ImageRepresentation.java#L134

That logic assumes that the writing is synchronous, so that by the time that statement is called, a corrupt file actually exists in the cache. But this would rarely be the case for S3Cache, because the upload to S3 wouldn't have completed yet.

So this is going to need some revision...
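Schematically, the race looks something like this (a toy model, not the actual classes):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.CompletableFuture;

public class AsyncCacheRace {

    /** Toy stand-in for S3Cache's OutputStream: it buffers everything and
     *  "uploads" asynchronously on close(), complete or not. */
    static class AsyncUploadingStream extends OutputStream {
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        @Override
        public void write(int b) {
            buffer.write(b);
        }

        @Override
        public void close() {
            CompletableFuture.runAsync(() ->
                    System.out.println("uploaded " + buffer.size()
                            + " bytes (possibly truncated)"));
        }
    }

    public static void main(String[] args) throws Exception {
        try (OutputStream cacheOS = new AsyncUploadingStream()) {
            cacheOS.write(new byte[] { 1, 2, 3 }); // partial write...
            throw new IOException("client disconnected mid-stream");
        } catch (IOException e) {
            // This is where the purge runs -- but the truncated "upload" happens
            // later, on another thread, so there is nothing to purge yet.
            System.out.println("purging cache entry (nothing uploaded yet)");
        }
        Thread.sleep(500); // the async "upload" lands after the purge
    }
}
```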

DiegoPino commented 3 years ago

@adolski no worries, you already do so much for all of us. Thanks for writing back. I understand the problem and may need to read more, but without deep prior knowledge: maybe cacheFacade.purge(opList) could (also) be a delayed action, using something similar to the API item cache clear? A task that is enqueued for later processing? It could do what it does now (no change there: purge whatever cache exists, even though for S3 and other async uploads it mostly won't, as you correctly state), but also get a later, asynchronous chance of doing so.

Worst case, that task will also fail.

Of course it is not exact science, but probabilistically a delayed FutureTask could deal with an S3 upload that does not exist yet? I am thinking of this as less refactoring, because otherwise you would need an actual "was persisted" type of event to decide on purging, based on prior knowledge (gosh, async) that the cache write really failed.
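Something in this spirit (just a sketch of the idea; the caller would wrap the existing cacheFacade.purge(opList) call, and the helper, executor, and 30-second delay are all made up):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Sketch of the "purge now, and again later" idea. */
class DelayedPurge {

    private static final ScheduledExecutorService SCHEDULER =
            Executors.newSingleThreadScheduledExecutor();

    static void purgeNowAndLater(Runnable purge) {
        purge.run();                       // existing, immediate purge (no change)
        SCHEDULER.schedule(() -> {
            try {
                purge.run();               // retry after the async upload has (probably) landed
            } catch (RuntimeException e) {
                // worst case the delayed task also fails; nothing more to do
            }
        }, 30, TimeUnit.SECONDS);          // arbitrary guess of a delay
    }
}
```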

Let me know if there is anything I can do to test/help or even discuss (if that helps)

Thanks again for everything

adolski commented 3 years ago

Thanks @DiegoPino. I think a better approach, instead of trying to clean up these corrupt files, is to try to keep them from getting into the cache.

This is really a design flaw (that has been present forever 🤦) resulting from the fact that there is no way to tell from within OutputStream.close() whether the stream has been written completely or not--all of the cache implementations just assume that it has. The solution I'm thinking of involves changing DerivativeCache.newDerivativeImageOutputStream() to return a custom OutputStream with a completion flag. All of the implementations' OutputStreams returned from that method would initially write to some intermediary place, and then commit the data somehow or other upon closure--move the file, commit the transaction, or upload the data. When ImageRepresentation.write() succeeds, it can set the flag to true, and then all of the OutputStream.close() implementations can check it and either commit the data if it returns true, or discard it otherwise. (Currently they just commit everything.)
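Roughly this shape (just a sketch; the class and method names are provisional):

```java
import java.io.IOException;
import java.io.OutputStream;

/**
 * Sketch: implementations buffer to an intermediary destination (temp file,
 * byte array, transaction...) and only commit on close() if the writer has
 * marked the stream complete.
 */
public abstract class CompletableOutputStream extends OutputStream {

    private boolean complete;

    /** Set to true by ImageRepresentation.write() only after a fully successful write. */
    public void setComplete(boolean complete) {
        this.complete = complete;
    }

    public boolean isComplete() {
        return complete;
    }

    @Override
    public void close() throws IOException {
        if (isComplete()) {
            commit();   // move the temp file, commit the transaction, upload to S3...
        } else {
            discard();  // drop the partial data instead of caching it
        }
    }

    /** Per-cache: make the buffered data visible to readers. */
    protected abstract void commit() throws IOException;

    /** Per-cache: throw the buffered data away. */
    protected abstract void discard() throws IOException;
}
```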

adolski commented 3 years ago

@DiegoPino I've pushed a commit that should fix this to the release/4.1 branch. Feel free to try it out and let me know if this is still happening.

I haven't merged into 5.0 yet, but will do in the near future.

DiegoPino commented 3 years ago

Wow, this is awesome. I had the chance to read your code before my home internet went down (NYC, gosh; on the phone now) and it all makes sense. I will build and test it in production tomorrow morning, force some failures (I learned how!), and report back. Thanks again for the fast and caring fix. Very grateful

adolski commented 3 years ago

Hi @DiegoPino, any luck with this fix?

DiegoPino commented 3 years ago

@adolski sorry for the silence. I have been testing it in production for a week already, with very limited memory to force it to fail. Please give me 2 more days (I am preparing some workshops and a release candidate, so I am a bit slower) to hit it hard via ApacheBench, but so far so good! Will report back once I finish. Sorry again

adolski commented 3 years ago

No problem @DiegoPino, take your time.

DiegoPino commented 3 years ago

@adolski reporting back. So far so good. I set JPEG2000 processing to GraphicsMagick because I noticed a pattern of broken pipes when requesting many images (e.g. thumbnails) and then aborting the request and moving to another page. I know GraphicsMagick is going away in 5.x, but it is a good reference since it can stream but also uses the filesystem, which can trigger broken-image issues more easily. (Concurrency may be the key here.)

17:37:56.610 [qtp873415566-622] WARN  c.a.s.s.i.S3AbortableInputStream - Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.

Followed by this. I saw a few c.a.s.s.i.S3AbortableInputStream warnings also happen without GM triggering any errors, but those were because of closed client connections (that should be expected, right?):

Caused by: edu.illinois.library.cantaloupe.processor.ProcessorException: org.eclipse.jetty.io.EofException
    at edu.illinois.library.cantaloupe.processor.GraphicsMagickProcessor.process(GraphicsMagickProcessor.java:462)
    at edu.illinois.library.cantaloupe.resource.ImageRepresentation.process(ImageRepresentation.java:190)
    at edu.illinois.library.cantaloupe.resource.ImageRepresentation.copyOrProcess(ImageRepresentation.java:157)
    ... 39 common frames omitted
  Caused by: org.eclipse.jetty.io.EofException: null
    at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:283)
    at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:422)
    at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:277)
    at org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:381)
    at org.eclipse.jetty.server.HttpConnection$SendCallback.process(HttpConnection.java:804)
    at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241)
    at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:223)
    at org.eclipse.jetty.server.HttpConnection.send(HttpConnection.java:541)
    at org.eclipse.jetty.server.HttpChannel.sendResponse(HttpChannel.java:824)
    at org.eclipse.jetty.server.HttpChannel.write(HttpChannel.java:901)
    at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:259)
    at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:235)
    at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:608)
    at org.apache.commons.io.output.ProxyOutputStream.write(ProxyOutputStream.java:90)
    at org.apache.commons.io.output.TeeOutputStream.write(TeeOutputStream.java:64)
    at edu.illinois.library.cantaloupe.process.Pipe.copyBytes(Pipe.java:80)
    at edu.illinois.library.cantaloupe.process.Pipe.consumeOutput(Pipe.java:54)
    at edu.illinois.library.cantaloupe.process.ProcessStarter.processOutput(ProcessStarter.java:52)
    at edu.illinois.library.cantaloupe.process.ProcessStarter.lambda$waitForProcess$0(ProcessStarter.java:129)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    ... 1 common frames omitted
  Caused by: java.io.IOException: Broken pipe
    at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at java.base/sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:62)
    at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:113)
    at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:58)
    at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:50)
    at java.base/sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:484)
    at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:261)
    ... 24 common frames omitted

The behavior so far has been to not write into the permanent S3 cache in case of failure (I need to do more debugging to be sure), but what may be inconclusive odd behavior is that the response still gets back to the browser with cacheable HTTP headers. It is not odd that the broken image goes to the browser (I get that, that is OK), but it is odd in terms of expectations that the broken image gets cache headers, since the browser will of course keep using its local cache if that config is enabled. I can't confirm this yet, and it may just be older cache entries being stuck, since the response in these cases would be a 500, right?

I may be looking at the wrong data here and maybe it is not the case (it is hard to trigger a header inspection exactly when things break, but I am getting better at breaking things), but of course that all gets solved if I disable client-side caching entirely in the Cantaloupe properties.

Does this make sense? I will keep the 4.1.9-SNAPSHOT running a little longer and may have more to share this week. For some reason it is easier to trigger errors when using Mirador 3 as the viewer; it may be the concurrency of the tiled request calls or the tiling algorithm, and everything is even easier to trigger when using JP2s and GraphicsMagick (on purpose for now, while testing).

Is there anything you feel I should be looking at? Thanks again. This is really good work and has helped us a lot. Incomplete buffers being written to S3 was driving us crazy!

DiegoPino commented 3 years ago

@adolski OK, all good, marking this as fixed! I finished testing and your fix works perfectly: no more caching when the buffer is incomplete. I will close this issue and use this chance to thank you again for all the great work you do here. Really amazing and humbling work/code/community response. Thanks a lot

adolski commented 3 years ago

Great news @DiegoPino, thanks for reporting and testing this. I just released it as part of 4.1.9. Also, I was looking through your previous post, but then I saw that you closed the issue, so let me know if there is any other follow-up that is required related to errors etc.

DiegoPino commented 3 years ago

@adolski thanks to you. Everything seems to be working fine (3 weeks or so already, on 3 servers), and even if derivatives do fail from time to time, nothing gets stuck. We are so happy! No follow-ups right now, but if something comes up I will open a different issue. Thanks so much again.

Hugs