cantaloupe-project / cantaloupe

High-performance dynamic image server in Java
https://cantaloupe-project.github.io/
Other
264 stars 104 forks source link

Delegate Exception: receiver is null #402

Closed kaij closed 3 years ago

kaij commented 3 years ago

We have Cantaloupe in place with a high-traffic website. The server has 32GB of RAM (24GB assigned to Docker) and 8 CPUs. The delegate script is used for authorization with application backends as well as XMP embedding. We're using a Cantaloupe 5.0 Snapshot version because of the XMP processing in the delegate script.

Things generally work fine, but after some days, we get the exception below. It happens on some request, but not repeatable. But once it starts appearing, more and more tile requests are served with HTTP 500.

image

My impression is that it starts appearing when the heap size has once reached the limit. My first ideas would be to limit memory to 512mb and then load-test the server and see whether it's possible to force-trigger the exception.

Do you have any suggestions for debugging or solving this exception? I'm happy to hear...

adolski commented 3 years ago

Hi @kaij,

It looks like the JSR-223 system is not able to instantiate a delegate object after some time/conditions. I don't know why that would be--maybe some kind of bug in JRuby. If it's memory-related then it should be throwing an exception, not returning null.

I'm willing to write some code that retries the instantiation a few times, if you would be willing to test it. Basically I would just add a null check inside a loop here:

https://github.com/cantaloupe-project/cantaloupe/blob/develop/src/main/java/edu/illinois/library/cantaloupe/delegate/JRubyDelegateProxy.java#L93-L103

I plan to update JRuby before a 5.0 release, but no guarantee that it will be a fix.

Also, the latest 5.0 snapshot supports a Java delegate, which cuts JRuby out of the loop if you are willing to rewrite all your delegate script code in Java.

kaij commented 3 years ago

Hi @adolski,

thanks a lot for your quick response. I'd be happy to test your code on the staging & production system. It might take some time until I can give you a first feedback as memory needs to build up until the errors can be seen.

Updating JRuby might also be worth testing, but I think this should be tested in more detail on our side first before deploying on production.

If that doesn't help, rewriting our delegate code in Java might be worth looking at. But I fear this could lead to configuration issues (different configuration on different systems will need different builds of Cantaloupe?). I guess it would be possible to add custom configuration properties and use them in the delegate. I will look at the code and think more about it in any case.

Thanks for your help! Best regards, Kai Edit: ah, so this way it could be encapsulated https://github.com/cantaloupe-project/sample-java-delegate

adolski commented 3 years ago

Okay, I will try to get the retrying-the-instantiation in place soon.

One more question: do you see any errors in your log from right before this error? It would look something like

00:00:00.000 [thread] ERROR e.i.l.c.s.JRubyDelegateProxy - [message & stack trace]

Specifically I'm looking for something from here: https://github.com/cantaloupe-project/cantaloupe/blob/3724cd228cf937250f167f20014b9eef77fd8fe1/src/main/java/edu/illinois/library/cantaloupe/delegate/JRubyDelegateProxy.java#L99

kaij commented 3 years ago

I can see some errors, but they seem to come from an error in the delegate script. But I don't think they are directly related. Anyway, I need to go ahead and fix these. I don't really know where to start - maybe I can match the original request in the traefik service.

infra-iiif-cantaloupe | 14:22:14.568 [qtp1962826816-37814] ERROR e.i.l.c.d.JRubyDelegateProxy - java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 10
infra-iiif-cantaloupe | javax.script.ScriptException: java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 10
infra-iiif-cantaloupe |     at org.jruby.embed.jsr223.JRubyEngine.wrapException(JRubyEngine.java:104)
infra-iiif-cantaloupe |     at org.jruby.embed.jsr223.JRubyEngine.eval(JRubyEngine.java:93)
infra-iiif-cantaloupe |     at org.jruby.embed.jsr223.JRubyEngine.eval(JRubyEngine.java:142)
infra-iiif-cantaloupe |     at edu.illinois.library.cantaloupe.delegate.JRubyDelegateProxy.instantiateDelegate(JRubyDelegateProxy.java:96)
infra-iiif-cantaloupe |     at edu.illinois.library.cantaloupe.delegate.JRubyDelegateProxy.<init>(JRubyDelegateProxy.java:90)
infra-iiif-cantaloupe |     at edu.illinois.library.cantaloupe.delegate.DelegateProxyService.newDelegateProxy(DelegateProxyService.java:175)
infra-iiif-cantaloupe |     at edu.illinois.library.cantaloupe.resource.AbstractResource.getDelegateProxy(AbstractResource.java:430)
infra-iiif-cantaloupe |     at edu.illinois.library.cantaloupe.resource.iiif.v2.ImageResource.doGET(ImageResource.java:112)
infra-iiif-cantaloupe | 10:07:08.082 [qtp283318938-56] ERROR e.i.l.c.d.DelegateProxyService - newDelegateProxy(): java.util.ConcurrentModificationException
infra-iiif-cantaloupe | javax.script.ScriptException: java.util.ConcurrentModificationException
infra-iiif-cantaloupe |     at org.jruby.embed.jsr223.JRubyEngine.wrapException(JRubyEngine.java:104)
infra-iiif-cantaloupe |     at org.jruby.embed.jsr223.JRubyEngine.invokeMethod(JRubyEngine.java:221)
infra-iiif-cantaloupe |     at edu.illinois.library.cantaloupe.delegate.JRubyDelegateProxy.invoke(JRubyDelegateProxy.java:340)
infra-iiif-cantaloupe |     at edu.illinois.library.cantaloupe.delegate.JRubyDelegateProxy.setRequestContext(JRubyDelegateProxy.java:119)
infra-iiif-cantaloupe |     at edu.illinois.library.cantaloupe.delegate.DelegateProxyService.newDelegateProxy(DelegateProxyService.java:176)
infra-iiif-cantaloupe |     at edu.illinois.library.cantaloupe.resource.AbstractResource.getDelegateProxy(AbstractResource.java:430)
infra-iiif-cantaloupe |     at edu.illinois.library.cantaloupe.resource.iiif.v2.ImageResource.doGET(ImageResource.java:112)
adolski commented 3 years ago

Hmmm, I'm not sure what those are. But the instantiation-retrying is on develop now.

kaij commented 3 years ago

Thanks, I have it up and running. One exception I can see is below. I tried both using docker images for GraalVM and OpenJDK, but the startup warning stays the same.

Furthermore, I successed in tracking down the exceptions above to specific requests. However, the errors are not reproducible, so somehow they might be related. The only strings I have with length=10 in the delegate script that might trigger this is using context['identifier'] - but this should be always set in the pre_authorize function(?)...

infra-iiif-cantaloupe | WARNING: An illegal reflective access operation has occurred
infra-iiif-cantaloupe | WARNING: Illegal reflective access by com.headius.backport9.modules.Modules (file:/home/cantaloupe/cantaloupe-develop/target/cantaloupe-5.0-SNAPSHOT.jar) to method sun.nio.ch.NativeThread.signal(long)
infra-iiif-cantaloupe | WARNING: Please consider reporting this to the maintainers of com.headius.backport9.modules.Modules
infra-iiif-cantaloupe | WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
infra-iiif-cantaloupe | WARNING: All illegal access operations will be denied in a future release
iiif-cantaloupe | 06:59:10.013 [qtp1962826816-19] ERROR e.i.l.c.d.DelegateProxyService - newDelegateProxy(): java.util.ConcurrentModificationException
iiif-cantaloupe | javax.script.ScriptException: java.util.ConcurrentModificationException
iiif-cantaloupe |   at org.jruby.embed.jsr223.JRubyEngine.wrapException(JRubyEngine.java:104)
iiif-cantaloupe |   at org.jruby.embed.jsr223.JRubyEngine.invokeMethod(JRubyEngine.java:221)
iiif-cantaloupe |   at edu.illinois.library.cantaloupe.delegate.JRubyDelegateProxy.invoke(JRubyDelegateProxy.java:362)
iiif-cantaloupe |   at edu.illinois.library.cantaloupe.delegate.JRubyDelegateProxy.setRequestContext(JRubyDelegateProxy.java:141)
iiif-cantaloupe |   at edu.illinois.library.cantaloupe.delegate.DelegateProxyService.newDelegateProxy(DelegateProxyService.java:176)
iiif-cantaloupe |   at edu.illinois.library.cantaloupe.resource.AbstractResource.getDelegateProxy(AbstractResource.java:430)
iiif-cantaloupe |   at edu.illinois.library.cantaloupe.resource.iiif.v2.ImageResource.doGET(ImageResource.java:112)
iiif-cantaloupe |   at edu.illinois.library.cantaloupe.resource.HandlerServlet.handle(HandlerServlet.java:97)
iiif-cantaloupe |   at edu.illinois.library.cantaloupe.resource.HandlerServlet.doGet(HandlerServlet.java:35)
iiif-cantaloupe |   at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
iiif-cantaloupe |   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
iiif-cantaloupe |   at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:760)
iiif-cantaloupe |   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:547)
iiif-cantaloupe |   at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
iiif-cantaloupe |   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)
iiif-cantaloupe |   at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
iiif-cantaloupe |   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
iiif-cantaloupe |   at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
iiif-cantaloupe |   at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)
iiif-cantaloupe |   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
iiif-cantaloupe |   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
iiif-cantaloupe |   at org.eclipse.jetty.server.Server.handle(Server.java:500)
iiif-cantaloupe |   at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
iiif-cantaloupe |   at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)
iiif-cantaloupe |   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
iiif-cantaloupe |   at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:270)
iiif-cantaloupe |   at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
iiif-cantaloupe |   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
iiif-cantaloupe |   at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
iiif-cantaloupe |   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
iiif-cantaloupe |   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
iiif-cantaloupe |   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
iiif-cantaloupe |   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
iiif-cantaloupe |   at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:388)
iiif-cantaloupe |   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)
iiif-cantaloupe |   at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
iiif-cantaloupe |   at java.base/java.lang.Thread.run(Thread.java:834)
iiif-cantaloupe | Caused by: java.util.ConcurrentModificationException: null
iiif-cantaloupe |   at java.base/java.util.HashMap$HashIterator.nextNode(HashMap.java:1493)
iiif-cantaloupe |   at java.base/java.util.HashMap$EntryIterator.next(HashMap.java:1526)
iiif-cantaloupe |   at java.base/java.util.HashMap$EntryIterator.next(HashMap.java:1524)
iiif-cantaloupe |   at org.jruby.embed.jsr223.Utils.preEval(Utils.java:106)
iiif-cantaloupe |   at org.jruby.embed.jsr223.JRubyEngine.invokeMethod(JRubyEngine.java:211)
iiif-cantaloupe |   ... 35 common frames omitted
kaij commented 3 years ago

Update: so far looks good, I have not seen the exception again. I still see the "ArrayIndexOutOfBoundsException" however, which seems to pop up suddenly and then disappears again. The requests are successful when replaying.

Update 2: was too fast, I can still see the java.util.ConcurrentModificationException on the test system (where less Java Heap is assigned)

adolski commented 3 years ago

Hi @kaij,

Good news about this working. I don't know about the other two exceptions, but they are probably internal to JRuby. it might be possible to work around them in a similar way, but I see in the change logs that they've fixed some concurrency issues in recent releases, so I've just pushed an update of JRuby to the latest release (9.2.13.0). If these errors persist, then I or someone should report them to the JRuby folks and/or I can try to work around them in a similar way as before.

kaij commented 3 years ago

Cool. I have this (updated JRuby version) now in-place on production system. Let's wait and see - I'm quite confident, the warnings on startup seem to have vanished too.

I also successfully implemented a nice java delegate replacement on our test system and we are ready to switch over to this on production in case any jruby problems should come up again.

Thanks again for your quick and efficient help! I'll update this ticket later on again with the status.

kaij commented 3 years ago

It's been running now smoothly - without any of the above errors. Still using the JRuby delegate in production. Looks like the measures have been very successful. 🥳

kaij commented 3 years ago

It seems I was just a little bit too early. The following exception (leading to HTTP 500) shows again now. It starts with the InvokeFailedException and then gives about 10 exceptions per second. It seems to process about 30 exceptions per second 2 hours later. I have the impression that they start appearing as soon as memory is on limit.

I will switch to the Java Delegate tomorrow. If it's really a bug in the JRuby integration, there should be no problems anymore.

infra-iiif-cantaloupe | 09:37:33.267 [qtp1130894323-61443] ERROR e.i.l.c.o.OperationList - org.jruby.embed.InvokeFailedException: java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 10
infra-iiif-cantaloupe | 09:37:33.269 [qtp1130894323-61443] ERROR e.i.l.c.o.OperationList - org.jruby.embed.InvokeFailedException: java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 10
infra-iiif-cantaloupe | 09:37:33.301 [qtp1130894323-61443] ERROR e.i.l.c.d.JRubyDelegateProxy - java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 10
infra-iiif-cantaloupe | 09:37:33.448 [qtp1130894323-61443] ERROR e.i.l.c.d.JRubyDelegateProxy - java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 10
infra-iiif-cantaloupe | 09:37:33.449 [qtp1130894323-61443] ERROR e.i.l.c.d.JRubyDelegateProxy - java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 10
infra-iiif-cantaloupe | 09:37:33.524 [qtp1130894323-61443] ERROR e.i.l.c.d.JRubyDelegateProxy - java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 10
infra-iiif-cantaloupe | 09:37:33.525 [qtp1130894323-61443] ERROR e.i.l.c.d.JRubyDelegateProxy - java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 10
infra-iiif-cantaloupe | 09:37:33.525 [qtp1130894323-61443] ERROR e.i.l.c.d.JRubyDelegateProxy - java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 10
infra-iiif-cantaloupe | 09:37:33.526 [qtp1130894323-61443] ERROR e.i.l.c.d.JRubyDelegateProxy - java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 10
kaij commented 3 years ago

The exception details might be interesting:


infra-iiif-cantaloupe | [387684.466s][warning][gc,alloc] qtp1130894323-61498: Retried waiting for GCLocker too often allocating 1518594 words
infra-iiif-cantaloupe | 09:37:33.267 [qtp1130894323-61443] ERROR e.i.l.c.o.OperationList - org.jruby.embed.InvokeFailedException: java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 10
infra-iiif-cantaloupe | javax.script.ScriptException: org.jruby.embed.InvokeFailedException: java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 10
infra-iiif-cantaloupe |     at org.jruby.embed.jsr223.JRubyEngine.wrapException(JRubyEngine.java:104)
infra-iiif-cantaloupe |     at org.jruby.embed.jsr223.JRubyEngine.invokeMethod(JRubyEngine.java:221)
infra-iiif-cantaloupe |     at edu.illinois.library.cantaloupe.delegate.JRubyDelegateProxy.invoke(JRubyDelegateProxy.java:362)
infra-iiif-cantaloupe |     at edu.illinois.library.cantaloupe.delegate.JRubyDelegateProxy.invoke(JRubyDelegateProxy.java:338)
infra-iiif-cantaloupe |     at edu.illinois.library.cantaloupe.delegate.JRubyDelegateProxy.getRedactions(JRubyDelegateProxy.java:288)
infra-iiif-cantaloupe |     at edu.illinois.library.cantaloupe.operation.redaction.RedactionService.redactionsFor(RedactionService.java:27)
infra-iiif-cantaloupe |     at edu.illinois.library.cantaloupe.operation.OperationList.applyNonEndpointMutations(OperationList.java:204)
infra-iiif-cantaloupe |     at edu.illinois.library.cantaloupe.resource.generic.ImageRequestHandler.handle(ImageRequestHandler.java:384)
infra-iiif-cantaloupe |     at edu.illinois.library.cantaloupe.resource.iiif.v2.ImageResource.doGET(ImageResource.java:116)```
adolski commented 3 years ago

I don't know enough about what is happening with JRuby or the JVM here, unfortunately. Perhaps a different garbage collector, or a newer JVM, would cause different results. But I assume that Java delegate will also be an effective workaround.