glencoesoftware / omero-ms-core

OMERO Vert.x microservice core
GNU General Public License v2.0
2 stars 6 forks source link

Unexpected opcode for serverId: LONG_BINGET #32

Open kkoz opened 1 month ago

kkoz commented 1 month ago

The following error was observed in a customer's logs:

2024-05-23 13:11:16,620 [lettuce-nioEventLoop-4-1] ERROR c.g.o.m.core.PickledSessionConnector - Exception while deserializing: server_id
2024-05-23 13:11:16,621 [lettuce-nioEventLoop-4-1] ERROR c.g.o.m.c.OmeroWebRedisSessionStore - Exception while unpickling connector
java.lang.IllegalArgumentException: Unexpected opcode for serverId: LONG_BINGET
    at com.glencoesoftware.omero.ms.core.PickledSessionConnector.deserializeServerId(PickledSessionConnector.java:179)
    at com.glencoesoftware.omero.ms.core.PickledSessionConnector.deserializeConnector(PickledSessionConnector.java:145)
    at com.glencoesoftware.omero.ms.core.PickledSessionConnector.init(PickledSessionConnector.java:94)
    at com.glencoesoftware.omero.ms.core.PickledSessionConnector.<init>(PickledSessionConnector.java:76)
    at com.glencoesoftware.omero.ms.core.OmeroWebRedisSessionStore.lambda$getConnector$0(OmeroWebRedisSessionStore.java:87)
    at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
    at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2079)
    at io.lettuce.core.protocol.AsyncCommand.completeResult(AsyncCommand.java:122)
    at io.lettuce.core.protocol.AsyncCommand.complete(AsyncCommand.java:111)
    at io.lettuce.core.protocol.CommandWrapper.complete(CommandWrapper.java:59)
    at io.lettuce.core.protocol.CommandHandler.complete(CommandHandler.java:652)
    at io.lettuce.core.protocol.CommandHandler.decode(CommandHandler.java:612)
    at io.lettuce.core.protocol.CommandHandler.channelRead(CommandHandler.java:563)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1422)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:931)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:700)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:635)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:552)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:514)
    at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:829)

I have been unable to reproduce locally, but have examined the code and have a hypothesis about what might be going on.

The error is thrown from here: https://github.com/glencoesoftware/omero-ms-core/blob/master/src/main/java/com/glencoesoftware/omero/ms/core/PickledSessionConnector.java#L179

The fact that the code reached this line means that it failed to parse the serverId in handleStringValue or handleNumberValue. This is surprising, because handleStringValue is supposed to handle LONG_BINGET here: https://github.com/glencoesoftware/omero-ms-core/blob/master/src/main/java/com/glencoesoftware/omero/ms/core/PickledSessionConnector.java#L260

My hypothesis then, is that the next line: https://github.com/glencoesoftware/omero-ms-core/blob/master/src/main/java/com/glencoesoftware/omero/ms/core/PickledSessionConnector.java#L261

Is returning null. We would expect this get to always succeed, but if it doesn't there's no logging or error raised, the function simply returns null and then the calling code continues as if the This could be for a couple reasons. One is that there is some kind of mismatch in the key value (this seems unlikely to me). The other is that the key/value was not put into the memo map prior to this call to get. I think this is the issue. The only line where we put things in the map is here: https://github.com/glencoesoftware/omero-ms-core/blob/master/src/main/java/com/glencoesoftware/omero/ms/core/PickledSessionConnector.java#L103

So if the MEMOIZE opcode is encountered during the deserializeConnector loop, it will simply be skipped over. This is what I think is happening here, but I'd need to experiment a bit further to see if this is possible.

chris-allan commented 1 month ago

Thanks, @kkoz. My sense is that, as you've already surmised, #27 was just not quite thorough enough to handle all the conditions. Some thoughts:

  1. Retrieving a null from the memo map should be fatal and an exception thrown if that is the case.
  2. If parsing of the pickle bytecode fails, we should dump a base64 encoded version of the serialized bytes to the log.