LF-Decentralized-Trust-labs / paladin

Programmable privacy for EVM
https://lf-decentralized-trust-labs.github.io/paladin
Apache License 2.0
15 stars 6 forks source link

BufferOverflowException on plugin interface #433

Open hosie opened 1 week ago

hosie commented 1 week ago

Occasionally see transactions getting stuck waiting for endorsement and when I look in the logs, I see stack traces such as....

[2024-11-13T22:18:40.411Z]  INFO 05bd77b9-ae58-4415-8d38-37fceb332f88: ==> *prototk.DomainMessage_InitContract pid=1 plugin=DOMAIN[pente/16d4203e-9735-4617-8415-634ba4a3b8f8]
[2024-11-13T22:18:40.412Z] DEBUG 05bd77b9-ae58-4415-8d38-37fceb332f88: ==> {"header":{"pluginId":"16d4203e-9735-4617-8415-634ba4a3b8f8","messageId":"05bd77b9-ae58-4415-8d38-37fceb332f88","messageType":"REQUEST_TO_PLUGIN"},"initContract":{"contractAddress":"0x2488c5bdd4fceebd4755e3543f7613df26ffd284","contractConfig":"AAEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAoAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAIc2hhbmdoYWkAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAwAAAAAAAAAAAAAAAMWT3UvHMTsYu24Ctpo0VEBJMjAFAAAAAAAAAAAAAAAA/anisSZMOzKw+zkg6igtJGUL1dgAAAAAAAAAAAAAAAA9xhDq9lJwsvDJ2GxSYpYSIyfpwQ=="}} pid=1 plugin=DOMAIN[pente/16d4203e-9735-4617-8415-634ba4a3b8f8]
[2024-11-13T22:18:40.433Z] ERROR Stream for plugin failed: rpc error: code = Canceled desc = context canceled plugin=DOMAIN[pente/16d4203e-9735-4617-8415-634ba4a3b8f8]
[2024-11-13T22:18:40.433Z]  INFO Plugin stopped id:"16d4203e-9735-4617-8415-634ba4a3b8f8" name:"pente" [16d4203e-9735-4617-8415-634ba4a3b8f8] (type=DOMAIN) pid=1
[2024-11-13T22:18:40.433Z] DEBUG domain handler ending pid=1 plugin=DOMAIN[pente/16d4203e-9735-4617-8415-634ba4a3b8f8]
22:18:40.459 [grpc-default-executor-2] ERROR io.kaleido.paladin.toolkit.PluginInstance - Plugin loader stream error
io.grpc.StatusRuntimeException: CANCELLED: Failed to stream message
    at io.grpc.Status.asRuntimeException(Status.java:533) ~[paladin.jar:?]
    at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:481) ~[paladin.jar:?]
    at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:564) ~[paladin.jar:?]
    at io.grpc.internal.ClientCallImpl.access$100(ClientCallImpl.java:72) ~[paladin.jar:?]
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:729) ~[paladin.jar:?]
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:710) ~[paladin.jar:?]
    at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) ~[paladin.jar:?]
    at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) ~[paladin.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
    at java.lang.Thread.run(Unknown Source) [?:?]
Caused by: io.grpc.StatusRuntimeException: INTERNAL: Failed to frame message
    at io.grpc.Status.asRuntimeException(Status.java:525) ~[paladin.jar:?]
    at io.grpc.internal.MessageFramer.writePayload(MessageFramer.java:157) ~[paladin.jar:?]
    at io.grpc.internal.AbstractStream.writeMessage(AbstractStream.java:66) ~[paladin.jar:?]
    at io.grpc.internal.ForwardingClientStream.writeMessage(ForwardingClientStream.java:37) ~[paladin.jar:?]
    at io.grpc.internal.DelayedStream.writeMessage(DelayedStream.java:278) ~[paladin.jar:?]
    at io.grpc.internal.RetriableStream.sendMessage(RetriableStream.java:575) ~[paladin.jar:?]
    at io.grpc.internal.ClientCallImpl.sendMessageInternal(ClientCallImpl.java:522) ~[paladin.jar:?]
    at io.grpc.internal.ClientCallImpl.sendMessage(ClientCallImpl.java:510) ~[paladin.jar:?]
    at io.grpc.stub.ClientCalls$CallToStreamObserverAdapter.onNext(ClientCalls.java:368) ~[paladin.jar:?]
    at io.kaleido.paladin.toolkit.PluginInstance.send(PluginInstance.java:167) ~[paladin.jar:?]
    at java.util.concurrent.CompletableFuture.uniApplyNow(Unknown Source) ~[?:?]
    at java.util.concurrent.CompletableFuture.uniApplyStage(Unknown Source) ~[?:?]
    at java.util.concurrent.CompletableFuture.thenApply(Unknown Source) ~[?:?]
    at io.kaleido.paladin.toolkit.PluginInstance$StreamHandler.lambda$onNext$1(PluginInstance.java:207) ~[paladin.jar:?]
    at java.util.concurrent.CompletableFuture$AsyncRun.run(Unknown Source) ~[?:?]
    at java.util.concurrent.CompletableFuture$AsyncRun.exec(Unknown Source) ~[?:?]
    at java.util.concurrent.ForkJoinTask.doExec(Unknown Source) ~[?:?]
    at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source) ~[?:?]
    at java.util.concurrent.ForkJoinPool.scan(Unknown Source) ~[?:?]
    at java.util.concurrent.ForkJoinPool.runWorker(Unknown Source) ~[?:?]
    at java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source) ~[?:?]
Caused by: java.nio.BufferOverflowException
    at java.nio.Buffer.nextPutIndex(Unknown Source) ~[?:?]
    at java.nio.HeapByteBuffer.putInt(Unknown Source) ~[?:?]
    at io.grpc.internal.MessageFramer.writeKnownLengthUncompressed(MessageFramer.java:222) ~[paladin.jar:?]
    at io.grpc.internal.MessageFramer.writeUncompressed(MessageFramer.java:172) ~[paladin.jar:?]
    at io.grpc.internal.MessageFramer.writePayload(MessageFramer.java:143) ~[paladin.jar:?]
    at io.grpc.internal.AbstractStream.writeMessage(AbstractStream.java:66) ~[paladin.jar:?]
    at io.grpc.internal.ForwardingClientStream.writeMessage(ForwardingClientStream.java:37) ~[paladin.jar:?]
    at io.grpc.internal.DelayedStream.writeMessage(DelayedStream.java:278) ~[paladin.jar:?]
    at io.grpc.internal.RetriableStream.sendMessage(RetriableStream.java:575) ~[paladin.jar:?]
    at io.grpc.internal.ClientCallImpl.sendMessageInternal(ClientCallImpl.java:522) ~[paladin.jar:?]
    at io.grpc.internal.ClientCallImpl.sendMessage(ClientCallImpl.java:510) ~[paladin.jar:?]
    at io.grpc.stub.ClientCalls$CallToStreamObserverAdapter.onNext(ClientCalls.java:368) ~[paladin.jar:?]
    at io.kaleido.paladin.toolkit.PluginInstance.send(PluginInstance.java:167) ~[paladin.jar:?]
    at java.util.concurrent.CompletableFuture.uniApplyNow(Unknown Source) ~[?:?]
    at java.util.concurrent.CompletableFuture.uniApplyStage(Unknown Source) ~[?:?]
    at java.util.concurrent.CompletableFuture.thenApply(Unknown Source) ~[?:?]
    at io.kaleido.paladin.toolkit.PluginInstance$StreamHandler.lambda$onNext$1(PluginInstance.java:207) ~[paladin.jar:?]
    at java.util.concurrent.CompletableFuture$AsyncRun.run(Unknown Source) ~[?:?]
    at java.util.concurrent.CompletableFuture$AsyncRun.exec(Unknown Source) ~[?:?]
    at java.util.concurrent.ForkJoinTask.doExec(Unknown Source) ~[?:?]
    at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source) ~[?:?]
    at java.util.concurrent.ForkJoinPool.scan(Unknown Source) ~[?:?]
    at java.util.concurrent.ForkJoinPool.runWorker(Unknown Source) ~[?:?]
    at java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source) ~[?:?]
[2024-11-13T22:18:40.728Z] DEBUG Domain pente configured. Config: {} pid=1
[2024-11-13T22:18:40.729Z]  INFO 66a99c50-266f-4448-a69f-55a47a7112ee: ==> *prototk.DomainMessage_ConfigureDomain pid=1 plugin=DOMAIN[pente/16d4203e-9735-4617-8415-634ba4a3b8f8]

Seems to happen more on dynamic-coorinator-selection branch possibly because there is a lot more concurrent communication across the plugin interfaces with the various delegation / remote assembly and endorsement messages.

Still investigating to hone in on the root cause but I do find that if I add a mutex lock in (dp *domainHandler) RequestToPlugin and (ph *pluginHandler[M]) RequestReply I never see this issue. So it makes me think there is potentially some thread safety issue on the go/java plugin interface. However, still haven't put my finger on exactly what. Raising the issue for now to capture information as I investigate and in case anyone else sees similar symptoms.

I cannot reproduce this in component test env even with even higher levels of concurrency.