Closed onelapahead closed 2 weeks ago
Then, to avoid the constantly restarting due to changing config / deploying smart contracts issue, we introduce a bad hack via preStop that tells Paladin to sleep for 15s before shutting down (but the readiness probe continues to pass, allowing traffic to keep being received) which will allow it to deploy more contracts, invoke transactions, and "queue up" other config updates to hopefully help with the delays we've seen.
This isn''t working as desired, as the STS goes into a non-ready state, causing the SCDs/TIs to wait again. Instead, I'll build an inflightChange map to track when a hash has changed, but delay actually reconciling for a few seconds.
This allows us to detect config changes based on the checksum, delay updating the STS until a change interval of 30s has passed, and then release the changes lock allowing the STS to update, which helps initial SCDs -> TIs -> Domains/Registries all succeed.
Note - this is far from perfect. This does not "lock" the actual change detected, it just guarantees it won't apply any changes but every 30s. Worse case - a final change or two are queued up right after a change is applied via a pod restart, and those then cause a subsequent 30s restart.
But again an overall start up time of 1-3m compared to the previous 5-20m is a vast improvement. This become significantly less complex and faster once dynamic reload is implemented, and therefore feels like an acceptable workaround for the time being, but will defer to @dwertent @peterbroadhurst .
Testing with the branch, I saw much reduced crashloopfallback status in the paladin pods (where before the fix I saw all 3 paladin pods going through the crashloop before things get resolved/reconciled). even when I do, things get resolved within a couple of seconds, and the entire bootstrapping process was complete in under 60s for me on an M1 MacBook Pro.
definitely much more orderly startup compared to before. still seeing 1 restart in one of the paladin pods, with the following log. pasting here for potential future investigation, but definitely not blocking the merge of this PR.
$ k logs --previous paladin-node2-0 -c paladin
[2024-11-07T21:15:15.293Z] DEBUG Created REST client to http://besu-node2.default.svc.cluster.local:8545
[2024-11-07T21:15:15.293Z] WARN REMOVE: DSN='postgres://postgres:Z1XmY-ABP7PPRc44P9Hnsg7AYeFu4w2dio38JYjEpF0=@localhost:5432/postgres?sslmode=disable' pid=1
[2024-11-07T21:15:15.303Z] INFO Running migrations in: file:///app/db/migrations/postgres pid=1
[2024-11-07T21:15:15.362Z] INFO Migrations now at: v=12 dirty=false pid=1
[2024-11-07T21:15:15.362Z] INFO From block: 0 pid=1
[2024-11-07T21:15:15.365Z] INFO JSON/RPC (HTTP) server listening on [::]:8548 pid=1
[2024-11-07T21:15:15.365Z] DEBUG JSON/RPC (HTTP) server timeouts: read=0s write=0s request=10m0s pid=1
[2024-11-07T21:15:15.365Z] INFO WebSocket server readBufferSize=65536 writeBufferSize=65536 pid=1
[2024-11-07T21:15:15.365Z] INFO JSON/RPC (WebSocket) server listening on [::]:8549 pid=1
[2024-11-07T21:15:15.365Z] DEBUG JSON/RPC (WebSocket) server timeouts: read=0s write=0s request=10m0s pid=1
[2024-11-07T21:15:15.366Z] INFO Domains configured: [] pid=1
[2024-11-07T21:15:15.366Z] DEBUG Creating new public transaction manager pid=1
[2024-11-07T21:15:15.366Z] DEBUG Gas price cache size: 100 pid=1
[2024-11-07T21:15:15.366Z] DEBUG Enterprise transaction handler created pid=1
[2024-11-07T21:15:15.366Z] DEBUG Resolving key seed pid=1
[2024-11-07T21:15:15.367Z] DEBUG Initializing public transaction manager pid=1 role=public_tx_mgr
[2024-11-07T21:15:15.367Z] DEBUG Balance manager calcMethod setting: max pid=1 role=public_tx_mgr
[2024-11-07T21:15:15.367Z] DEBUG Initialized public transaction manager pid=1 role=public_tx_mgr
[2024-11-07T21:15:15.367Z] INFO stateDistributer:Start pid=1
[2024-11-07T21:15:15.367Z] DEBUG Starting 10 workers for writer 2hNRr2Q9 pid=1
[2024-11-07T21:15:15.367Z] DEBUG Starting 10 workers for writer ZYLU_oUM pid=1
[2024-11-07T21:15:15.367Z] INFO preparedTransactionDistributer:Start pid=1
[2024-11-07T21:15:15.367Z] DEBUG Starting 10 workers for writer 5dpjWCHy pid=1
[2024-11-07T21:15:15.367Z] DEBUG Starting 10 workers for writer oCh8XLfz pid=1
[2024-11-07T21:15:15.367Z] DEBUG RPC[000000001] --> eth_chainId
[2024-11-07T21:15:15.367Z] DEBUG ==> POST http://besu-node2.default.svc.cluster.local:8545 breq=tbs99MmT
[2024-11-07T21:15:15.367Z] INFO stateDistributer:Loop starting loop pid=1
[2024-11-07T21:15:15.367Z] DEBUG stateDistributer:Loop waiting for next event pid=1
[2024-11-07T21:15:15.368Z] INFO preparedTransactionDistributer:Loop starting loop pid=1
[2024-11-07T21:15:15.368Z] DEBUG preparedTransactionDistributer:Loop waiting for next event pid=1
[2024-11-07T21:15:15.369Z] INFO stateDistributer loaded 0 state distributions on startup (page=1) pid=1
[2024-11-07T21:15:15.369Z] INFO stateDistributer finished startup recovery after dispatching 0 distributions pid=1
[2024-11-07T21:15:15.370Z] ERROR RPC[000000001] <-- ERROR: FF22012: Backend RPC request failed: Post "http://besu-node2.default.svc.cluster.local:8545": dial tcp 10.96.240.232:8545: connect: connection refused
[2024-11-07T21:15:15.370Z] WARN eth_chainId failed, retrying: FF22012: Backend RPC request failed: Post "http://besu-node2.default.svc.cluster.local:8545": dial tcp 10.96.240.232:8545: connect: connection refused pid=1
[2024-11-07T21:15:15.371Z] INFO preparedTransactionDistributer loaded 0 prepared transaction distributions on startup (page=1) pid=1
[2024-11-07T21:15:15.371Z] INFO preparedTransactionDistributer finished startup recovery after dispatching 0 distributions pid=1
21:15:15.679 [grpc-default-executor-0] ERROR io.kaleido.paladin.loader.PluginLoader - Plugin loader stream error
io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
at io.grpc.Status.asRuntimeException(Status.java:533) ~[paladin.jar:?]
at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:481) ~[paladin.jar:?]
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:564) ~[paladin.jar:?]
at io.grpc.internal.ClientCallImpl.access$100(ClientCallImpl.java:72) ~[paladin.jar:?]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:729) ~[paladin.jar:?]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:710) ~[paladin.jar:?]
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) ~[paladin.jar:?]
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) ~[paladin.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
at java.lang.Thread.run(Unknown Source) [?:?]
Caused by: io.netty.channel.AbstractChannel$AnnotatedSocketException: No such file or directory: /tmp/p.1.sock
Caused by: java.net.SocketException: No such file or directory
at sun.nio.ch.UnixDomainSockets.connect0(Native Method) ~[?:?]
at sun.nio.ch.UnixDomainSockets.connect(Unknown Source) ~[?:?]
at sun.nio.ch.UnixDomainSockets.connect(Unknown Source) ~[?:?]
at sun.nio.ch.SocketChannelImpl.connect(Unknown Source) ~[?:?]
at io.netty.util.internal.SocketUtils$3.run(SocketUtils.java:91) ~[paladin.jar:?]
at io.netty.util.internal.SocketUtils$3.run(SocketUtils.java:88) ~[paladin.jar:?]
at java.security.AccessController.doPrivileged(Unknown Source) ~[?:?]
at io.netty.util.internal.SocketUtils.connect(SocketUtils.java:88) ~[paladin.jar:?]
at io.netty.channel.socket.nio.NioDomainSocketChannel.doConnect(NioDomainSocketChannel.java:320) ~[paladin.jar:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:250) ~[paladin.jar:?]
at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1289) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:655) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:634) ~[paladin.jar:?]
at io.netty.channel.ChannelDuplexHandler.connect(ChannelDuplexHandler.java:54) ~[paladin.jar:?]
at io.grpc.netty.WriteBufferingAndExceptionHandler.connect(WriteBufferingAndExceptionHandler.java:157) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:657) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.access$1000(AbstractChannelHandlerContext.java:61) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext$9.run(AbstractChannelHandlerContext.java:639) ~[paladin.jar:?]
at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) ~[paladin.jar:?]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) ~[paladin.jar:?]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) ~[paladin.jar:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569) ~[paladin.jar:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[paladin.jar:?]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[paladin.jar:?]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[paladin.jar:?]
... 1 more
21:15:16.187 [grpc-default-executor-1] ERROR io.kaleido.paladin.loader.PluginLoader - Plugin loader stream error
io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
at io.grpc.Status.asRuntimeException(Status.java:533) ~[paladin.jar:?]
at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:481) ~[paladin.jar:?]
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:564) ~[paladin.jar:?]
at io.grpc.internal.ClientCallImpl.access$100(ClientCallImpl.java:72) ~[paladin.jar:?]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:729) ~[paladin.jar:?]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:710) ~[paladin.jar:?]
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) ~[paladin.jar:?]
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) ~[paladin.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
at java.lang.Thread.run(Unknown Source) [?:?]
Caused by: io.netty.channel.AbstractChannel$AnnotatedSocketException: No such file or directory: /tmp/p.1.sock
Caused by: java.net.SocketException: No such file or directory
at sun.nio.ch.UnixDomainSockets.connect0(Native Method) ~[?:?]
at sun.nio.ch.UnixDomainSockets.connect(Unknown Source) ~[?:?]
at sun.nio.ch.UnixDomainSockets.connect(Unknown Source) ~[?:?]
at sun.nio.ch.SocketChannelImpl.connect(Unknown Source) ~[?:?]
at io.netty.util.internal.SocketUtils$3.run(SocketUtils.java:91) ~[paladin.jar:?]
at io.netty.util.internal.SocketUtils$3.run(SocketUtils.java:88) ~[paladin.jar:?]
at java.security.AccessController.doPrivileged(Unknown Source) ~[?:?]
at io.netty.util.internal.SocketUtils.connect(SocketUtils.java:88) ~[paladin.jar:?]
at io.netty.channel.socket.nio.NioDomainSocketChannel.doConnect(NioDomainSocketChannel.java:320) ~[paladin.jar:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:250) ~[paladin.jar:?]
at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1289) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:655) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:634) ~[paladin.jar:?]
at io.netty.channel.ChannelDuplexHandler.connect(ChannelDuplexHandler.java:54) ~[paladin.jar:?]
at io.grpc.netty.WriteBufferingAndExceptionHandler.connect(WriteBufferingAndExceptionHandler.java:157) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:657) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.access$1000(AbstractChannelHandlerContext.java:61) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext$9.run(AbstractChannelHandlerContext.java:639) ~[paladin.jar:?]
at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) ~[paladin.jar:?]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) ~[paladin.jar:?]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) ~[paladin.jar:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569) ~[paladin.jar:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[paladin.jar:?]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[paladin.jar:?]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[paladin.jar:?]
... 1 more
[2024-11-07T21:15:16.371Z] DEBUG RPC[000000002] --> eth_chainId
[2024-11-07T21:15:16.372Z] DEBUG ==> POST http://besu-node2.default.svc.cluster.local:8545 breq=HX7Rd5jP
[2024-11-07T21:15:16.373Z] ERROR RPC[000000002] <-- ERROR: FF22012: Backend RPC request failed: Post "http://besu-node2.default.svc.cluster.local:8545": dial tcp 10.96.240.232:8545: connect: connection refused
[2024-11-07T21:15:16.374Z] WARN eth_chainId failed, retrying: FF22012: Backend RPC request failed: Post "http://besu-node2.default.svc.cluster.local:8545": dial tcp 10.96.240.232:8545: connect: connection refused pid=1
21:15:17.189 [grpc-default-executor-0] ERROR io.kaleido.paladin.loader.PluginLoader - Plugin loader stream error
io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
at io.grpc.Status.asRuntimeException(Status.java:533) ~[paladin.jar:?]
at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:481) ~[paladin.jar:?]
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:564) ~[paladin.jar:?]
at io.grpc.internal.ClientCallImpl.access$100(ClientCallImpl.java:72) ~[paladin.jar:?]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:729) ~[paladin.jar:?]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:710) ~[paladin.jar:?]
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) ~[paladin.jar:?]
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) ~[paladin.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
at java.lang.Thread.run(Unknown Source) [?:?]
Caused by: io.netty.channel.AbstractChannel$AnnotatedSocketException: No such file or directory: /tmp/p.1.sock
Caused by: java.net.SocketException: No such file or directory
at sun.nio.ch.UnixDomainSockets.connect0(Native Method) ~[?:?]
at sun.nio.ch.UnixDomainSockets.connect(Unknown Source) ~[?:?]
at sun.nio.ch.UnixDomainSockets.connect(Unknown Source) ~[?:?]
at sun.nio.ch.SocketChannelImpl.connect(Unknown Source) ~[?:?]
at io.netty.util.internal.SocketUtils$3.run(SocketUtils.java:91) ~[paladin.jar:?]
at io.netty.util.internal.SocketUtils$3.run(SocketUtils.java:88) ~[paladin.jar:?]
at java.security.AccessController.doPrivileged(Unknown Source) ~[?:?]
at io.netty.util.internal.SocketUtils.connect(SocketUtils.java:88) ~[paladin.jar:?]
at io.netty.channel.socket.nio.NioDomainSocketChannel.doConnect(NioDomainSocketChannel.java:320) ~[paladin.jar:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:250) ~[paladin.jar:?]
at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1289) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:655) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:634) ~[paladin.jar:?]
at io.netty.channel.ChannelDuplexHandler.connect(ChannelDuplexHandler.java:54) ~[paladin.jar:?]
at io.grpc.netty.WriteBufferingAndExceptionHandler.connect(WriteBufferingAndExceptionHandler.java:157) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:657) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.access$1000(AbstractChannelHandlerContext.java:61) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext$9.run(AbstractChannelHandlerContext.java:639) ~[paladin.jar:?]
at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) ~[paladin.jar:?]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) ~[paladin.jar:?]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) ~[paladin.jar:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569) ~[paladin.jar:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[paladin.jar:?]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[paladin.jar:?]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[paladin.jar:?]
... 1 more
[2024-11-07T21:15:18.375Z] DEBUG RPC[000000003] --> eth_chainId
[2024-11-07T21:15:18.375Z] DEBUG ==> POST http://besu-node2.default.svc.cluster.local:8545 breq=4HUuxC87
[2024-11-07T21:15:18.378Z] ERROR RPC[000000003] <-- ERROR: FF22012: Backend RPC request failed: Post "http://besu-node2.default.svc.cluster.local:8545": dial tcp 10.96.240.232:8545: connect: connection refused
[2024-11-07T21:15:18.378Z] WARN eth_chainId failed, retrying: FF22012: Backend RPC request failed: Post "http://besu-node2.default.svc.cluster.local:8545": dial tcp 10.96.240.232:8545: connect: connection refused pid=1
21:15:19.192 [grpc-default-executor-1] ERROR io.kaleido.paladin.loader.PluginLoader - Plugin loader stream error
io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
at io.grpc.Status.asRuntimeException(Status.java:533) ~[paladin.jar:?]
at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:481) ~[paladin.jar:?]
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:564) ~[paladin.jar:?]
at io.grpc.internal.ClientCallImpl.access$100(ClientCallImpl.java:72) ~[paladin.jar:?]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:729) ~[paladin.jar:?]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:710) ~[paladin.jar:?]
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) ~[paladin.jar:?]
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) ~[paladin.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
at java.lang.Thread.run(Unknown Source) [?:?]
Caused by: io.netty.channel.AbstractChannel$AnnotatedSocketException: No such file or directory: /tmp/p.1.sock
Caused by: java.net.SocketException: No such file or directory
at sun.nio.ch.UnixDomainSockets.connect0(Native Method) ~[?:?]
at sun.nio.ch.UnixDomainSockets.connect(Unknown Source) ~[?:?]
at sun.nio.ch.UnixDomainSockets.connect(Unknown Source) ~[?:?]
at sun.nio.ch.SocketChannelImpl.connect(Unknown Source) ~[?:?]
at io.netty.util.internal.SocketUtils$3.run(SocketUtils.java:91) ~[paladin.jar:?]
at io.netty.util.internal.SocketUtils$3.run(SocketUtils.java:88) ~[paladin.jar:?]
at java.security.AccessController.doPrivileged(Unknown Source) ~[?:?]
at io.netty.util.internal.SocketUtils.connect(SocketUtils.java:88) ~[paladin.jar:?]
at io.netty.channel.socket.nio.NioDomainSocketChannel.doConnect(NioDomainSocketChannel.java:320) ~[paladin.jar:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:250) ~[paladin.jar:?]
at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1289) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:655) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:634) ~[paladin.jar:?]
at io.netty.channel.ChannelDuplexHandler.connect(ChannelDuplexHandler.java:54) ~[paladin.jar:?]
at io.grpc.netty.WriteBufferingAndExceptionHandler.connect(WriteBufferingAndExceptionHandler.java:157) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:657) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.access$1000(AbstractChannelHandlerContext.java:61) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext$9.run(AbstractChannelHandlerContext.java:639) ~[paladin.jar:?]
at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) ~[paladin.jar:?]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) ~[paladin.jar:?]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) ~[paladin.jar:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569) ~[paladin.jar:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[paladin.jar:?]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[paladin.jar:?]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[paladin.jar:?]
... 1 more
[2024-11-07T21:15:22.380Z] DEBUG RPC[000000004] --> eth_chainId
[2024-11-07T21:15:22.380Z] DEBUG ==> POST http://besu-node2.default.svc.cluster.local:8545 breq=WaSRx9oH
[2024-11-07T21:15:22.381Z] ERROR RPC[000000004] <-- ERROR: FF22012: Backend RPC request failed: Post "http://besu-node2.default.svc.cluster.local:8545": dial tcp 10.96.240.232:8545: connect: connection refused
[2024-11-07T21:15:22.381Z] WARN eth_chainId failed, retrying: FF22012: Backend RPC request failed: Post "http://besu-node2.default.svc.cluster.local:8545": dial tcp 10.96.240.232:8545: connect: connection refused pid=1
21:15:23.198 [grpc-default-executor-0] ERROR io.kaleido.paladin.loader.PluginLoader - Plugin loader stream error
io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
at io.grpc.Status.asRuntimeException(Status.java:533) ~[paladin.jar:?]
at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:481) ~[paladin.jar:?]
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:564) ~[paladin.jar:?]
at io.grpc.internal.ClientCallImpl.access$100(ClientCallImpl.java:72) ~[paladin.jar:?]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:729) ~[paladin.jar:?]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:710) ~[paladin.jar:?]
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) ~[paladin.jar:?]
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) ~[paladin.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
at java.lang.Thread.run(Unknown Source) [?:?]
Caused by: io.netty.channel.AbstractChannel$AnnotatedSocketException: No such file or directory: /tmp/p.1.sock
Caused by: java.net.SocketException: No such file or directory
at sun.nio.ch.UnixDomainSockets.connect0(Native Method) ~[?:?]
at sun.nio.ch.UnixDomainSockets.connect(Unknown Source) ~[?:?]
at sun.nio.ch.UnixDomainSockets.connect(Unknown Source) ~[?:?]
at sun.nio.ch.SocketChannelImpl.connect(Unknown Source) ~[?:?]
at io.netty.util.internal.SocketUtils$3.run(SocketUtils.java:91) ~[paladin.jar:?]
at io.netty.util.internal.SocketUtils$3.run(SocketUtils.java:88) ~[paladin.jar:?]
at java.security.AccessController.doPrivileged(Unknown Source) ~[?:?]
at io.netty.util.internal.SocketUtils.connect(SocketUtils.java:88) ~[paladin.jar:?]
at io.netty.channel.socket.nio.NioDomainSocketChannel.doConnect(NioDomainSocketChannel.java:320) ~[paladin.jar:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:250) ~[paladin.jar:?]
at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1289) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:655) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:634) ~[paladin.jar:?]
at io.netty.channel.ChannelDuplexHandler.connect(ChannelDuplexHandler.java:54) ~[paladin.jar:?]
at io.grpc.netty.WriteBufferingAndExceptionHandler.connect(WriteBufferingAndExceptionHandler.java:157) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:657) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.access$1000(AbstractChannelHandlerContext.java:61) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext$9.run(AbstractChannelHandlerContext.java:639) ~[paladin.jar:?]
at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) ~[paladin.jar:?]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) ~[paladin.jar:?]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) ~[paladin.jar:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569) ~[paladin.jar:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[paladin.jar:?]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[paladin.jar:?]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[paladin.jar:?]
... 1 more
[2024-11-07T21:15:30.382Z] DEBUG RPC[000000005] --> eth_chainId
[2024-11-07T21:15:30.382Z] DEBUG ==> POST http://besu-node2.default.svc.cluster.local:8545 breq=8JHvD8Ks
[2024-11-07T21:15:30.385Z] ERROR RPC[000000005] <-- ERROR: FF22012: Backend RPC request failed: Post "http://besu-node2.default.svc.cluster.local:8545": dial tcp 10.96.240.232:8545: connect: connection refused
[2024-11-07T21:15:30.385Z] WARN eth_chainId failed, retrying: FF22012: Backend RPC request failed: Post "http://besu-node2.default.svc.cluster.local:8545": dial tcp 10.96.240.232:8545: connect: connection refused pid=1
21:15:31.201 [grpc-default-executor-1] ERROR io.kaleido.paladin.loader.PluginLoader - Plugin loader stream error
io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
at io.grpc.Status.asRuntimeException(Status.java:533) ~[paladin.jar:?]
at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:481) ~[paladin.jar:?]
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:564) ~[paladin.jar:?]
at io.grpc.internal.ClientCallImpl.access$100(ClientCallImpl.java:72) ~[paladin.jar:?]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:729) ~[paladin.jar:?]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:710) ~[paladin.jar:?]
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) ~[paladin.jar:?]
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) ~[paladin.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
at java.lang.Thread.run(Unknown Source) [?:?]
Caused by: io.netty.channel.AbstractChannel$AnnotatedSocketException: No such file or directory: /tmp/p.1.sock
Caused by: java.net.SocketException: No such file or directory
at sun.nio.ch.UnixDomainSockets.connect0(Native Method) ~[?:?]
at sun.nio.ch.UnixDomainSockets.connect(Unknown Source) ~[?:?]
at sun.nio.ch.UnixDomainSockets.connect(Unknown Source) ~[?:?]
at sun.nio.ch.SocketChannelImpl.connect(Unknown Source) ~[?:?]
at io.netty.util.internal.SocketUtils$3.run(SocketUtils.java:91) ~[paladin.jar:?]
at io.netty.util.internal.SocketUtils$3.run(SocketUtils.java:88) ~[paladin.jar:?]
at java.security.AccessController.doPrivileged(Unknown Source) ~[?:?]
at io.netty.util.internal.SocketUtils.connect(SocketUtils.java:88) ~[paladin.jar:?]
at io.netty.channel.socket.nio.NioDomainSocketChannel.doConnect(NioDomainSocketChannel.java:320) ~[paladin.jar:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:250) ~[paladin.jar:?]
at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1289) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:655) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:634) ~[paladin.jar:?]
at io.netty.channel.ChannelDuplexHandler.connect(ChannelDuplexHandler.java:54) ~[paladin.jar:?]
at io.grpc.netty.WriteBufferingAndExceptionHandler.connect(WriteBufferingAndExceptionHandler.java:157) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:657) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.access$1000(AbstractChannelHandlerContext.java:61) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext$9.run(AbstractChannelHandlerContext.java:639) ~[paladin.jar:?]
at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) ~[paladin.jar:?]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) ~[paladin.jar:?]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) ~[paladin.jar:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569) ~[paladin.jar:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[paladin.jar:?]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[paladin.jar:?]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[paladin.jar:?]
... 1 more
21:15:46.209 [grpc-default-executor-0] ERROR io.kaleido.paladin.loader.PluginLoader - Plugin loader stream error
io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
at io.grpc.Status.asRuntimeException(Status.java:533) ~[paladin.jar:?]
at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:481) ~[paladin.jar:?]
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:564) ~[paladin.jar:?]
at io.grpc.internal.ClientCallImpl.access$100(ClientCallImpl.java:72) ~[paladin.jar:?]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:729) ~[paladin.jar:?]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:710) ~[paladin.jar:?]
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) ~[paladin.jar:?]
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) ~[paladin.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
at java.lang.Thread.run(Unknown Source) [?:?]
Caused by: io.netty.channel.AbstractChannel$AnnotatedSocketException: No such file or directory: /tmp/p.1.sock
Caused by: java.net.SocketException: No such file or directory
at sun.nio.ch.UnixDomainSockets.connect0(Native Method) ~[?:?]
at sun.nio.ch.UnixDomainSockets.connect(Unknown Source) ~[?:?]
at sun.nio.ch.UnixDomainSockets.connect(Unknown Source) ~[?:?]
at sun.nio.ch.SocketChannelImpl.connect(Unknown Source) ~[?:?]
at io.netty.util.internal.SocketUtils$3.run(SocketUtils.java:91) ~[paladin.jar:?]
at io.netty.util.internal.SocketUtils$3.run(SocketUtils.java:88) ~[paladin.jar:?]
at java.security.AccessController.doPrivileged(Unknown Source) ~[?:?]
at io.netty.util.internal.SocketUtils.connect(SocketUtils.java:88) ~[paladin.jar:?]
at io.netty.channel.socket.nio.NioDomainSocketChannel.doConnect(NioDomainSocketChannel.java:320) ~[paladin.jar:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:250) ~[paladin.jar:?]
at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1289) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:655) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:634) ~[paladin.jar:?]
at io.netty.channel.ChannelDuplexHandler.connect(ChannelDuplexHandler.java:54) ~[paladin.jar:?]
at io.grpc.netty.WriteBufferingAndExceptionHandler.connect(WriteBufferingAndExceptionHandler.java:157) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:657) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.access$1000(AbstractChannelHandlerContext.java:61) ~[paladin.jar:?]
at io.netty.channel.AbstractChannelHandlerContext$9.run(AbstractChannelHandlerContext.java:639) ~[paladin.jar:?]
at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) ~[paladin.jar:?]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) ~[paladin.jar:?]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) ~[paladin.jar:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569) ~[paladin.jar:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[paladin.jar:?]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[paladin.jar:?]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[paladin.jar:?]
... 1 more
[2024-11-07T21:15:46.388Z] ERROR eth_chainId failed: FF22012: Backend RPC request failed: Post "http://besu-node2.default.svc.cluster.local:8545": dial tcp 10.96.240.232:8545: connect: connection refused pid=1
[2024-11-07T21:15:46.389Z] ERROR PD010003: Error starting ethereum client: PD011508: Failed to query chain ID: FF22012: Backend RPC request failed: Post "http://besu-node2.default.svc.cluster.local:8545": dial tcp 10.96.240.232:8545: connect: connection refused pid=1
[2024-11-07T21:15:46.389Z] INFO Stopping pid=1
[2024-11-07T21:15:46.389Z] INFO Stopping database pid=1
[2024-11-07T21:15:46.398Z] INFO DB closed (err=<nil>)
[2024-11-07T21:15:46.405Z] DEBUG Stopped database pid=1
[2024-11-07T21:15:46.405Z] DEBUG Stopped pid=1
21:15:46.476 [main] ERROR io.kaleido.paladin.Main - exiting with error: 1
Description
Follow up from #402. Attempts to close #397.
Initially on Getting Started, Paladin crashes some due to the Postgres sidecar not being healthy by the time the automigrations run. We order the containers so Postgres is the first container, and use a
postStart
hook that must complete before starting Paladin.Probes are added so the Operator isn't notified Paladin is ready until its RPC port is live. This should help the deploy/invoke controllers so that they don't run unless it can actually receive JSONRPC traffic.
Then, to avoid the constantly restarting due to changing config / deploying smart contracts issue, we introduce a temporary solution of
inflight
changes map lock. This allows us to detect config changes based on the checksum, delay updating the STS until a change interval of 30s has passed, and then release the changes lock allowing the STS to update, which helps initial SCDs -> TIs -> Domains/Registries all succeed. This can be removed once #398 is completed.Other Changes
setupChainId
in the RPC client of Paladin to avoid any startup crashes due to Besu coming up in parallel (or being down for production scenarios)