Open amitonlentra opened 7 months ago
This has been a blocking issue for us since last week. On the other hand, it also sounds like a configuration issue, since startup is failing. @npepinpe @Zelldon @deepthidevaki @oleschoenburg - could one of you please spare a few minutes to provide insight on whether this is a real issue or whether something basic is missing in the helm file?
Thanks.
Did you update directly from 8.2.x to 8.4.x, without first updating to 8.3.x? As stated in the docs, you can skip patch versions, but you cannot skip minor versions during an update, as there may be interim migrations required.
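The rule above (patch versions may be skipped, minor versions may not) can be checked before an upgrade. This is a minimal sketch: the helper names `parse_minor` and `skips_minor` are hypothetical, not part of any Camunda tooling, and it assumes plain `major.minor.patch` version strings within the same major version.

```python
def parse_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as '8.4.5' or '8.5.0-alpha2'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def skips_minor(current: str, target: str) -> bool:
    """Return True if updating from current to target jumps over a minor version."""
    cur_major, cur_minor = parse_minor(current)
    tgt_major, tgt_minor = parse_minor(target)
    if cur_major != tgt_major:
        # Crossing a major version is out of scope for this sketch.
        raise ValueError("major versions differ")
    return tgt_minor - cur_minor > 1


if __name__ == "__main__":
    # The update described in this issue: 8.2.12 -> 8.4.0 skips 8.3.x.
    print(skips_minor("8.2.12", "8.4.0"))
    # A patch-only update within 8.4.x does not skip a minor version.
    print(skips_minor("8.4.0", "8.4.5"))
```

For the update reported here, going from 8.2.12 to 8.4.0 would be flagged, while 8.2.12 to 8.3.x followed by 8.3.x to 8.4.x would not.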
@npepinpe - thanks for replying. This error occurs locally as well. On the local setup, we installed 8.4.0/8.4.5 directly through helm. While troubleshooting, I realised that this is logged as a warning. Is it something to worry about, or is it a temporary issue? Single workflow execution works fine. We haven't had a chance to run a load test yet.
Is there any solution for this?
@Ruivalim - what kind of solution are you looking for? Are you seeing any side effects from the errors that are logged as warnings?
I'm doing a fresh install of v8.5.4
I also get the same error on a clean install. I also disabled identity/console/optimize/tasklist to make a minimal install.
It says: io.atomix.cluster.messaging.MessagingException$RemoteHandlerFailure: Remote handler failed to handle message, cause: Failed to handle message, host workflow-zeebe-0.workflow-zeebe.camunda-workflow.svc:26502 is not a known cluster member
However, the service exists on workflow-zeebe.camunda-workflow:26502 (TCP), and the pod workflow-zeebe-0 also exists.
I can also connect to it:
curl workflow-zeebe-0.workflow-zeebe.camunda-workflow.svc:26502
curl: (52) Empty reply from server
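curl's error 52 means the connection was accepted but no HTTP response came back, which suggests the port is reachable at the TCP level and simply does not speak HTTP. A plain TCP check expresses the same test more directly. This is a minimal sketch using only the Python standard library; the function name `tcp_port_open` is hypothetical, and the host and port in the example are the ones from the comment above.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    host = "workflow-zeebe-0.workflow-zeebe.camunda-workflow.svc"
    print(tcp_port_open(host, 26502))
```

A `True` result only shows that the port accepts connections. It says nothing about whether the peer recognises the caller as a cluster member, which is what the error message is about.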
Sometimes I get this log on zeebe:
Partition-1 failed, marking it as unhealthy: Partition-1{status=UNHEALTHY, issue=HealthIssue[message=null, throwable=null, cause=ZeebePartition-1{status=UNHEALTHY, issue=HealthIssue[message=Services not installed, throwable=null, cause=null]}]}
On zeebe-gateway:
2024-07-02 11:03:12.521 [] [netty-messaging-event-epoll-client-0] [] WARN
io.atomix.cluster.messaging.impl.NettyMessagingService - Unexpected error while handling message stream-recreate from workflow-zeebe-0.workflow-zeebe.camunda-workflow.svc:26502
io.atomix.cluster.messaging.MessagingException$NoSuchMemberException: Failed to handle message, host workflow-zeebe-0.workflow-zeebe.camunda-workflow.svc:26502 is not a known cluster member
It seems to work after adding these ports to zeebe-gateway's service, where they did not exist:
- name: internal
  port: 26502
  protocol: TCP
  targetPort: 26502
- name: command
  port: 26501
  protocol: TCP
  targetPort: 26501
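To check whether a service already exposes these two ports, the `spec.ports` list from `kubectl get service <name> -o json` can be compared against them. This is a minimal sketch: the helper `missing_ports` is hypothetical, and the required ports are simply the two entries listed above, not an authoritative list from the chart.

```python
# Ports named in the comment above; treated as the required set for this sketch.
REQUIRED_PORTS = {"internal": 26502, "command": 26501}


def missing_ports(service_ports: list[dict]) -> dict[str, int]:
    """Return the required ports that are absent from a Service's spec.ports list."""
    exposed = {entry.get("port") for entry in service_ports}
    return {name: port for name, port in REQUIRED_PORTS.items() if port not in exposed}


if __name__ == "__main__":
    # Hypothetical spec.ports of a gateway service before the fix.
    before = [{"name": "gateway", "port": 26500, "protocol": "TCP"}]
    print(missing_ports(before))
```

An empty result means both ports are present on the service.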
2024-07-02 09:43:46.298 [Broker-0] [zb-actors-1] [HealthCheckService] INFO
io.camunda.zeebe.broker.system - Partition-1 recovered, marking it as healthy
but Operate still logs an error:
2024-07-02 10:38:24.412 WARN 7 --- [-worker-ELG-1-2] i.c.z.c.i.ZeebeCallCredentials : The request's security level does not guarantee that the credentials will be confidential.
Error occurred when requesting partition ids from Zeebe client: null
It worked after disabling auth completely:
global:
  identity:
    auth:
      enabled: false
Describe the bug
We are upgrading Zeebe from 8.2.12 to 8.4.0 (8.4.5 failed as well), but the Zeebe brokers error out on startup with the following error:
java.util.concurrent.CompletionException: io.atomix.cluster.messaging.MessagingException$RemoteHandlerFailure: Remote handler failed to handle message, cause: Failed to handle message, host dev-zeebe-0.dev-zeebe.default.svc:26502 is not a known cluster member
The helm chart version for 8.4.0 was 9.0.2. We even tried starting up 8.4.5 and got the same error. We also tried with the latest (8.5.0-alpha2) locally on my laptop with the below helm command and saw the same issue in the logs for zeebe broker 0
The installation fails on AWS setup with ec2 instances and even locally on a laptop.
To Reproduce
Install zeebe with the above command or with 8.4.0 (command below) and see logs for broker 0. In our setup we are trying to install 9 brokers with the below values.yaml file.
Log/Stacktrace
The below stacktrace indicates that broker 0 failed to connect with broker 8.
Full Stacktrace
```
2024-03-28 08:36:56.153 [] [atomix-cluster-heartbeat-sender] [] INFO io.atomix.cluster.protocol.swim - 0 - Member added Member{id=2, address=dev-zeebe-2.dev-zeebe.default.svc:26502, properties={}}
2024-03-28 08:36:56.184 [Broker-0] [zb-actors-1] [] WARN io.camunda.zeebe.topology.gossip.ClusterTopologyGossiper - Failed to sync with 2
java.util.concurrent.CompletionException: io.atomix.cluster.messaging.MessagingException$RemoteHandlerFailure: Remote handler failed to handle message, cause: Failed to handle message, host dev-zeebe-0.dev-zeebe.default.svc:26502 is not a known cluster member
    at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source) ~[?:?]
    at java.base/java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source) ~[?:?]
    at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(Unknown Source) ~[?:?]
    at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:?]
    at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source) ~[?:?]
    at io.atomix.cluster.messaging.impl.NettyMessagingService.lambda$executeOnPooledConnection$25(NettyMessagingService.java:626) ~[zeebe-atomix-cluster-8.4.0.jar:8.4.0]
    at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31) ~[guava-33.0.0-jre.jar:?]
    at io.atomix.cluster.messaging.impl.NettyMessagingService.lambda$executeOnPooledConnection$26(NettyMessagingService.java:624) ~[zeebe-atomix-cluster-8.4.0.jar:8.4.0]
    at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source) ~[?:?]
    at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source) ~[?:?]
    at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:?]
    at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source) ~[?:?]
    at io.atomix.cluster.messaging.impl.AbstractClientConnection.dispatch(AbstractClientConnection.java:49) ~[zeebe-atomix-cluster-8.4.0.jar:8.4.0]
    at io.atomix.cluster.messaging.impl.AbstractClientConnection.dispatch(AbstractClientConnection.java:30) ~[zeebe-atomix-cluster-8.4.0.jar:8.4.0]
    at io.atomix.cluster.messaging.impl.NettyMessagingService$MessageDispatcher.channelRead0(NettyMessagingService.java:1109) ~[zeebe-atomix-cluster-8.4.0.jar:8.4.0]
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) ~[netty-transport-4.1.104.Final.jar:4.1.104.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[netty-transport-4.1.104.Final.jar:4.1.104.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.104.Final.jar:4.1.104.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.104.Final.jar:4.1.104.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346) ~[netty-codec-4.1.104.Final.jar:4.1.104.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318) ~[netty-codec-4.1.104.Final.jar:4.1.104.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[netty-transport-4.1.104.Final.jar:4.1.104.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.104.Final.jar:4.1.104.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.104.Final.jar:4.1.104.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[netty-transport-4.1.104.Final.jar:4.1.104.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) ~[netty-transport-4.1.104.Final.jar:4.1.104.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.104.Final.jar:4.1.104.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[netty-transport-4.1.104.Final.jar:4.1.104.Final]
    at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800) ~[netty-transport-classes-epoll-4.1.104.Final.jar:4.1.104.Final]
    at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:509) ~[netty-transport-classes-epoll-4.1.104.Final.jar:4.1.104.Final]
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:407) ~[netty-transport-classes-epoll-4.1.104.Final.jar:4.1.104.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[netty-common-4.1.104.Final.jar:4.1.104.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.104.Final.jar:4.1.104.Final]
    at java.base/java.lang.Thread.run(Unknown Source) ~[?:?]
Caused by: io.atomix.cluster.messaging.MessagingException$RemoteHandlerFailure: Remote handler failed to handle message, cause: Failed to handle message, host dev-zeebe-0.dev-zeebe.default.svc:26502 is not a known cluster member
    ... 22 more
```
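With nine brokers, it can help to collect which addresses are being rejected across all broker logs rather than reading each trace by hand. This is a minimal sketch that extracts the host and port from the "is not a known cluster member" message shown above. The function name `rejected_hosts` is hypothetical, and the pattern is based only on the wording in this stacktrace, so it may need adjusting for other versions.

```python
import re

# Matches the message text seen in the stacktrace above.
NOT_MEMBER = re.compile(
    r"host (?P<host>\S+):(?P<port>\d+) is not a known cluster member"
)


def rejected_hosts(log_text: str) -> set[tuple[str, int]]:
    """Return the distinct (host, port) pairs reported as unknown cluster members."""
    return {(m["host"], int(m["port"])) for m in NOT_MEMBER.finditer(log_text)}


if __name__ == "__main__":
    sample = (
        "Remote handler failed to handle message, cause: Failed to handle message, "
        "host dev-zeebe-0.dev-zeebe.default.svc:26502 is not a known cluster member"
    )
    print(rejected_hosts(sample))
```

The input could be the saved output of `kubectl logs` for each broker pod. The result is a set, so repeated warnings for the same address collapse into one entry.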
Environment: