camunda-community-hub / zeebe-helm

Public Zeebe K8s HELM Charts
http://helm.zeebe.io
Apache License 2.0

RaftServer{system-partition-1}{role=FOLLOWER} - java.net.ConnectException #3

Closed: gitizenme closed this issue 3 years ago

gitizenme commented 4 years ago

Running

helm install zeebe-full zeebe/zeebe-full

The cluster fails to transition into the running state:

NAME                                                        READY   STATUS    RESTARTS   AGE
elasticsearch-master-0                                      0/1     Pending   0          25s
elasticsearch-master-1                                      0/1     Pending   0          25s
elasticsearch-master-2                                      0/1     Pending   0          25s
zeebe-full-nginx-ingress-controller-6c689bb4cc-b5mlw        1/1     Running   0          26s
zeebe-full-nginx-ingress-default-backend-849f468f76-gjzg8   1/1     Running   0          26s
zeebe-full-operate-84c9c66d8-kj85v                          1/1     Running   0          26s
zeebe-full-zeebe-0                                          0/1     Pending   0          26s
zeebe-full-zeebe-1                                          0/1     Running   0          25s
zeebe-full-zeebe-2                                          0/1     Pending   0          25s

And the log for one of the Zeebe pods (zeebe-full-zeebe-1) reports:

2019-11-27 19:56:30.529 [] [zb-blocking-task-runner-1-zeebe-full-zeebe-1.zeebe-full-zeebe.default.svc.cluster.local:26501] INFO io.zeebe.gateway - Version: 0.21.1
2019-11-27 19:56:30.531 [] [zb-blocking-task-runner-1-zeebe-full-zeebe-1.zeebe-full-zeebe.default.svc.cluster.local:26501] INFO io.zeebe.gateway - Starting gateway with configuration { "enable": true, "network": { "host": "0.0.0.0", "port": 26500 }, "cluster": { "contactPoint": "zeebe-full-zeebe-1.zeebe-full-zeebe.default.svc.cluster.local:26502", "maxMessageSize": "4M", "requestTimeout": "15s", "clusterName": "zeebe-cluster", "memberId": "gateway", "host": "0.0.0.0", "port": 26502 }, "threads": { "managementThreads": 1 }, "monitoring": { "enabled": false, "host": "zeebe-full-zeebe-1.zeebe-full-zeebe.default.svc.cluster.local", "port": 9600 }, "security": { "enabled": false } }
2019-11-27 19:56:31.201 [] [atomix-cluster-events] DEBUG io.zeebe.broker.clustering - Member 1 received event ClusterMembershipEvent{type=MEMBER_ADDED, subject=Member{id=1, address=zeebe-full-zeebe-1.zeebe-full-zeebe.default.svc.cluster.local:26502, properties={brokerInfo=EADJAAAAAQABAAAAA
2019-11-27 19:56:31.203 [io.zeebe.gateway.impl.broker.cluster.BrokerTopologyManagerImpl] [zeebe-full-zeebe-1.zeebe-full-zeebe.default.svc.cluster.local:26501-zb-actors-0] DEBUG io.zeebe.gateway - Received membership event: ClusterMembershipEvent{type=MEMBER_ADDED, subject=Member{id=1, addr
2019-11-27 19:56:31.212 [io.zeebe.gateway.impl.broker.cluster.BrokerTopologyManagerImpl] [zeebe-full-zeebe-1.zeebe-full-zeebe.default.svc.cluster.local:26501-zb-actors-0] INFO io.zeebe.transport.endpoint - Registering endpoint for node '1' with address 'zeebe-full-zeebe-1.zeebe-full-zeebe
2019-11-27 19:56:36.683 [] [raft-server-system-partition-1] WARN io.atomix.protocols.raft.roles.FollowerRole - RaftServer{system-partition-1}{role=FOLLOWER} - java.net.ConnectException
2019-11-27 19:56:36.684 [] [raft-server-system-partition-1] WARN io.atomix.protocols.raft.roles.FollowerRole - RaftServer{system-partition-1}{role=FOLLOWER} - java.net.ConnectException

Version info:

helm version
version.BuildInfo{Version:"v3.0.0", GitCommit:"e29ce2a54e96cd02ccfce88bee4f58bb6e2a28b6", GitTreeState:"clean", GoVersion:"go1.13.4"}

kubectl version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-14T04:24:29Z", GoVersion:"go1.12.13", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.8", GitCommit:"211047e9a1922595eaa3a1127ed365e9299a6c23", GitTreeState:"clean", BuildDate:"2019-10-15T12:02:12Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}

Any guidance on how to fix the issue causing the exception?

salaboy commented 4 years ago

@gitizenme hi there, thanks a lot for reporting this. Can you please share more information about where you are trying to run the charts? Which cloud provider?

salaboy commented 4 years ago

@gitizenme if you can provide more information, we should be able to help.

salaboy commented 4 years ago

@gitizenme does this still apply? Can you provide more information?

gitizenme commented 4 years ago

Sorry, I was out on the long holiday. Yes, the issue is still present. I'm running on macOS 10.15.1 using Docker Desktop.

salaboy commented 4 years ago

@gitizenme I haven't had time to check with Kubernetes in Docker for Mac, but the same charts work in KIND, so I suspect there is a small difference somewhere that we need to tune for Docker for Mac. Can you update the charts and try again? I've released a new version of the charts.

gitizenme commented 4 years ago

@salaboy no change, still receiving the same error.

salaboy commented 4 years ago

@gitizenme OK, give me until tomorrow so I can try it locally, see if I can find the problem, and create a new release. Docker for Mac was not in my immediate plans, but since you asked, I will give it a go.

gitizenme commented 4 years ago

@salaboy Sounds good, thanks for following up so quickly. FYI - our use cases are:

salaboy commented 4 years ago

@gitizenme You're using Helm 3, right?

salaboy commented 4 years ago

@gitizenme running the same setup in Docker for Mac with Helm 2 and a default configuration, I am getting:

2019-12-04 09:54:13.307 [io.zeebe.gateway.impl.broker.cluster.BrokerTopologyManagerImpl] [salaboy-zeebe-0.salaboy-zeebe.default.svc.cluster.local:26501-zb-actors-0] DEBUG io.zeebe.gateway - Received membership event: ClusterMembershipEvent{type=METADATA_CHANGED, subject=Member{id=0, address=salaboy-zeebe-0.salaboy-zeebe.default.svc.cluster.local:26502, properties={brokerInfo=EADJAAAAAQAAAAAAAwAAAAMAAAADAAAAAAABCgAAAGNvbW1hbmRBcGk9AAAAc2FsYWJveS16ZWViZS0wLnNhbGFib3ktemVlYmUuZGVmYXVsdC5zdmMuY2x1c3Rlci5sb2NhbDoyNjUwMQUAAQMAAAAB}}, time=1575453251550} with BrokerInfo{nodeId=0, partitionsCount=3, clusterSize=3, replicationFactor=3, partitionRoles={3=FOLLOWER}} 
2019-12-04 09:54:13.307 [service-controller] [salaboy-zeebe-0.salaboy-zeebe.default.svc.cluster.local:26501-zb-actors-1] ERROR io.zeebe.util.actor - Actor failed in phase 'STARTED'. Continue with next job.
java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(Unknown Source) ~[?:?]
    at java.nio.ByteBuffer.allocate(Unknown Source) ~[?:?]
    at io.zeebe.distributedlog.restore.log.impl.DefaultLogReplicationRequestHandler.<init>(DefaultLogReplicationRequestHandler.java:34) ~[zeebe-logstreams-0.21.1.jar:0.21.1]
    at io.zeebe.distributedlog.restore.log.impl.DefaultLogReplicationRequestHandler.<init>(DefaultLogReplicationRequestHandler.java:29) ~[zeebe-logstreams-0.21.1.jar:0.21.1]
    at io.zeebe.broker.logstreams.restore.BrokerRestoreServer.start(BrokerRestoreServer.java:62) ~[zeebe-broker-0.21.1.jar:0.21.1]
    at io.zeebe.broker.clustering.base.partitions.Partition.startRestoreServer(Partition.java:142) ~[zeebe-broker-0.21.1.jar:0.21.1]
    at io.zeebe.broker.clustering.base.partitions.Partition.start(Partition.java:116) ~[zeebe-broker-0.21.1.jar:0.21.1]
    at io.zeebe.servicecontainer.impl.ServiceController$AwaitDependenciesStartedState.onDependenciesAvailable(ServiceController.java:260) ~[zeebe-service-container-0.21.1.jar:0.21.1]
    at io.zeebe.servicecontainer.impl.ServiceController$AwaitDependenciesStartedState.accept(ServiceController.java:213) ~[zeebe-service-container-0.21.1.jar:0.21.1]
    at io.zeebe.servicecontainer.impl.ServiceController$AwaitDependenciesStartedState.accept(ServiceController.java:207) ~[zeebe-service-container-0.21.1.jar:0.21.1]
    at io.zeebe.servicecontainer.impl.ServiceController.onServiceEvent(ServiceController.java:105) ~[zeebe-service-container-0.21.1.jar:0.21.1]
    at io.zeebe.servicecontainer.impl.ServiceController$$Lambda$139/0x0000000100259040.run(Unknown Source) ~[?:?]
    at io.zeebe.util.sched.ActorJob.invoke(ActorJob.java:76) ~[zeebe-util-0.21.1.jar:0.21.1]
    at io.zeebe.util.sched.ActorJob.execute(ActorJob.java:39) [zeebe-util-0.21.1.jar:0.21.1]
    at io.zeebe.util.sched.ActorTask.execute(ActorTask.java:127) [zeebe-util-0.21.1.jar:0.21.1]
    at io.zeebe.util.sched.ActorThread.executeCurrentTask(ActorThread.java:107) [zeebe-util-0.21.1.jar:0.21.1]
    at io.zeebe.util.sched.ActorThread.doWork(ActorThread.java:91) [zeebe-util-0.21.1.jar:0.21.1]
    at io.zeebe.util.sched.ActorThread.run(ActorThread.java:195) [zeebe-util-0.21.1.jar:0.21.1]

I will start looking into how to tune Docker for Mac to make sure there are no resource problems.

salaboy commented 4 years ago

@gitizenme after sorting out the resource problem:

NAME                                                    READY   STATUS    RESTARTS   AGE
salaboy-nginx-ingress-controller-844d5784d7-t7h2z       1/1     Running   1          48m
salaboy-nginx-ingress-default-backend-f66f7758b-trlf2   1/1     Running   1          48m
salaboy-operate-5d7bd95f44-gphbc                        1/1     Running   15         48m
salaboy-zeebe-0                                         1/1     Running   1          48m
salaboy-zeebe-1                                         1/1     Running   1          48m
salaboy-zeebe-2                                         1/1     Running   0          72s

The only big difference I can think of is Helm 3.

salaboy commented 4 years ago

To get Elasticsearch working in Docker for Mac, you only need a few tweaks to your values file:

zeebe:
  elasticsearch:
    imageTag: 6.8.3
    # Permit co-located instances for solitary minikube virtual machines.
    antiAffinity: "soft"

    # Shrink default JVM heap.
    esJavaOpts: "-Xmx128m -Xms128m"

    # Allocate smaller chunks of memory per pod.
    resources:
      requests:
        cpu: "100m"
        memory: "512M"
      limits:
        cpu: "1000m"
        memory: "512M"

    # Request smaller persistent volumes.
    volumeClaimTemplate:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "hostpath"
      resources:
        requests:
          storage: 100M

That configuration comes from the official examples of the Elasticsearch chart.

salaboy commented 4 years ago

Everything is running here. @gitizenme can you please double-check that you don't have a nasty java.lang.OutOfMemoryError: Java heap space in your pod logs?

NAME                                                    READY   STATUS    RESTARTS   AGE
elasticsearch-master-0                                  1/1     Running   0          9m9s
elasticsearch-master-1                                  1/1     Running   0          9m9s
elasticsearch-master-2                                  1/1     Running   0          9m9s
salaboy-nginx-ingress-controller-844d5784d7-sgfgf       1/1     Running   0          9m9s
salaboy-nginx-ingress-default-backend-f66f7758b-2vrql   1/1     Running   0          9m9s
salaboy-operate-5d7bd95f44-f245b                        1/1     Running   3          9m9s
salaboy-zeebe-0                                         1/1     Running   0          4m36s
salaboy-zeebe-1                                         1/1     Running   0          9m9s
salaboy-zeebe-2                                         1/1     Running   0          9m9s
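A minimal sketch of that check, assuming kubectl is configured for the Docker for Mac cluster and the release runs in the current namespace. The helper names has_oom and scan_pods_for_oom are hypothetical; the script only wraps kubectl get pods -o name and kubectl logs (including --previous, for containers that have already restarted) and searches for java.lang.OutOfMemoryError.

```shell
#!/usr/bin/env bash
# Sketch: look for JVM heap exhaustion in the logs of every pod.
# Assumes kubectl is configured for the affected cluster and namespace.
# has_oom and scan_pods_for_oom are hypothetical helper names.

# Print each line of stdin that reports a java.lang.OutOfMemoryError.
# grep exits 0 if at least one line matched and 1 otherwise.
has_oom() {
  grep -F 'java.lang.OutOfMemoryError'
}

# For each pod in the current namespace, search the current container log
# and, if the container has restarted, the log of the previous instance.
scan_pods_for_oom() {
  local pod
  for pod in $(kubectl get pods -o name); do
    # kubectl logs --previous fails when there is no earlier instance, and
    # Pending pods have no logs at all; both errors are discarded.
    if { kubectl logs "$pod" 2>/dev/null; kubectl logs "$pod" --previous 2>/dev/null; } |
        has_oom >/dev/null; then
      echo "OutOfMemoryError found in ${pod#pod/}"
    fi
  done
}
```

Source the file and run scan_pods_for_oom; no output means no current or previous container log in the namespace mentions an OutOfMemoryError. Checking --previous matters here because a broker that ran out of heap may already have been restarted, as the RESTARTS column above shows for some pods.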

gitizenme commented 4 years ago

I'll have time to check the value changes tomorrow @salaboy

gitizenme commented 4 years ago

Which values file?

salaboy commented 4 years ago

@gitizenme just put the content I listed in that comment into a .yaml file, and then pass that file to helm install with -f, as described in the docs for KIND.
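A sketch of that workflow, assuming Helm 3 (release name before chart) and a hypothetical file name, docker-for-mac-values.yaml. The heredoc reproduces the overrides from the earlier comment, with limits nested under resources and volumeClaimTemplate under elasticsearch, matching how the Elasticsearch chart's Docker for Mac example structures them; helm install only runs if helm is on the PATH.

```shell
#!/usr/bin/env bash
# Sketch: save the Docker for Mac overrides to a values file and pass it to
# helm install with -f. The file name is a hypothetical choice.
set -euo pipefail

VALUES_FILE=docker-for-mac-values.yaml

# Overrides for the Elasticsearch subchart of the zeebe chart.
cat > "$VALUES_FILE" <<'EOF'
zeebe:
  elasticsearch:
    imageTag: 6.8.3
    antiAffinity: "soft"
    esJavaOpts: "-Xmx128m -Xms128m"
    resources:
      requests:
        cpu: "100m"
        memory: "512M"
      limits:
        cpu: "1000m"
        memory: "512M"
    volumeClaimTemplate:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "hostpath"
      resources:
        requests:
          storage: 100M
EOF

# Helm 3 syntax: release name, then chart; -f layers the file over the
# chart's default values.
if command -v helm >/dev/null 2>&1; then
  helm repo update
  helm install zeebe-full zeebe/zeebe-full -f "$VALUES_FILE"
else
  echo "helm not found; wrote $VALUES_FILE only" >&2
fi
```

If the zeebe-full release from the first attempt still exists, helm install refuses to reuse the name; remove it with helm uninstall zeebe-full first, or run helm upgrade --install with the same arguments.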

salaboy commented 4 years ago

@gitizenme did you manage to try that values file? I am currently updating the charts and testing again. I would appreciate feedback on whether you still see this issue.

salaboy commented 3 years ago

inactive for too long