camunda-community-hub / zeebe-full-helm

Zeebe Cluster + Operate Parent HELM Chart
Apache License 2.0

Zeebe on kubernetes: container keeps restarting #100

Closed BigeYoung closed 4 years ago

BigeYoung commented 4 years ago

Zeebe had been running on my Kubernetes cluster for a while, and it had been working very well. Recently, however, I found that one Pod kept reporting errors and restarting. I used helm to delete the chart, used kubectl to delete all PVCs/PVs, even logged in to the node and removed the image with docker, and then reinstalled Zeebe with helm. I repeated these steps many times, but the problem persists.

➜  ~ kubectl describe pods zeebe-zeebe-0
Name:         zeebe-zeebe-0
Namespace:    default
Priority:     0
Node:         server4/192.168.137.124
Start Time:   Tue, 20 Oct 2020 16:38:54 +0800
Labels:       app.kubernetes.io/component=broker
              app.kubernetes.io/instance=zeebe
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=zeebe
              controller-revision-hash=zeebe-zeebe-555c6877cd
              statefulset.kubernetes.io/pod-name=zeebe-zeebe-0
Annotations:  <none>
Status:       Running
IP:           10.244.1.108
IPs:
  IP:           10.244.1.108
Controlled By:  StatefulSet/zeebe-zeebe
Containers:
  zeebe:
    Container ID:   docker://8f3a116d0fa70298a240a6c92d3213025b235cdfe772a7660e47291cff228659
    Image:          camunda/zeebe:0.24.2
    Image ID:       docker-pullable://camunda/zeebe@sha256:795ace31c498ad4bc37b7b0fab612307c34852f4187766e3f777a509821c9fb3
    Ports:          9600/TCP, 26501/TCP, 26502/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Tue, 20 Oct 2020 16:47:55 +0800
      Finished:     Tue, 20 Oct 2020 16:48:01 +0800
    Ready:          False
    Restart Count:  6
    Limits:
      cpu:     1
      memory:  4Gi
    Requests:
      cpu:      500m
      memory:   2Gi
    Readiness:  http-get http://:9600/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ZEEBE_BROKER_CLUSTER_CLUSTERNAME:                zeebe-zeebe
      ZEEBE_LOG_LEVEL:
      ZEEBE_BROKER_CLUSTER_PARTITIONSCOUNT:            3
      ZEEBE_BROKER_CLUSTER_CLUSTERSIZE:                3
      ZEEBE_BROKER_CLUSTER_REPLICATIONFACTOR:          3
      ZEEBE_BROKER_THREADS_CPUTHREADCOUNT:             2
      ZEEBE_BROKER_THREADS_IOTHREADCOUNT:              2
      ZEEBE_BROKER_GATEWAY_ENABLE:                     false
      ZEEBE_BROKER_EXPORTERS_ELASTICSEARCH_CLASSNAME:  io.zeebe.exporter.ElasticsearchExporter
      ZEEBE_BROKER_EXPORTERS_ELASTICSEARCH_ARGS_URL:   http://elasticsearch-master:9200
      ZEEBE_BROKER_NETWORK_COMMANDAPI_PORT:            26501
      ZEEBE_BROKER_NETWORK_INTERNALAPI_PORT:           26502
      ZEEBE_BROKER_NETWORK_MONITORINGAPI_PORT:         9600
      K8S_POD_NAME:                                    zeebe-zeebe-0 (v1:metadata.name)
      JAVA_TOOL_OPTIONS:                               -XX:MaxRAMPercentage=25.0 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/zeebe/data -XX:ErrorFile=/usr/local/zeebe/data/zeebe_error%p.log -XX:+ExitOnOutOfMemoryError
    Mounts:
      /exporters from exporters (rw)
      /usr/local/bin/startup.sh from config (rw,path="startup.sh")
      /usr/local/zeebe/config/application.yaml from config (rw,path="application.yaml")
      /usr/local/zeebe/data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-mt74h (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-zeebe-zeebe-0
    ReadOnly:   false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      zeebe
    Optional:  false
  exporters:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  default-token-mt74h:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-mt74h
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  13m                   default-scheduler  Successfully assigned default/zeebe-zeebe-0 to server4
  Normal   Pulled     10m (x4 over 12m)     kubelet            Container image "camunda/zeebe:0.24.2" already present on machine
  Normal   Created    10m (x4 over 12m)     kubelet            Created container zeebe
  Normal   Started    10m (x4 over 12m)     kubelet            Started container zeebe
  Warning  Unhealthy  10m (x5 over 12m)     kubelet            Readiness probe failed: Get "http://10.244.1.108:9600/ready": dial tcp 10.244.1.108:9600: connect: connection refused
  Warning  Unhealthy  10m (x3 over 12m)     kubelet            Readiness probe failed: HTTP probe failed with statuscode: 503
  Warning  BackOff    2m50s (x33 over 11m)  kubelet            Back-off restarting failed container
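
For reference, the memory limit and JVM flags in the describe output above imply a rough heap ceiling. The sketch below only does the arithmetic. It assumes the JVM treats the container memory limit (4Gi) as its total RAM and that no explicit -Xmx is set; neither assumption is confirmed by the output. The function name max_heap_mib is made up for this illustration.

```shell
# Assumption: the JVM sizes its heap from the container memory limit
# and MaxRAMPercentage is the only heap-related flag in effect.
limit_mib=$((4 * 1024))   # "memory: 4Gi" from Limits, in MiB
percent=25                # -XX:MaxRAMPercentage=25.0, integer part

max_heap_mib() {
  # $1 = memory limit in MiB, $2 = MaxRAMPercentage as a whole number
  echo $(( $1 * $2 / 100 ))
}

echo "approx. max heap: $(max_heap_mib "$limit_mib" "$percent") MiB"
```

Under those assumptions the heap would be capped at about 1024 MiB, well below the 4Gi limit. Memory used outside the heap is not covered by this estimate.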

...and here is the log:

➜  ~ kubectl logs zeebe-zeebe-0
++ hostname -f
+ export ZEEBE_BROKER_NETWORK_ADVERTISEDHOST=zeebe-zeebe-0.zeebe-zeebe.default.svc.cluster.local
+ ZEEBE_BROKER_NETWORK_ADVERTISEDHOST=zeebe-zeebe-0.zeebe-zeebe.default.svc.cluster.local
+ export ZEEBE_BROKER_CLUSTER_NODEID=0
+ ZEEBE_BROKER_CLUSTER_NODEID=0
+ export ZEEBE_BROKER_CLUSTER_CLUSTERSIZE=3
+ ZEEBE_BROKER_CLUSTER_CLUSTERSIZE=3
+ contactPointPrefix=zeebe-zeebe
+ contactPoints=
+ [[ -z '' ]]
+ (( i=0 ))
+ (( i<3 ))
++ hostname -d
+ contactPoints=,zeebe-zeebe-0.zeebe-zeebe.default.svc.cluster.local:26502
+ (( i++ ))
+ (( i<3 ))
++ hostname -d
+ contactPoints=,zeebe-zeebe-0.zeebe-zeebe.default.svc.cluster.local:26502,zeebe-zeebe-1.zeebe-zeebe.default.svc.cluster.local:26502
+ (( i++ ))
+ (( i<3 ))
++ hostname -d
+ contactPoints=,zeebe-zeebe-0.zeebe-zeebe.default.svc.cluster.local:26502,zeebe-zeebe-1.zeebe-zeebe.default.svc.cluster.local:26502,zeebe-zeebe-2.zeebe-zeebe.default.svc.cluster.local:26502
+ (( i++ ))
+ (( i<3 ))
+ export ZEEBE_BROKER_CLUSTER_INITIALCONTACTPOINTS=,zeebe-zeebe-0.zeebe-zeebe.default.svc.cluster.local:26502,zeebe-zeebe-1.zeebe-zeebe.default.svc.cluster.local:26502,zeebe-zeebe-2.zeebe-zeebe.default.svc.cluster.local:26502
+ ZEEBE_BROKER_CLUSTER_INITIALCONTACTPOINTS=,zeebe-zeebe-0.zeebe-zeebe.default.svc.cluster.local:26502,zeebe-zeebe-1.zeebe-zeebe.default.svc.cluster.local:26502,zeebe-zeebe-2.zeebe-zeebe.default.svc.cluster.local:26502
++ ls -A /exporters/
+ '[' '' ']'
No exporters available.
+ echo 'No exporters available.'
+ exec /usr/local/zeebe/bin/broker
Picked up JAVA_TOOL_OPTIONS: -XX:MaxRAMPercentage=25.0 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/zeebe/data -XX:ErrorFile=/usr/local/zeebe/data/zeebe_error%p.log -XX:+ExitOnOutOfMemoryError
2020-10-20 08:53:11,231 main WARN Error while converting string [] to type [class org.apache.logging.log4j.Level]. Using default value [null]. java.lang.IllegalArgumentException: Unknown level constant [].
        at org.apache.logging.log4j.Level.valueOf(Level.java:320)
        at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters$LevelConverter.convert(TypeConverters.java:288)
        at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters$LevelConverter.convert(TypeConverters.java:284)
        at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters.convert(TypeConverters.java:419)
        at org.apache.logging.log4j.core.config.plugins.visitors.AbstractPluginVisitor.convert(AbstractPluginVisitor.java:149)
        at org.apache.logging.log4j.core.config.plugins.visitors.PluginAttributeVisitor.visit(PluginAttributeVisitor.java:45)
        at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.generateParameters(PluginBuilder.java:258)
        at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.build(PluginBuilder.java:135)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.createPluginObject(AbstractConfiguration.java:1002)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:942)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:934)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.doConfigure(AbstractConfiguration.java:552)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.initialize(AbstractConfiguration.java:241)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.start(AbstractConfiguration.java:288)
        at org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:618)
        at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:691)
        at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:708)
        at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:263)
        at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:153)
        at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:45)
        at org.apache.logging.log4j.LogManager.getContext(LogManager.java:194)
        at org.apache.commons.logging.LogAdapter$Log4jLog.<clinit>(LogAdapter.java:155)
        at org.apache.commons.logging.LogAdapter$Log4jAdapter.createLog(LogAdapter.java:122)
        at org.apache.commons.logging.LogAdapter.createLog(LogAdapter.java:89)
        at org.apache.commons.logging.LogFactoryService.getInstance(LogFactoryService.java:46)
        at org.apache.commons.logging.LogFactoryService.getInstance(LogFactoryService.java:41)
        at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:655)
        at org.springframework.boot.SpringApplication.<clinit>(SpringApplication.java:196)
        at io.zeebe.broker.StandaloneBroker.main(StandaloneBroker.java:52)

2020-10-20 08:53:12,913 main WARN Error while converting string [] to type [class org.apache.logging.log4j.Level]. Using default value [null]. java.lang.IllegalArgumentException: Unknown level constant [].
        at org.apache.logging.log4j.Level.valueOf(Level.java:320)
        at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters$LevelConverter.convert(TypeConverters.java:288)
        at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters$LevelConverter.convert(TypeConverters.java:284)
        at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters.convert(TypeConverters.java:419)
        at org.apache.logging.log4j.core.config.plugins.visitors.AbstractPluginVisitor.convert(AbstractPluginVisitor.java:149)
        at org.apache.logging.log4j.core.config.plugins.visitors.PluginAttributeVisitor.visit(PluginAttributeVisitor.java:45)
        at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.generateParameters(PluginBuilder.java:258)
        at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.build(PluginBuilder.java:135)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.createPluginObject(AbstractConfiguration.java:1002)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:942)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:934)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.doConfigure(AbstractConfiguration.java:552)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.initialize(AbstractConfiguration.java:241)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.start(AbstractConfiguration.java:288)
        at org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:618)
        at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:691)
        at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:708)
        at org.springframework.boot.logging.log4j2.Log4J2LoggingSystem.reinitialize(Log4J2LoggingSystem.java:204)
        at org.springframework.boot.logging.AbstractLoggingSystem.initializeWithConventions(AbstractLoggingSystem.java:73)
        at org.springframework.boot.logging.AbstractLoggingSystem.initialize(AbstractLoggingSystem.java:60)
        at org.springframework.boot.logging.log4j2.Log4J2LoggingSystem.initialize(Log4J2LoggingSystem.java:160)
        at org.springframework.boot.context.logging.LoggingApplicationListener.initializeSystem(LoggingApplicationListener.java:306)
        at org.springframework.boot.context.logging.LoggingApplicationListener.initialize(LoggingApplicationListener.java:281)
        at org.springframework.boot.context.logging.LoggingApplicationListener.onApplicationEnvironmentPreparedEvent(LoggingApplicationListener.java:239)
        at org.springframework.boot.context.logging.LoggingApplicationListener.onApplicationEvent(LoggingApplicationListener.java:216)
        at org.springframework.context.event.SimpleApplicationEventMulticaster.doInvokeListener(SimpleApplicationEventMulticaster.java:172)
        at org.springframework.context.event.SimpleApplicationEventMulticaster.invokeListener(SimpleApplicationEventMulticaster.java:165)
        at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:139)
        at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:127)
        at org.springframework.boot.context.event.EventPublishingRunListener.environmentPrepared(EventPublishingRunListener.java:80)
        at org.springframework.boot.SpringApplicationRunListeners.environmentPrepared(SpringApplicationRunListeners.java:53)
        at org.springframework.boot.SpringApplication.prepareEnvironment(SpringApplication.java:345)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:308)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:1237)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:1226)
        at io.zeebe.broker.StandaloneBroker.main(StandaloneBroker.java:52)

  ______  ______   ______   ____    ______     ____    _____     ____    _  __  ______   _____
 |___  / |  ____| |  ____| |  _ \  |  ____|   |  _ \  |  __ \   / __ \  | |/ / |  ____| |  __ \
    / /  | |__    | |__    | |_) | | |__      | |_) | | |__) | | |  | | | ' /  | |__    | |__) |
   / /   |  __|   |  __|   |  _ <  |  __|     |  _ <  |  _  /  | |  | | |  <   |  __|   |  _  /
  / /__  | |____  | |____  | |_) | | |____    | |_) | | | \ \  | |__| | | . \  | |____  | | \ \
 /_____| |______| |______| |____/  |______|   |____/  |_|  \_\  \____/  |_|\_\ |______| |_|  \_\

2020-10-20 08:53:13.232 [] [main] INFO  io.zeebe.broker.StandaloneBroker - Starting StandaloneBroker v0.24.2 on zeebe-zeebe-0 with PID 6 (/usr/local/zeebe/lib/zeebe-distribution-0.24.2.jar started by root in /usr/local/zeebe)
2020-10-20 08:53:13.300 [] [main] INFO  io.zeebe.broker.StandaloneBroker - No active profile set, falling back to default profiles: default
2020-10-20 08:53:18.335 [] [main] INFO  org.springframework.boot.web.embedded.tomcat.TomcatWebServer - Tomcat initialized with port(s): 9600 (http)
2020-10-20 08:53:18.405 [] [main] INFO  org.apache.coyote.http11.Http11NioProtocol - Initializing ProtocolHandler ["http-nio-0.0.0.0-9600"]
2020-10-20 08:53:18.406 [] [main] INFO  org.apache.catalina.core.StandardService - Starting service [Tomcat]
2020-10-20 08:53:18.407 [] [main] INFO  org.apache.catalina.core.StandardEngine - Starting Servlet engine: [Apache Tomcat/9.0.36]
2020-10-20 08:53:18.797 [] [main] INFO  org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/] - Initializing Spring embedded WebApplicationContext
2020-10-20 08:53:18.798 [] [main] INFO  org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext - Root WebApplicationContext: initialization completed in 5368 ms
2020-10-20 08:53:20.219 [] [main] INFO  org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor - Initializing ExecutorService 'applicationTaskExecutor'
2020-10-20 08:53:21.126 [] [main] INFO  org.springframework.boot.actuate.endpoint.web.EndpointLinksResolver - Exposing 2 endpoint(s) beneath base path '/actuator'
2020-10-20 08:53:21.218 [] [main] INFO  org.apache.coyote.http11.Http11NioProtocol - Starting ProtocolHandler ["http-nio-0.0.0.0-9600"]
2020-10-20 08:53:21.394 [] [main] INFO  org.springframework.boot.web.embedded.tomcat.TomcatWebServer - Tomcat started on port(s): 9600 (http) with context path ''
2020-10-20 08:53:21.415 [] [main] INFO  io.zeebe.broker.StandaloneBroker - Started StandaloneBroker in 9.509 seconds (JVM running for 13.396)
2020-10-20 08:53:21.536 [] [main] INFO  io.zeebe.broker.system - Version: 0.24.2
2020-10-20 08:53:21.706 [] [main] INFO  io.zeebe.broker.system - Starting broker 0 with configuration {
  "network" : {
    "host" : "0.0.0.0",
    "portOffset" : 0,
    "maxMessageSize" : "4MB",
    "advertisedHost" : "zeebe-zeebe-0.zeebe-zeebe.default.svc.cluster.local",
    "commandApi" : {
      "host" : "0.0.0.0",
      "port" : 26501,
      "advertisedHost" : "zeebe-zeebe-0.zeebe-zeebe.default.svc.cluster.local",
      "advertisedPort" : 26501,
      "advertisedAddress" : "zeebe-zeebe-0.zeebe-zeebe.default.svc.cluster.local:26501",
      "address" : "0.0.0.0:26501"
    },
    "internalApi" : {
      "host" : "0.0.0.0",
      "port" : 26502,
      "advertisedHost" : "zeebe-zeebe-0.zeebe-zeebe.default.svc.cluster.local",
      "advertisedPort" : 26502,
      "advertisedAddress" : "zeebe-zeebe-0.zeebe-zeebe.default.svc.cluster.local:26502",
      "address" : "0.0.0.0:26502"
    },
    "monitoringApi" : {
      "host" : "0.0.0.0",
      "port" : 9600,
      "advertisedHost" : "zeebe-zeebe-0.zeebe-zeebe.default.svc.cluster.local",
      "advertisedPort" : 9600,
      "advertisedAddress" : "zeebe-zeebe-0.zeebe-zeebe.default.svc.cluster.local:9600",
      "address" : "0.0.0.0:9600"
    },
    "maxMessageSizeInBytes" : 4194304
  },
  "cluster" : {
    "initialContactPoints" : [ "zeebe-zeebe-0.zeebe-zeebe.default.svc.cluster.local:26502", "zeebe-zeebe-1.zeebe-zeebe.default.svc.cluster.local:26502", "zeebe-zeebe-2.zeebe-zeebe.default.svc.cluster.local:26502" ],
    "partitionIds" : [ 1, 2, 3 ],
    "nodeId" : 0,
    "partitionsCount" : 3,
    "replicationFactor" : 3,
    "clusterSize" : 3,
    "clusterName" : "zeebe-zeebe",
    "membership" : {
      "broadcastUpdates" : false,
      "broadcastDisputes" : true,
      "notifySuspect" : false,
      "gossipInterval" : "PT0.25S",
      "gossipFanout" : 2,
      "probeInterval" : "PT1S",
      "probeTimeout" : "PT2S",
      "suspectProbes" : 3,
      "failureTimeout" : "PT10S",
      "syncInterval" : "PT10S"
    }
  },
  "threads" : {
    "cpuThreadCount" : 2,
    "ioThreadCount" : 2
  },
  "data" : {
    "directories" : [ "/usr/local/zeebe/data" ],
    "logSegmentSize" : "512MB",
    "snapshotPeriod" : "PT15M",
    "logIndexDensity" : 100,
    "logSegmentSizeInBytes" : 536870912,
    "atomixStorageLevel" : "DISK"
  },
  "exporters" : {
    "elasticsearch" : {
      "jarPath" : null,
      "className" : "io.zeebe.exporter.ElasticsearchExporter",
      "args" : {
        "url" : "http://elasticsearch-master:9200"
      },
      "external" : false
    }
  },
  "gateway" : {
    "network" : {
      "host" : "0.0.0.0",
      "port" : 26500,
      "minKeepAliveInterval" : "PT30S"
    },
    "cluster" : {
      "contactPoint" : "0.0.0.0:26502",
      "requestTimeout" : "PT15S",
      "clusterName" : "zeebe-cluster",
      "memberId" : "gateway",
      "host" : "0.0.0.0",
      "port" : 26502,
      "membership" : {
        "broadcastUpdates" : false,
        "broadcastDisputes" : true,
        "notifySuspect" : false,
        "gossipInterval" : "PT0.25S",
        "gossipFanout" : 2,
        "probeInterval" : "PT1S",
        "probeTimeout" : "PT2S",
        "suspectProbes" : 3,
        "failureTimeout" : "PT10S",
        "syncInterval" : "PT10S"
      }
    },
    "threads" : {
      "managementThreads" : 1
    },
    "monitoring" : {
      "enabled" : false,
      "host" : "0.0.0.0",
      "port" : 9600
    },
    "security" : {
      "enabled" : false,
      "certificateChainPath" : null,
      "privateKeyPath" : null
    },
    "longPolling" : {
      "enabled" : true
    },
    "initialized" : true,
    "enable" : false
  },
  "backpressure" : {
    "enabled" : true,
    "algorithm" : "VEGAS",
    "aimd" : {
      "requestTimeout" : "PT1S",
      "initialLimit" : 100,
      "minLimit" : 1,
      "maxLimit" : 1000,
      "backoffRatio" : 0.9
    },
    "fixedLimit" : {
      "limit" : 20
    },
    "vegas" : {
      "alpha" : 3,
      "beta" : 6,
      "initialLimit" : 20
    },
    "gradient" : {
      "minLimit" : 10,
      "initialLimit" : 20,
      "rttTolerance" : 2.0
    },
    "gradient2" : {
      "minLimit" : 10,
      "initialLimit" : 20,
      "rttTolerance" : 2.0,
      "longWindow" : 600
    }
  },
  "stepTimeout" : "PT5M",
  "executionMetricsExporterEnabled" : false
}
2020-10-20 08:53:21.733 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [1/10]: actor scheduler
2020-10-20 08:53:21.810 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [2/10]: membership and replication protocol
2020-10-20 08:53:23.806 [] [http-nio-0.0.0.0-9600-exec-1] INFO  org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/] - Initializing Spring DispatcherServlet 'dispatcherServlet'
2020-10-20 08:53:23.814 [] [http-nio-0.0.0.0-9600-exec-1] INFO  org.springframework.web.servlet.DispatcherServlet - Initializing Servlet 'dispatcherServlet'
2020-10-20 08:53:24.003 [] [http-nio-0.0.0.0-9600-exec-1] INFO  org.springframework.web.servlet.DispatcherServlet - Completed initialization in 189 ms
2020-10-20 08:53:30.219 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [3/10]: command api transport
2020-10-20 08:53:30.725 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [4/10]: command api handler
2020-10-20 08:53:31.199 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [5/10]: subscription api
2020-10-20 08:53:31.239 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [6/10]: cluster services
2020-10-20 08:53:36.339 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [7/10]: topology manager
2020-10-20 08:53:36.341 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [8/10]: monitoring services
2020-10-20 08:53:36.349 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [9/10]: leader management request handler
2020-10-20 08:53:36.357 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [10/10]: zeebe partitions
2020-10-20 08:53:36.360 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 partitions [1/3]: partition 3
2020-10-20 08:53:36.796 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 partitions [2/3]: partition 2
2020-10-20 08:53:36.816 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 partitions [3/3]: partition 1
2020-10-20 08:53:36.903 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 partitions succeeded. Started 3 steps in 543 ms.
2020-10-20 08:53:36.904 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 succeeded. Started 10 steps in 15172 ms.
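
The shell trace at the top of this log shows the startup script building ZEEBE_BROKER_CLUSTER_INITIALCONTACTPOINTS one broker at a time. The following is a hypothetical reconstruction of that loop based only on the trace, written in portable sh; the chart's real startup.sh may differ, and the function name build_contact_points is invented here.

```shell
# Hypothetical reconstruction from the trace, not the chart's actual script.
build_contact_points() {
  # $1 = pod name prefix, $2 = domain (what `hostname -d` printed),
  # $3 = cluster size, $4 = internal API port
  prefix=$1; domain=$2; size=$3; port=$4
  points=""
  i=0
  while [ "$i" -lt "$size" ]; do
    # Each iteration appends ",<prefix>-<i>.<domain>:<port>"
    points="${points},${prefix}-${i}.${domain}:${port}"
    i=$((i + 1))
  done
  echo "$points"
}

build_contact_points zeebe-zeebe zeebe-zeebe.default.svc.cluster.local 3 26502
```

Because the list starts empty and every entry is appended with a comma in front, the result begins with a comma, as in the trace. The configuration printed by the broker lists exactly three contact points, so the empty first element appears to be ignored.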

Version Info:

➜  ~ kubectl get nodes
NAME      STATUS   ROLES    AGE   VERSION
server1   Ready    master   23d   v1.19.2
server2   Ready    <none>   23d   v1.19.2
server3   Ready    <none>   23d   v1.19.2
server4   Ready    <none>   23d   v1.19.2
➜  ~ helm install zeebe zeebe/zeebe-full
NAME: zeebe
LAST DEPLOYED: Wed Oct 21 09:56:10 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
NOTES:
______     ______     ______     ______     ______
/\___  \   /\  ___\   /\  ___\   /\  == \   /\  ___\
\/_/  /__  \ \  __\   \ \  __\   \ \  __<   \ \  __\
  /\_____\  \ \_____\  \ \_____\  \ \_____\  \ \_____\
  \/_____/   \/_____/   \/_____/   \/_____/   \/_____/

(zeebe-full - 0.0.107)

- Cluster Name: zeebe-zeebe

salaboy commented 4 years ago

@BigeYoung thanks for reporting this. Can you please try something very simple: don't call the release zeebe; call it something like my-zeebe. I have a feeling that there is an issue with naming it zeebe; we can look into that once you confirm the renamed release is working. Also check that your cluster is not out of resources.

Can you please check those two things and get back to us?

BigeYoung commented 4 years ago

Thank you very much for your prompt reply. I changed the release name to "myzeebe" as you requested, but unfortunately the problem still exists. I installed Zeebe with:

➜  ~ helm install myzeebe zeebe/zeebe-full
NAME: myzeebe
LAST DEPLOYED: Wed Oct 21 16:40:21 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
NOTES:
______     ______     ______     ______     ______
/\___  \   /\  ___\   /\  ___\   /\  == \   /\  ___\
\/_/  /__  \ \  __\   \ \  __\   \ \  __<   \ \  __\
  /\_____\  \ \_____\  \ \_____\  \ \_____\  \ \_____\
  \/_____/   \/_____/   \/_____/   \/_____/   \/_____/

(zeebe-full - 0.0.107)

- Cluster Name: myzeebe-zeebe

[image]

As shown in the figure below, my node resources are more than sufficient.

[image]

salaboy commented 4 years ago

@BigeYoung thanks for trying that out. I will need to investigate. Can you please look at the pod description to see why the pod is being killed? I am more interested in the Zeebe broker than in Operate.

Running kubectl describe pod ... should show you the events and why the pod was restarted.
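
One detail worth reading from the describe output is the exit code. By shell convention, an exit code above 128 means the process was ended by a signal, and the signal number is the code minus 128. The sketch below applies that to the Exit Code: 137 reported earlier in this thread; the function name signal_from_exit_code is made up for illustration.

```shell
signal_from_exit_code() {
  # Prints the signal number when the exit code follows the 128+N
  # convention; returns 1 for codes that do not indicate a signal.
  if [ "$1" -gt 128 ]; then
    echo $(( $1 - 128 ))
  else
    return 1
  fi
}

signal_from_exit_code 137   # prints 9
```

Signal 9 is SIGKILL, so the broker process was killed from outside rather than exiting on its own. The output reports Reason: Error rather than OOMKilled, so it does not by itself say what sent the signal.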

BigeYoung commented 4 years ago

I think what I posted at the beginning was exactly the broker log. But since you asked, I am happy to post the log again.

kubectl describe pod myzeebe-zeebe-2

➜  ~ kubectl describe pod myzeebe-zeebe-2
Name:         myzeebe-zeebe-2
Namespace:    default
Priority:     0
Node:         server4/192.168.137.124
Start Time:   Wed, 21 Oct 2020 16:40:27 +0800
Labels:       app.kubernetes.io/component=broker
              app.kubernetes.io/instance=myzeebe
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=zeebe
              controller-revision-hash=myzeebe-zeebe-b8f64bbb8
              statefulset.kubernetes.io/pod-name=myzeebe-zeebe-2
Annotations:  <none>
Status:       Running
IP:           10.244.1.139
IPs:
  IP:           10.244.1.139
Controlled By:  StatefulSet/myzeebe-zeebe
Containers:
  zeebe:
    Container ID:   docker://140d8d7c2770d8942480ff25a2d099ad8933a562cb389822b019529df0ed8e45
    Image:          camunda/zeebe:0.24.2
    Image ID:       docker-pullable://camunda/zeebe@sha256:795ace31c498ad4bc37b7b0fab612307c34852f4187766e3f777a509821c9fb3
    Ports:          9600/TCP, 26501/TCP, 26502/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Thu, 22 Oct 2020 14:25:06 +0800
      Finished:     Thu, 22 Oct 2020 14:26:01 +0800
    Ready:          False
    Restart Count:  222
    Limits:
      cpu:     1
      memory:  4Gi
    Requests:
      cpu:      500m
      memory:   2Gi
    Readiness:  http-get http://:9600/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ZEEBE_BROKER_CLUSTER_CLUSTERNAME:                myzeebe-zeebe
      ZEEBE_LOG_LEVEL:
      ZEEBE_BROKER_CLUSTER_PARTITIONSCOUNT:            3
      ZEEBE_BROKER_CLUSTER_CLUSTERSIZE:                3
      ZEEBE_BROKER_CLUSTER_REPLICATIONFACTOR:          3
      ZEEBE_BROKER_THREADS_CPUTHREADCOUNT:             2
      ZEEBE_BROKER_THREADS_IOTHREADCOUNT:              2
      ZEEBE_BROKER_GATEWAY_ENABLE:                     false
      ZEEBE_BROKER_EXPORTERS_ELASTICSEARCH_CLASSNAME:  io.zeebe.exporter.ElasticsearchExporter
      ZEEBE_BROKER_EXPORTERS_ELASTICSEARCH_ARGS_URL:   http://elasticsearch-master:9200
      ZEEBE_BROKER_NETWORK_COMMANDAPI_PORT:            26501
      ZEEBE_BROKER_NETWORK_INTERNALAPI_PORT:           26502
      ZEEBE_BROKER_NETWORK_MONITORINGAPI_PORT:         9600
      K8S_POD_NAME:                                    myzeebe-zeebe-2 (v1:metadata.name)
      JAVA_TOOL_OPTIONS:                               -XX:MaxRAMPercentage=25.0 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/zeebe/data -XX:ErrorFile=/usr/local/zeebe/data/zeebe_error%p.log -XX:+ExitOnOutOfMemoryError
    Mounts:
      /exporters from exporters (rw)
      /usr/local/bin/startup.sh from config (rw,path="startup.sh")
      /usr/local/zeebe/config/application.yaml from config (rw,path="application.yaml")
      /usr/local/zeebe/data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-mt74h (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-myzeebe-zeebe-2
    ReadOnly:   false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      myzeebe
    Optional:  false
  exporters:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  default-token-mt74h:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-mt74h
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                     From     Message
  ----     ------   ----                    ----     -------
  Warning  BackOff  2m54s (x5111 over 21h)  kubelet  Back-off restarting failed container

kubectl logs myzeebe-zeebe-2

➜  ~ kubectl logs myzeebe-zeebe-2
++ hostname -f
+ export ZEEBE_BROKER_NETWORK_ADVERTISEDHOST=myzeebe-zeebe-2.myzeebe-zeebe.default.svc.cluster.local
+ ZEEBE_BROKER_NETWORK_ADVERTISEDHOST=myzeebe-zeebe-2.myzeebe-zeebe.default.svc.cluster.local
+ export ZEEBE_BROKER_CLUSTER_NODEID=2
+ ZEEBE_BROKER_CLUSTER_NODEID=2
+ export ZEEBE_BROKER_CLUSTER_CLUSTERSIZE=3
+ ZEEBE_BROKER_CLUSTER_CLUSTERSIZE=3
+ contactPointPrefix=myzeebe-zeebe
+ contactPoints=
+ [[ -z '' ]]
+ (( i=0 ))
+ (( i<3 ))
++ hostname -d
+ contactPoints=,myzeebe-zeebe-0.myzeebe-zeebe.default.svc.cluster.local:26502
+ (( i++ ))
+ (( i<3 ))
++ hostname -d
+ contactPoints=,myzeebe-zeebe-0.myzeebe-zeebe.default.svc.cluster.local:26502,myzeebe-zeebe-1.myzeebe-zeebe.default.svc.cluster.local:26502
+ (( i++ ))
+ (( i<3 ))
++ hostname -d
+ contactPoints=,myzeebe-zeebe-0.myzeebe-zeebe.default.svc.cluster.local:26502,myzeebe-zeebe-1.myzeebe-zeebe.default.svc.cluster.local:26502,myzeebe-zeebe-2.myzeebe-zeebe.default.svc.cluster.local:26502
+ (( i++ ))
+ (( i<3 ))
+ export ZEEBE_BROKER_CLUSTER_INITIALCONTACTPOINTS=,myzeebe-zeebe-0.myzeebe-zeebe.default.svc.cluster.local:26502,myzeebe-zeebe-1.myzeebe-zeebe.default.svc.cluster.local:26502,myzeebe-zeebe-2.myzeebe-zeebe.default.svc.cluster.local:26502
+ ZEEBE_BROKER_CLUSTER_INITIALCONTACTPOINTS=,myzeebe-zeebe-0.myzeebe-zeebe.default.svc.cluster.local:26502,myzeebe-zeebe-1.myzeebe-zeebe.default.svc.cluster.local:26502,myzeebe-zeebe-2.myzeebe-zeebe.default.svc.cluster.local:26502
++ ls -A /exporters/
No exporters available.
+ '[' '' ']'
+ echo 'No exporters available.'
+ exec /usr/local/zeebe/bin/broker
Picked up JAVA_TOOL_OPTIONS: -XX:MaxRAMPercentage=25.0 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/zeebe/data -XX:ErrorFile=/usr/local/zeebe/data/zeebe_error%p.log -XX:+ExitOnOutOfMemoryError
2020-10-22 06:25:10,209 main WARN Error while converting string [] to type [class org.apache.logging.log4j.Level]. Using default value [null]. java.lang.IllegalArgumentException: Unknown level constant [].
        at org.apache.logging.log4j.Level.valueOf(Level.java:320)
        at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters$LevelConverter.convert(TypeConverters.java:288)
        at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters$LevelConverter.convert(TypeConverters.java:284)
        at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters.convert(TypeConverters.java:419)
        at org.apache.logging.log4j.core.config.plugins.visitors.AbstractPluginVisitor.convert(AbstractPluginVisitor.java:149)
        at org.apache.logging.log4j.core.config.plugins.visitors.PluginAttributeVisitor.visit(PluginAttributeVisitor.java:45)
        at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.generateParameters(PluginBuilder.java:258)
        at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.build(PluginBuilder.java:135)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.createPluginObject(AbstractConfiguration.java:1002)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:942)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:934)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.doConfigure(AbstractConfiguration.java:552)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.initialize(AbstractConfiguration.java:241)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.start(AbstractConfiguration.java:288)
        at org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:618)
        at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:691)
        at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:708)
        at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:263)
        at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:153)
        at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:45)
        at org.apache.logging.log4j.LogManager.getContext(LogManager.java:194)
        at org.apache.commons.logging.LogAdapter$Log4jLog.<clinit>(LogAdapter.java:155)
        at org.apache.commons.logging.LogAdapter$Log4jAdapter.createLog(LogAdapter.java:122)
        at org.apache.commons.logging.LogAdapter.createLog(LogAdapter.java:89)
        at org.apache.commons.logging.LogFactoryService.getInstance(LogFactoryService.java:46)
        at org.apache.commons.logging.LogFactoryService.getInstance(LogFactoryService.java:41)
        at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:655)
        at org.springframework.boot.SpringApplication.<clinit>(SpringApplication.java:196)
        at io.zeebe.broker.StandaloneBroker.main(StandaloneBroker.java:52)

2020-10-22 06:25:12,225 main WARN Error while converting string [] to type [class org.apache.logging.log4j.Level]. Using default value [null]. java.lang.IllegalArgumentException: Unknown level constant [].
        at org.apache.logging.log4j.Level.valueOf(Level.java:320)
        at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters$LevelConverter.convert(TypeConverters.java:288)
        at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters$LevelConverter.convert(TypeConverters.java:284)
        at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters.convert(TypeConverters.java:419)
        at org.apache.logging.log4j.core.config.plugins.visitors.AbstractPluginVisitor.convert(AbstractPluginVisitor.java:149)
        at org.apache.logging.log4j.core.config.plugins.visitors.PluginAttributeVisitor.visit(PluginAttributeVisitor.java:45)
        at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.generateParameters(PluginBuilder.java:258)
        at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.build(PluginBuilder.java:135)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.createPluginObject(AbstractConfiguration.java:1002)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:942)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:934)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.doConfigure(AbstractConfiguration.java:552)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.initialize(AbstractConfiguration.java:241)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.start(AbstractConfiguration.java:288)
        at org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:618)
        at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:691)
        at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:708)
        at org.springframework.boot.logging.log4j2.Log4J2LoggingSystem.reinitialize(Log4J2LoggingSystem.java:204)
        at org.springframework.boot.logging.AbstractLoggingSystem.initializeWithConventions(AbstractLoggingSystem.java:73)
        at org.springframework.boot.logging.AbstractLoggingSystem.initialize(AbstractLoggingSystem.java:60)
        at org.springframework.boot.logging.log4j2.Log4J2LoggingSystem.initialize(Log4J2LoggingSystem.java:160)
        at org.springframework.boot.context.logging.LoggingApplicationListener.initializeSystem(LoggingApplicationListener.java:306)
        at org.springframework.boot.context.logging.LoggingApplicationListener.initialize(LoggingApplicationListener.java:281)
        at org.springframework.boot.context.logging.LoggingApplicationListener.onApplicationEnvironmentPreparedEvent(LoggingApplicationListener.java:239)
        at org.springframework.boot.context.logging.LoggingApplicationListener.onApplicationEvent(LoggingApplicationListener.java:216)
        at org.springframework.context.event.SimpleApplicationEventMulticaster.doInvokeListener(SimpleApplicationEventMulticaster.java:172)
        at org.springframework.context.event.SimpleApplicationEventMulticaster.invokeListener(SimpleApplicationEventMulticaster.java:165)
        at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:139)
        at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:127)
        at org.springframework.boot.context.event.EventPublishingRunListener.environmentPrepared(EventPublishingRunListener.java:80)
        at org.springframework.boot.SpringApplicationRunListeners.environmentPrepared(SpringApplicationRunListeners.java:53)
        at org.springframework.boot.SpringApplication.prepareEnvironment(SpringApplication.java:345)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:308)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:1237)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:1226)
        at io.zeebe.broker.StandaloneBroker.main(StandaloneBroker.java:52)

  ______  ______   ______   ____    ______     ____    _____     ____    _  __  ______   _____
 |___  / |  ____| |  ____| |  _ \  |  ____|   |  _ \  |  __ \   / __ \  | |/ / |  ____| |  __ \
    / /  | |__    | |__    | |_) | | |__      | |_) | | |__) | | |  | | | ' /  | |__    | |__) |
   / /   |  __|   |  __|   |  _ <  |  __|     |  _ <  |  _  /  | |  | | |  <   |  __|   |  _  /
  / /__  | |____  | |____  | |_) | | |____    | |_) | | | \ \  | |__| | | . \  | |____  | | \ \
 /_____| |______| |______| |____/  |______|   |____/  |_|  \_\  \____/  |_|\_\ |______| |_|  \_\

2020-10-22 06:25:12.507 [] [main] INFO  io.zeebe.broker.StandaloneBroker - Starting StandaloneBroker v0.24.2 on myzeebe-zeebe-2 with PID 6 (/usr/local/zeebe/lib/zeebe-distribution-0.24.2.jar started by root in /usr/local/zeebe)
2020-10-22 06:25:12.517 [] [main] INFO  io.zeebe.broker.StandaloneBroker - No active profile set, falling back to default profiles: default
2020-10-22 06:25:17.616 [] [main] INFO  org.springframework.boot.web.embedded.tomcat.TomcatWebServer - Tomcat initialized with port(s): 9600 (http)
2020-10-22 06:25:17.702 [] [main] INFO  org.apache.coyote.http11.Http11NioProtocol - Initializing ProtocolHandler ["http-nio-0.0.0.0-9600"]
2020-10-22 06:25:17.703 [] [main] INFO  org.apache.catalina.core.StandardService - Starting service [Tomcat]
2020-10-22 06:25:17.704 [] [main] INFO  org.apache.catalina.core.StandardEngine - Starting Servlet engine: [Apache Tomcat/9.0.36]
2020-10-22 06:25:18.029 [] [main] INFO  org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/] - Initializing Spring embedded WebApplicationContext
2020-10-22 06:25:18.029 [] [main] INFO  org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext - Root WebApplicationContext: initialization completed in 5330 ms
2020-10-22 06:25:19.596 [] [main] INFO  org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor - Initializing ExecutorService 'applicationTaskExecutor'
2020-10-22 06:25:20.433 [] [main] INFO  org.springframework.boot.actuate.endpoint.web.EndpointLinksResolver - Exposing 2 endpoint(s) beneath base path '/actuator'
2020-10-22 06:25:20.525 [] [main] INFO  org.apache.coyote.http11.Http11NioProtocol - Starting ProtocolHandler ["http-nio-0.0.0.0-9600"]
2020-10-22 06:25:20.703 [] [main] INFO  org.springframework.boot.web.embedded.tomcat.TomcatWebServer - Tomcat started on port(s): 9600 (http) with context path ''
2020-10-22 06:25:20.736 [] [main] INFO  io.zeebe.broker.StandaloneBroker - Started StandaloneBroker in 9.739 seconds (JVM running for 13.814)
2020-10-22 06:25:20.903 [] [main] INFO  io.zeebe.broker.system - Version: 0.24.2
2020-10-22 06:25:21.023 [] [main] INFO  io.zeebe.broker.system - Starting broker 2 with configuration {
  "network" : {
    "host" : "0.0.0.0",
    "portOffset" : 0,
    "maxMessageSize" : "4MB",
    "advertisedHost" : "myzeebe-zeebe-2.myzeebe-zeebe.default.svc.cluster.local",
    "commandApi" : {
      "host" : "0.0.0.0",
      "port" : 26501,
      "advertisedHost" : "myzeebe-zeebe-2.myzeebe-zeebe.default.svc.cluster.local",
      "advertisedPort" : 26501,
      "advertisedAddress" : "myzeebe-zeebe-2.myzeebe-zeebe.default.svc.cluster.local:26501",
      "address" : "0.0.0.0:26501"
    },
    "internalApi" : {
      "host" : "0.0.0.0",
      "port" : 26502,
      "advertisedHost" : "myzeebe-zeebe-2.myzeebe-zeebe.default.svc.cluster.local",
      "advertisedPort" : 26502,
      "advertisedAddress" : "myzeebe-zeebe-2.myzeebe-zeebe.default.svc.cluster.local:26502",
      "address" : "0.0.0.0:26502"
    },
    "monitoringApi" : {
      "host" : "0.0.0.0",
      "port" : 9600,
      "advertisedHost" : "myzeebe-zeebe-2.myzeebe-zeebe.default.svc.cluster.local",
      "advertisedPort" : 9600,
      "advertisedAddress" : "myzeebe-zeebe-2.myzeebe-zeebe.default.svc.cluster.local:9600",
      "address" : "0.0.0.0:9600"
    },
    "maxMessageSizeInBytes" : 4194304
  },
  "cluster" : {
    "initialContactPoints" : [ "myzeebe-zeebe-0.myzeebe-zeebe.default.svc.cluster.local:26502", "myzeebe-zeebe-1.myzeebe-zeebe.default.svc.cluster.local:26502", "myzeebe-zeebe-2.myzeebe-zeebe.default.svc.cluster.local:26502" ],
    "partitionIds" : [ 1, 2, 3 ],
    "nodeId" : 2,
    "partitionsCount" : 3,
    "replicationFactor" : 3,
    "clusterSize" : 3,
    "clusterName" : "myzeebe-zeebe",
    "membership" : {
      "broadcastUpdates" : false,
      "broadcastDisputes" : true,
      "notifySuspect" : false,
      "gossipInterval" : "PT0.25S",
      "gossipFanout" : 2,
      "probeInterval" : "PT1S",
      "probeTimeout" : "PT2S",
      "suspectProbes" : 3,
      "failureTimeout" : "PT10S",
      "syncInterval" : "PT10S"
    }
  },
  "threads" : {
    "cpuThreadCount" : 2,
    "ioThreadCount" : 2
  },
  "data" : {
    "directories" : [ "/usr/local/zeebe/data" ],
    "logSegmentSize" : "512MB",
    "snapshotPeriod" : "PT15M",
    "logIndexDensity" : 100,
    "logSegmentSizeInBytes" : 536870912,
    "atomixStorageLevel" : "DISK"
  },
  "exporters" : {
    "elasticsearch" : {
      "jarPath" : null,
      "className" : "io.zeebe.exporter.ElasticsearchExporter",
      "args" : {
        "url" : "http://elasticsearch-master:9200"
      },
      "external" : false
    }
  },
  "gateway" : {
    "network" : {
      "host" : "0.0.0.0",
      "port" : 26500,
      "minKeepAliveInterval" : "PT30S"
    },
    "cluster" : {
      "contactPoint" : "0.0.0.0:26502",
      "requestTimeout" : "PT15S",
      "clusterName" : "zeebe-cluster",
      "memberId" : "gateway",
      "host" : "0.0.0.0",
      "port" : 26502,
      "membership" : {
        "broadcastUpdates" : false,
        "broadcastDisputes" : true,
        "notifySuspect" : false,
        "gossipInterval" : "PT0.25S",
        "gossipFanout" : 2,
        "probeInterval" : "PT1S",
        "probeTimeout" : "PT2S",
        "suspectProbes" : 3,
        "failureTimeout" : "PT10S",
        "syncInterval" : "PT10S"
      }
    },
    "threads" : {
      "managementThreads" : 1
    },
    "monitoring" : {
      "enabled" : false,
      "host" : "0.0.0.0",
      "port" : 9600
    },
    "security" : {
      "enabled" : false,
      "certificateChainPath" : null,
      "privateKeyPath" : null
    },
    "longPolling" : {
      "enabled" : true
    },
    "initialized" : true,
    "enable" : false
  },
  "backpressure" : {
    "enabled" : true,
    "algorithm" : "VEGAS",
    "aimd" : {
      "requestTimeout" : "PT1S",
      "initialLimit" : 100,
      "minLimit" : 1,
      "maxLimit" : 1000,
      "backoffRatio" : 0.9
    },
    "fixedLimit" : {
      "limit" : 20
    },
    "vegas" : {
      "alpha" : 3,
      "beta" : 6,
      "initialLimit" : 20
    },
    "gradient" : {
      "minLimit" : 10,
      "initialLimit" : 20,
      "rttTolerance" : 2.0
    },
    "gradient2" : {
      "minLimit" : 10,
      "initialLimit" : 20,
      "rttTolerance" : 2.0,
      "longWindow" : 600
    }
  },
  "stepTimeout" : "PT5M",
  "executionMetricsExporterEnabled" : false
}
2020-10-22 06:25:21.106 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-2 [1/10]: actor scheduler
2020-10-22 06:25:21.118 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-2 [2/10]: membership and replication protocol
2020-10-22 06:25:27.800 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-2 [3/10]: command api transport
2020-10-22 06:25:28.122 [] [http-nio-0.0.0.0-9600-exec-1] INFO  org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/] - Initializing Spring DispatcherServlet 'dispatcherServlet'
2020-10-22 06:25:28.123 [] [http-nio-0.0.0.0-9600-exec-1] INFO  org.springframework.web.servlet.DispatcherServlet - Initializing Servlet 'dispatcherServlet'
2020-10-22 06:25:28.195 [] [http-nio-0.0.0.0-9600-exec-1] INFO  org.springframework.web.servlet.DispatcherServlet - Completed initialization in 72 ms
2020-10-22 06:25:28.523 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-2 [4/10]: command api handler
2020-10-22 06:25:29.107 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-2 [5/10]: subscription api
2020-10-22 06:25:29.138 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-2 [6/10]: cluster services
2020-10-22 06:25:31.141 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-2 [7/10]: topology manager
2020-10-22 06:25:31.146 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-2 [8/10]: monitoring services
2020-10-22 06:25:31.155 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-2 [9/10]: leader management request handler
2020-10-22 06:25:31.160 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-2 [10/10]: zeebe partitions
2020-10-22 06:25:31.203 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-2 partitions [1/3]: partition 3
2020-10-22 06:25:31.703 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-2 partitions [2/3]: partition 2
2020-10-22 06:25:31.726 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-2 partitions [3/3]: partition 1
2020-10-22 06:25:31.797 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-2 partitions succeeded. Started 3 steps in 594 ms.
2020-10-22 06:25:31.797 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-2 succeeded. Started 10 steps in 10692 ms.

salaboy commented 4 years ago

@BigeYoung thanks for that. The describe output shows that the readiness probe is failing:

Readiness:  http-get http://:9600/ready delay=0s timeout=1s period=10s #success=1 #failure=3

I think this is due to a lack of memory, because the container was last terminated with exit code 137:

    Last State:     Terminated
      Reason:       Error
      Exit Code:    137

https://sysdig.com/blog/troubleshoot-kubernetes-oom/
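
For readers unfamiliar with exit code 137: by the usual shell convention, an exit status above 128 means the process was terminated by signal number `status - 128`, so 137 corresponds to signal 9 (SIGKILL). A minimal sketch of that decoding follows; the helper name `describe_exit_code` is hypothetical, and the signal lookup assumes a POSIX system. Note that SIGKILL is what the kernel OOM killer sends, but the exit code alone does not prove the kill was memory-related.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain an exit status using the common 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name on this platform (POSIX assumed).
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


# Exit code reported by `kubectl describe` in this thread.
print(describe_exit_code(137))
```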

Can you try installing with this profile?

helm install test-core zeebe/zeebe-full --values https://raw.githubusercontent.com/zeebe-io/zeebe-helm-profiles/master/zeebe-core-team.yaml
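
As rough context for the memory theory, the pod in this thread runs with `-XX:MaxRAMPercentage=25.0` and a memory limit of 4Gi. The sketch below estimates the resulting maximum heap, assuming the JVM in the image is container-aware and sizes its heap from the container limit (an assumption, not something the logs confirm). The helper `max_heap_bytes` is hypothetical, and the estimate covers heap only, not off-heap or native memory.

```python
GIB = 1024 ** 3


def max_heap_bytes(container_limit_bytes: int, max_ram_percentage: float) -> int:
    """Estimate the JVM max heap as a percentage of the container memory limit."""
    return int(container_limit_bytes * max_ram_percentage / 100)


# Values taken from the describe output: limit 4Gi, MaxRAMPercentage=25.0.
limit = 4 * GIB
heap = max_heap_bytes(limit, 25.0)
print(f"estimated max heap: {heap / GIB:.2f} GiB of a {limit / GIB:.0f} GiB limit")
```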

BigeYoung commented 4 years ago

Thank you again for your patience. I installed the chart with that profile as you suggested, but the situation has not changed.

kubectl describe pod test-core-zeebe-1

➜  ~ kubectl describe pod test-core-zeebe-1
Name:         test-core-zeebe-1
Namespace:    default
Priority:     0
Node:         server4/192.168.137.124
Start Time:   Thu, 22 Oct 2020 17:00:56 +0800
Labels:       app.kubernetes.io/component=broker
              app.kubernetes.io/instance=test-core
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=zeebe
              controller-revision-hash=test-core-zeebe-75d69d8d5
              statefulset.kubernetes.io/pod-name=test-core-zeebe-1
Annotations:  <none>
Status:       Running
IP:           10.244.1.156
IPs:
  IP:           10.244.1.156
Controlled By:  StatefulSet/test-core-zeebe
Containers:
  zeebe:
    Container ID:   docker://bd2262c3ebba9d0e3dcee3e70360c7c2be39ae4b002be7f1f0264d3025754822
    Image:          camunda/zeebe:0.24.2
    Image ID:       docker-pullable://camunda/zeebe@sha256:795ace31c498ad4bc37b7b0fab612307c34852f4187766e3f777a509821c9fb3
    Ports:          9600/TCP, 26501/TCP, 26502/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Thu, 22 Oct 2020 17:04:53 +0800
      Finished:     Thu, 22 Oct 2020 17:05:02 +0800
    Ready:          False
    Restart Count:  4
    Limits:
      cpu:     1
      memory:  4Gi
    Requests:
      cpu:      500m
      memory:   2Gi
    Readiness:  http-get http://:9600/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ZEEBE_BROKER_CLUSTER_CLUSTERNAME:                test-core-zeebe
      ZEEBE_LOG_LEVEL:
      ZEEBE_BROKER_CLUSTER_PARTITIONSCOUNT:            3
      ZEEBE_BROKER_CLUSTER_CLUSTERSIZE:                3
      ZEEBE_BROKER_CLUSTER_REPLICATIONFACTOR:          3
      ZEEBE_BROKER_THREADS_CPUTHREADCOUNT:             2
      ZEEBE_BROKER_THREADS_IOTHREADCOUNT:              2
      ZEEBE_BROKER_GATEWAY_ENABLE:                     false
      ZEEBE_BROKER_EXPORTERS_ELASTICSEARCH_CLASSNAME:  io.zeebe.exporter.ElasticsearchExporter
      ZEEBE_BROKER_EXPORTERS_ELASTICSEARCH_ARGS_URL:   http://elasticsearch-master:9200
      ZEEBE_BROKER_NETWORK_COMMANDAPI_PORT:            26501
      ZEEBE_BROKER_NETWORK_INTERNALAPI_PORT:           26502
      ZEEBE_BROKER_NETWORK_MONITORINGAPI_PORT:         9600
      K8S_POD_NAME:                                    test-core-zeebe-1 (v1:metadata.name)
      JAVA_TOOL_OPTIONS:                               -XX:MaxRAMPercentage=25.0 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/zeebe/data -XX:ErrorFile=/usr/local/zeebe/data/zeebe_error%p.log -XX:+ExitOnOutOfMemoryError
    Mounts:
      /exporters from exporters (rw)
      /usr/local/bin/startup.sh from config (rw,path="startup.sh")
      /usr/local/zeebe/config/application.yaml from config (rw,path="application.yaml")
      /usr/local/zeebe/data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-mt74h (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-test-core-zeebe-1
    ReadOnly:   false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      test-core-zeebe
    Optional:  false
  exporters:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  default-token-mt74h:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-mt74h
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  5m16s (x2 over 5m17s)  default-scheduler  0/4 nodes are available: 4 pod has unbound immediate PersistentVolumeClaims.
  Normal   Scheduled         5m14s                  default-scheduler  Successfully assigned default/test-core-zeebe-1 to server4
  Normal   Pulled            2m45s (x4 over 5m12s)  kubelet            Container image "camunda/zeebe:0.24.2" already present on machine
  Normal   Created           2m45s (x4 over 5m12s)  kubelet            Created container zeebe
  Normal   Started           2m45s (x4 over 5m11s)  kubelet            Started container zeebe
  Warning  Unhealthy         2m40s (x5 over 5m10s)  kubelet            Readiness probe failed: Get "http://10.244.1.156:9600/ready": dial tcp 10.244.1.156:9600: connect: connection refused
  Warning  Unhealthy         2m29s (x2 over 3m29s)  kubelet            Readiness probe failed: Get "http://10.244.1.156:9600/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy         2m20s (x3 over 4m49s)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 503
  Warning  BackOff           1s (x13 over 4m7s)     kubelet            Back-off restarting failed container

kubectl logs test-core-zeebe-1

➜  ~ kubectl logs test-core-zeebe-1
++ hostname -f
+ export ZEEBE_BROKER_NETWORK_ADVERTISEDHOST=test-core-zeebe-1.test-core-zeebe.default.svc.cluster.local
+ ZEEBE_BROKER_NETWORK_ADVERTISEDHOST=test-core-zeebe-1.test-core-zeebe.default.svc.cluster.local
+ export ZEEBE_BROKER_CLUSTER_NODEID=1
+ ZEEBE_BROKER_CLUSTER_NODEID=1
+ export ZEEBE_BROKER_CLUSTER_CLUSTERSIZE=3
+ ZEEBE_BROKER_CLUSTER_CLUSTERSIZE=3
+ contactPointPrefix=test-core-zeebe
+ contactPoints=
+ [[ -z '' ]]
+ (( i=0 ))
+ (( i<3 ))
++ hostname -d
+ contactPoints=,test-core-zeebe-0.test-core-zeebe.default.svc.cluster.local:26502
+ (( i++ ))
+ (( i<3 ))
++ hostname -d
+ contactPoints=,test-core-zeebe-0.test-core-zeebe.default.svc.cluster.local:26502,test-core-zeebe-1.test-core-zeebe.default.svc.cluster.local:26502
+ (( i++ ))
+ (( i<3 ))
++ hostname -d
+ contactPoints=,test-core-zeebe-0.test-core-zeebe.default.svc.cluster.local:26502,test-core-zeebe-1.test-core-zeebe.default.svc.cluster.local:26502,test-core-zeebe-2.test-core-zeebe.default.svc.cluster.local:26502
+ (( i++ ))
+ (( i<3 ))
+ export ZEEBE_BROKER_CLUSTER_INITIALCONTACTPOINTS=,test-core-zeebe-0.test-core-zeebe.default.svc.cluster.local:26502,test-core-zeebe-1.test-core-zeebe.default.svc.cluster.local:26502,test-core-zeebe-2.test-core-zeebe.default.svc.cluster.local:26502
+ ZEEBE_BROKER_CLUSTER_INITIALCONTACTPOINTS=,test-core-zeebe-0.test-core-zeebe.default.svc.cluster.local:26502,test-core-zeebe-1.test-core-zeebe.default.svc.cluster.local:26502,test-core-zeebe-2.test-core-zeebe.default.svc.cluster.local:26502
++ ls -A /exporters/
+ '[' '' ']'
+ echo 'No exporters available.'
+ exec /usr/local/zeebe/bin/broker
No exporters available.
Picked up JAVA_TOOL_OPTIONS: -XX:MaxRAMPercentage=25.0 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/zeebe/data -XX:ErrorFile=/usr/local/zeebe/data/zeebe_error%p.log -XX:+ExitOnOutOfMemoryError
2020-10-22 09:06:27,029 main WARN Error while converting string [] to type [class org.apache.logging.log4j.Level]. Using default value [null]. java.lang.IllegalArgumentException: Unknown level constant [].
        at org.apache.logging.log4j.Level.valueOf(Level.java:320)
        at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters$LevelConverter.convert(TypeConverters.java:288)
        at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters$LevelConverter.convert(TypeConverters.java:284)
        at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters.convert(TypeConverters.java:419)
        at org.apache.logging.log4j.core.config.plugins.visitors.AbstractPluginVisitor.convert(AbstractPluginVisitor.java:149)
        at org.apache.logging.log4j.core.config.plugins.visitors.PluginAttributeVisitor.visit(PluginAttributeVisitor.java:45)
        at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.generateParameters(PluginBuilder.java:258)
        at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.build(PluginBuilder.java:135)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.createPluginObject(AbstractConfiguration.java:1002)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:942)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:934)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.doConfigure(AbstractConfiguration.java:552)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.initialize(AbstractConfiguration.java:241)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.start(AbstractConfiguration.java:288)
        at org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:618)
        at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:691)
        at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:708)
        at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:263)
        at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:153)
        at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:45)
        at org.apache.logging.log4j.LogManager.getContext(LogManager.java:194)
        at org.apache.commons.logging.LogAdapter$Log4jLog.<clinit>(LogAdapter.java:155)
        at org.apache.commons.logging.LogAdapter$Log4jAdapter.createLog(LogAdapter.java:122)
        at org.apache.commons.logging.LogAdapter.createLog(LogAdapter.java:89)
        at org.apache.commons.logging.LogFactoryService.getInstance(LogFactoryService.java:46)
        at org.apache.commons.logging.LogFactoryService.getInstance(LogFactoryService.java:41)
        at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:655)
        at org.springframework.boot.SpringApplication.<clinit>(SpringApplication.java:196)
        at io.zeebe.broker.StandaloneBroker.main(StandaloneBroker.java:52)

2020-10-22 09:06:28,716 main WARN Error while converting string [] to type [class org.apache.logging.log4j.Level]. Using default value [null]. java.lang.IllegalArgumentException: Unknown level constant [].
        at org.apache.logging.log4j.Level.valueOf(Level.java:320)
        at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters$LevelConverter.convert(TypeConverters.java:288)
        at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters$LevelConverter.convert(TypeConverters.java:284)
        at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters.convert(TypeConverters.java:419)
        at org.apache.logging.log4j.core.config.plugins.visitors.AbstractPluginVisitor.convert(AbstractPluginVisitor.java:149)
        at org.apache.logging.log4j.core.config.plugins.visitors.PluginAttributeVisitor.visit(PluginAttributeVisitor.java:45)
        at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.generateParameters(PluginBuilder.java:258)
        at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.build(PluginBuilder.java:135)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.createPluginObject(AbstractConfiguration.java:1002)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:942)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:934)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.doConfigure(AbstractConfiguration.java:552)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.initialize(AbstractConfiguration.java:241)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.start(AbstractConfiguration.java:288)
        at org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:618)
        at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:691)
        at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:708)
        at org.springframework.boot.logging.log4j2.Log4J2LoggingSystem.reinitialize(Log4J2LoggingSystem.java:204)
        at org.springframework.boot.logging.AbstractLoggingSystem.initializeWithConventions(AbstractLoggingSystem.java:73)
        at org.springframework.boot.logging.AbstractLoggingSystem.initialize(AbstractLoggingSystem.java:60)
        at org.springframework.boot.logging.log4j2.Log4J2LoggingSystem.initialize(Log4J2LoggingSystem.java:160)
        at org.springframework.boot.context.logging.LoggingApplicationListener.initializeSystem(LoggingApplicationListener.java:306)
        at org.springframework.boot.context.logging.LoggingApplicationListener.initialize(LoggingApplicationListener.java:281)
        at org.springframework.boot.context.logging.LoggingApplicationListener.onApplicationEnvironmentPreparedEvent(LoggingApplicationListener.java:239)
        at org.springframework.boot.context.logging.LoggingApplicationListener.onApplicationEvent(LoggingApplicationListener.java:216)
        at org.springframework.context.event.SimpleApplicationEventMulticaster.doInvokeListener(SimpleApplicationEventMulticaster.java:172)
        at org.springframework.context.event.SimpleApplicationEventMulticaster.invokeListener(SimpleApplicationEventMulticaster.java:165)
        at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:139)
        at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:127)
        at org.springframework.boot.context.event.EventPublishingRunListener.environmentPrepared(EventPublishingRunListener.java:80)
        at org.springframework.boot.SpringApplicationRunListeners.environmentPrepared(SpringApplicationRunListeners.java:53)
        at org.springframework.boot.SpringApplication.prepareEnvironment(SpringApplication.java:345)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:308)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:1237)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:1226)
        at io.zeebe.broker.StandaloneBroker.main(StandaloneBroker.java:52)

  ______  ______   ______   ____    ______     ____    _____     ____    _  __  ______   _____
 |___  / |  ____| |  ____| |  _ \  |  ____|   |  _ \  |  __ \   / __ \  | |/ / |  ____| |  __ \
    / /  | |__    | |__    | |_) | | |__      | |_) | | |__) | | |  | | | ' /  | |__    | |__) |
   / /   |  __|   |  __|   |  _ <  |  __|     |  _ <  |  _  /  | |  | | |  <   |  __|   |  _  /
  / /__  | |____  | |____  | |_) | | |____    | |_) | | | \ \  | |__| | | . \  | |____  | | \ \
 /_____| |______| |______| |____/  |______|   |____/  |_|  \_\  \____/  |_|\_\ |______| |_|  \_\

2020-10-22 09:06:28.996 [] [main] INFO  io.zeebe.broker.StandaloneBroker - Starting StandaloneBroker v0.24.2 on test-core-zeebe-1 with PID 6 (/usr/local/zeebe/lib/zeebe-distribution-0.24.2.jar started by root in /usr/local/zeebe)
2020-10-22 09:06:29.008 [] [main] INFO  io.zeebe.broker.StandaloneBroker - No active profile set, falling back to default profiles: default
2020-10-22 09:06:33.700 [] [main] INFO  org.springframework.boot.web.embedded.tomcat.TomcatWebServer - Tomcat initialized with port(s): 9600 (http)
2020-10-22 09:06:33.733 [] [main] INFO  org.apache.coyote.http11.Http11NioProtocol - Initializing ProtocolHandler ["http-nio-0.0.0.0-9600"]
2020-10-22 09:06:33.735 [] [main] INFO  org.apache.catalina.core.StandardService - Starting service [Tomcat]
2020-10-22 09:06:33.737 [] [main] INFO  org.apache.catalina.core.StandardEngine - Starting Servlet engine: [Apache Tomcat/9.0.36]
2020-10-22 09:06:34.100 [] [main] INFO  org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/] - Initializing Spring embedded WebApplicationContext
2020-10-22 09:06:34.100 [] [main] INFO  org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext - Root WebApplicationContext: initialization completed in 4903 ms
2020-10-22 09:06:35.526 [] [main] INFO  org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor - Initializing ExecutorService 'applicationTaskExecutor'
2020-10-22 09:06:36.531 [] [main] INFO  org.springframework.boot.actuate.endpoint.web.EndpointLinksResolver - Exposing 2 endpoint(s) beneath base path '/actuator'
2020-10-22 09:06:36.623 [] [main] INFO  org.apache.coyote.http11.Http11NioProtocol - Starting ProtocolHandler ["http-nio-0.0.0.0-9600"]
2020-10-22 09:06:36.802 [] [main] INFO  org.springframework.boot.web.embedded.tomcat.TomcatWebServer - Tomcat started on port(s): 9600 (http) with context path ''
2020-10-22 09:06:36.831 [] [main] INFO  io.zeebe.broker.StandaloneBroker - Started StandaloneBroker in 9.131 seconds (JVM running for 12.893)
2020-10-22 09:06:37.014 [] [main] INFO  io.zeebe.broker.system - Version: 0.24.2
2020-10-22 09:06:37.211 [] [main] INFO  io.zeebe.broker.system - Starting broker 1 with configuration {
  "network" : {
    "host" : "0.0.0.0",
    "portOffset" : 0,
    "maxMessageSize" : "4MB",
    "advertisedHost" : "test-core-zeebe-1.test-core-zeebe.default.svc.cluster.local",
    "commandApi" : {
      "host" : "0.0.0.0",
      "port" : 26501,
      "advertisedHost" : "test-core-zeebe-1.test-core-zeebe.default.svc.cluster.local",
      "advertisedPort" : 26501,
      "advertisedAddress" : "test-core-zeebe-1.test-core-zeebe.default.svc.cluster.local:26501",
      "address" : "0.0.0.0:26501"
    },
    "internalApi" : {
      "host" : "0.0.0.0",
      "port" : 26502,
      "advertisedHost" : "test-core-zeebe-1.test-core-zeebe.default.svc.cluster.local",
      "advertisedPort" : 26502,
      "advertisedAddress" : "test-core-zeebe-1.test-core-zeebe.default.svc.cluster.local:26502",
      "address" : "0.0.0.0:26502"
    },
    "monitoringApi" : {
      "host" : "0.0.0.0",
      "port" : 9600,
      "advertisedHost" : "test-core-zeebe-1.test-core-zeebe.default.svc.cluster.local",
      "advertisedPort" : 9600,
      "advertisedAddress" : "test-core-zeebe-1.test-core-zeebe.default.svc.cluster.local:9600",
      "address" : "0.0.0.0:9600"
    },
    "maxMessageSizeInBytes" : 4194304
  },
  "cluster" : {
    "initialContactPoints" : [ "test-core-zeebe-0.test-core-zeebe.default.svc.cluster.local:26502", "test-core-zeebe-1.test-core-zeebe.default.svc.cluster.local:26502", "test-core-zeebe-2.test-core-zeebe.default.svc.cluster.local:26502" ],
    "partitionIds" : [ 1, 2, 3 ],
    "nodeId" : 1,
    "partitionsCount" : 3,
    "replicationFactor" : 3,
    "clusterSize" : 3,
    "clusterName" : "test-core-zeebe",
    "membership" : {
      "broadcastUpdates" : false,
      "broadcastDisputes" : true,
      "notifySuspect" : false,
      "gossipInterval" : "PT0.25S",
      "gossipFanout" : 2,
      "probeInterval" : "PT1S",
      "probeTimeout" : "PT2S",
      "suspectProbes" : 3,
      "failureTimeout" : "PT10S",
      "syncInterval" : "PT10S"
    }
  },
  "threads" : {
    "cpuThreadCount" : 2,
    "ioThreadCount" : 2
  },
  "data" : {
    "directories" : [ "/usr/local/zeebe/data" ],
    "logSegmentSize" : "512MB",
    "snapshotPeriod" : "PT15M",
    "logIndexDensity" : 100,
    "logSegmentSizeInBytes" : 536870912,
    "atomixStorageLevel" : "DISK"
  },
  "exporters" : {
    "elasticsearch" : {
      "jarPath" : null,
      "className" : "io.zeebe.exporter.ElasticsearchExporter",
      "args" : {
        "url" : "http://elasticsearch-master:9200"
      },
      "external" : false
    }
  },
  "gateway" : {
    "network" : {
      "host" : "0.0.0.0",
      "port" : 26500,
      "minKeepAliveInterval" : "PT30S"
    },
    "cluster" : {
      "contactPoint" : "0.0.0.0:26502",
      "requestTimeout" : "PT15S",
      "clusterName" : "zeebe-cluster",
      "memberId" : "gateway",
      "host" : "0.0.0.0",
      "port" : 26502,
      "membership" : {
        "broadcastUpdates" : false,
        "broadcastDisputes" : true,
        "notifySuspect" : false,
        "gossipInterval" : "PT0.25S",
        "gossipFanout" : 2,
        "probeInterval" : "PT1S",
        "probeTimeout" : "PT2S",
        "suspectProbes" : 3,
        "failureTimeout" : "PT10S",
        "syncInterval" : "PT10S"
      }
    },
    "threads" : {
      "managementThreads" : 1
    },
    "monitoring" : {
      "enabled" : false,
      "host" : "0.0.0.0",
      "port" : 9600
    },
    "security" : {
      "enabled" : false,
      "certificateChainPath" : null,
      "privateKeyPath" : null
    },
    "longPolling" : {
      "enabled" : true
    },
    "initialized" : true,
    "enable" : false
  },
  "backpressure" : {
    "enabled" : true,
    "algorithm" : "VEGAS",
    "aimd" : {
      "requestTimeout" : "PT1S",
      "initialLimit" : 100,
      "minLimit" : 1,
      "maxLimit" : 1000,
      "backoffRatio" : 0.9
    },
    "fixedLimit" : {
      "limit" : 20
    },
    "vegas" : {
      "alpha" : 3,
      "beta" : 6,
      "initialLimit" : 20
    },
    "gradient" : {
      "minLimit" : 10,
      "initialLimit" : 20,
      "rttTolerance" : 2.0
    },
    "gradient2" : {
      "minLimit" : 10,
      "initialLimit" : 20,
      "rttTolerance" : 2.0,
      "longWindow" : 600
    }
  },
  "stepTimeout" : "PT5M",
  "executionMetricsExporterEnabled" : false
}
2020-10-22 09:06:37.297 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-1 [1/10]: actor scheduler
2020-10-22 09:06:37.303 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-1 [2/10]: membership and replication protocol
2020-10-22 09:06:41.308 [] [http-nio-0.0.0.0-9600-exec-1] INFO  org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/] - Initializing Spring DispatcherServlet 'dispatcherServlet'
2020-10-22 09:06:41.309 [] [http-nio-0.0.0.0-9600-exec-1] INFO  org.springframework.web.servlet.DispatcherServlet - Initializing Servlet 'dispatcherServlet'
2020-10-22 09:06:41.319 [] [http-nio-0.0.0.0-9600-exec-1] INFO  org.springframework.web.servlet.DispatcherServlet - Completed initialization in 10 ms
2020-10-22 09:06:44.826 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-1 [3/10]: command api transport
2020-10-22 09:06:45.315 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-1 [4/10]: command api handler
2020-10-22 09:06:45.727 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-1 [5/10]: subscription api
2020-10-22 09:06:45.806 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-1 [6/10]: cluster services
2020-10-22 09:06:47.977 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-1 [7/10]: topology manager
2020-10-22 09:06:47.979 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-1 [8/10]: monitoring services
2020-10-22 09:06:47.984 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-1 [9/10]: leader management request handler
2020-10-22 09:06:47.990 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-1 [10/10]: zeebe partitions
2020-10-22 09:06:47.993 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-1 partitions [1/3]: partition 3
2020-10-22 09:06:48.402 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-1 partitions [2/3]: partition 2
2020-10-22 09:06:48.427 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-1 partitions [3/3]: partition 1
2020-10-22 09:06:48.444 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-1 partitions succeeded. Started 3 steps in 451 ms.
2020-10-22 09:06:48.447 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-1 succeeded. Started 10 steps in 11151 ms.

Also, since you mentioned the memory problem, I logged in to server4 and ran top.

[Image: screenshot of the top output on server4]

salaboy commented 4 years ago

@BigeYoung note that the node might have memory available, but if the pod's requests and limits allow less memory than the application inside the container needs, Kubernetes will kill it anyway:

    Limits:
      cpu:     1
      memory:  4Gi
    Requests:
      cpu:      500m
      memory:   2Gi

These need to be different if you run with:

helm install test-core zeebe/zeebe-full --values https://raw.githubusercontent.com/zeebe-io/zeebe-helm-profiles/master/zeebe-core-team.yaml

Note that you can tweak the values if you download that file from the GitHub repo:

https://raw.githubusercontent.com/zeebe-io/zeebe-helm-profiles/master/zeebe-core-team.yaml

Again, note the requests and limits sections.
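
The pod description in this thread reports Exit Code: 137. As a quick sanity check, exit codes above 128 conventionally mean the process was terminated by signal number (code - 128), and the sketch below does that arithmetic in the shell. It only shows that the container was killed with SIGKILL, which is what an out-of-memory kill looks like. It does not by itself prove that memory was the cause.

```shell
# Exit codes above 128 conventionally encode "killed by signal (code - 128)".
exit_code=137                            # value taken from the kubectl describe output
signal_number=$((exit_code - 128))       # 137 - 128 = 9
signal_name=$(kill -l "$signal_number")  # map the signal number to its name
echo "exit code $exit_code -> signal $signal_number ($signal_name)"
```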

salaboy commented 4 years ago

Note that the file I pointed to asks for a lot of memory and CPU:

  limits:
    cpu: 5
    memory: 12Gi
  requests:
    cpu: 5
    memory: 12Gi

You might need to tweak that to fit your cluster's resources.

BigeYoung commented 4 years ago

Oh, this is really strange. I downloaded this YAML file, but no matter how I modify the values under resources, the pod's limits do not change after helm install.

salaboy commented 4 years ago

I guess the file is not correct then. You can also change those values with helm --set. Can you try that?
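
A minimal sketch of what the --set suggestion could look like. The key paths (resources.limits.memory and so on) are an assumption: they fit a chart whose values file has a top-level resources block, like the resources excerpts quoted in this thread. A parent chart generally expects subchart values under the subchart's name or alias, so the paths may need a prefix there. The script only prints the command so it can be reviewed before running it.

```shell
# Sketch only: assembles a helm command with inline overrides and prints it.
# Release and chart names are the ones used in this thread; the value paths
# are assumed, so check them against the chart's values.yaml before running.
release="test-core"
chart="zeebe/zeebe-cluster"
overrides="--set resources.requests.cpu=1 --set resources.requests.memory=2Gi"
overrides="$overrides --set resources.limits.cpu=2 --set resources.limits.memory=6Gi"
cmd="helm install $release $chart $overrides"
echo "$cmd"
```

Afterwards, kubectl describe pod shows whether the Limits and Requests sections picked up the new values.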

BigeYoung commented 4 years ago

I found that it was because zeebe/zeebe-full did not accept this values file, so I used the zeebe-cluster chart instead. My complete installation command is:

helm install test-core zeebe/zeebe-cluster --values zeebe-core-team.yaml
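
One possible explanation for the parent chart ignoring the file: Helm passes values to a subchart only when they are nested under the subchart's name or alias, so a file with a top-level resources block would be read by zeebe-cluster directly but not through zeebe-full. The sketch below writes such a nested values file. The key "zeebe" is a guess and not confirmed anywhere in this thread; the real key is whatever the parent chart's Chart.yaml lists as the dependency name or alias.

```shell
# Sketch: wrap the broker settings under the subchart's key so that a parent
# chart passes them down. The key "zeebe" is hypothetical; verify it against
# the dependency name or alias in the parent chart's Chart.yaml.
subchart_key="zeebe"
values_file="$(mktemp)"
cat > "$values_file" <<EOF
${subchart_key}:
  resources:
    limits:
      cpu: 2
      memory: 6Gi
    requests:
      cpu: 1
      memory: 2Gi
EOF
cat "$values_file"
```

The resulting file could then be passed to helm install for the parent chart with --values.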

And I changed the resource limits to:

# RESOURCES
resources:
  limits:
    cpu: 2
    memory: 6Gi
  requests:
    cpu: 1
    memory: 2Gi
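
For context on what a 6Gi limit means for the broker's JVM: the pod description in this thread shows JAVA_TOOL_OPTIONS with -XX:MaxRAMPercentage=25.0, which caps the default maximum heap at a percentage of the memory the JVM sees (the container limit, assuming a container-aware JVM). A rough calculation, ignoring off-heap usage such as direct buffers and thread stacks:

```shell
# Rough heap estimate: max heap = container memory limit * MaxRAMPercentage.
limit_gib=6      # memory limit from the resources block above
percent=25       # from -XX:MaxRAMPercentage=25.0
limit_mib=$((limit_gib * 1024))
heap_mib=$((limit_mib * percent / 100))
echo "limit ${limit_mib} MiB, max heap about ${heap_mib} MiB"
```

With these settings about 1.5 GiB is available as heap, and the rest of the limit is headroom for non-heap memory.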

I am very happy to see that resource limits have changed, but unfortunately, the crash continues.

➜  ~ kubectl describe pod test-core-zeebe-0
Name:         test-core-zeebe-0
Namespace:    default
Priority:     0
Node:         server4/192.168.137.124
Start Time:   Fri, 23 Oct 2020 14:25:38 +0800
Labels:       app.kubernetes.io/component=broker
              app.kubernetes.io/instance=test-core
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=zeebe-cluster
              controller-revision-hash=test-core-zeebe-87b8975cd
              statefulset.kubernetes.io/pod-name=test-core-zeebe-0
Annotations:  <none>
Status:       Running
IP:           10.244.1.242
IPs:
  IP:           10.244.1.242
Controlled By:  StatefulSet/test-core-zeebe
Containers:
  zeebe-cluster:
    Container ID:   docker://b75bb59b8c8b94509f954eb729b129a1bee65c458b3b0a77a0187502fc73660a
    Image:          camunda/zeebe:0.24.2
    Image ID:       docker-pullable://camunda/zeebe@sha256:795ace31c498ad4bc37b7b0fab612307c34852f4187766e3f777a509821c9fb3
    Ports:          9600/TCP, 26501/TCP, 26502/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Fri, 23 Oct 2020 14:46:03 +0800
      Finished:     Fri, 23 Oct 2020 14:47:01 +0800
    Ready:          False
    Restart Count:  8
    Limits:
      cpu:     2
      memory:  6Gi
    Requests:
      cpu:      1
      memory:   2Gi
    Readiness:  http-get http://:9600/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ZEEBE_BROKER_CLUSTER_CLUSTERNAME:                test-core-zeebe
      ZEEBE_LOG_LEVEL:
      ZEEBE_BROKER_CLUSTER_PARTITIONSCOUNT:            3
      ZEEBE_BROKER_CLUSTER_CLUSTERSIZE:                3
      ZEEBE_BROKER_CLUSTER_REPLICATIONFACTOR:          3
      ZEEBE_BROKER_THREADS_CPUTHREADCOUNT:             2
      ZEEBE_BROKER_THREADS_IOTHREADCOUNT:              2
      ZEEBE_BROKER_GATEWAY_ENABLE:                     false
      ZEEBE_BROKER_EXPORTERS_ELASTICSEARCH_CLASSNAME:  io.zeebe.exporter.ElasticsearchExporter
      ZEEBE_BROKER_EXPORTERS_ELASTICSEARCH_ARGS_URL:   http://elasticsearch-master:9200
      ZEEBE_BROKER_NETWORK_COMMANDAPI_PORT:            26501
      ZEEBE_BROKER_NETWORK_INTERNALAPI_PORT:           26502
      ZEEBE_BROKER_NETWORK_MONITORINGAPI_PORT:         9600
      K8S_POD_NAME:                                    test-core-zeebe-0 (v1:metadata.name)
      JAVA_TOOL_OPTIONS:                               -XX:MaxRAMPercentage=25.0 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/zeebe/data -XX:ErrorFile=/usr/local/zeebe/data/zeebe_error%p.log -XX:+ExitOnOutOfMemoryError
    Mounts:
      /exporters from exporters (rw)
      /usr/local/bin/startup.sh from config (rw,path="startup.sh")
      /usr/local/zeebe/config/application.yaml from config (rw,path="application.yaml")
      /usr/local/zeebe/data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-mt74h (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-test-core-zeebe-0
    ReadOnly:   false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      test-core-zeebe-cluster
    Optional:  false
  exporters:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  default-token-mt74h:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-mt74h
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  25m                default-scheduler  Successfully assigned default/test-core-zeebe-0 to server4
  Normal   Pulled     22m (x4 over 25m)  kubelet            Container image "camunda/zeebe:0.24.2" already present on machine
  Normal   Created    22m (x4 over 25m)  kubelet            Created container zeebe-cluster
  Normal   Started    22m (x4 over 25m)  kubelet            Started container zeebe-cluster
  Warning  Unhealthy  22m (x4 over 25m)  kubelet            Readiness probe failed: Get "http://10.244.1.240:9600/ready": dial tcp 10.244.1.240:9600: connect: connection refused
  Warning  Unhealthy  22m (x4 over 25m)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 503
  Warning  Unhealthy  21m                kubelet            Readiness probe failed: Get "http://10.244.1.240:9600/ready": read tcp 10.244.1.1:50878->10.244.1.240:9600: read: connection reset by peer
  Warning  BackOff    8s (x91 over 23m)  kubelet            Back-off restarting failed container
++ hostname -f
+ export ZEEBE_BROKER_NETWORK_ADVERTISEDHOST=test-core-zeebe-0.test-core-zeebe.default.svc.cluster.local
+ ZEEBE_BROKER_NETWORK_ADVERTISEDHOST=test-core-zeebe-0.test-core-zeebe.default.svc.cluster.local
+ export ZEEBE_BROKER_CLUSTER_NODEID=0
+ ZEEBE_BROKER_CLUSTER_NODEID=0
+ export ZEEBE_BROKER_CLUSTER_CLUSTERSIZE=3
+ ZEEBE_BROKER_CLUSTER_CLUSTERSIZE=3
+ contactPointPrefix=test-core-zeebe
+ contactPoints=
+ [[ -z '' ]]
+ (( i=0 ))
+ (( i<3 ))
++ hostname -d
+ contactPoints=,test-core-zeebe-0.test-core-zeebe.default.svc.cluster.local:26502
+ (( i++ ))
+ (( i<3 ))
++ hostname -d
+ contactPoints=,test-core-zeebe-0.test-core-zeebe.default.svc.cluster.local:26502,test-core-zeebe-1.test-core-zeebe.default.svc.cluster.local:26502
+ (( i++ ))
+ (( i<3 ))
++ hostname -d
+ contactPoints=,test-core-zeebe-0.test-core-zeebe.default.svc.cluster.local:26502,test-core-zeebe-1.test-core-zeebe.default.svc.cluster.local:26502,test-core-zeebe-2.test-core-zeebe.default.svc.cluster.local:26502
+ (( i++ ))
+ (( i<3 ))
+ export ZEEBE_BROKER_CLUSTER_INITIALCONTACTPOINTS=,test-core-zeebe-0.test-core-zeebe.default.svc.cluster.local:26502,test-core-zeebe-1.test-core-zeebe.default.svc.cluster.local:26502,test-core-zeebe-2.test-core-zeebe.default.svc.cluster.local:26502
+ ZEEBE_BROKER_CLUSTER_INITIALCONTACTPOINTS=,test-core-zeebe-0.test-core-zeebe.default.svc.cluster.local:26502,test-core-zeebe-1.test-core-zeebe.default.svc.cluster.local:26502,test-core-zeebe-2.test-core-zeebe.default.svc.cluster.local:26502
++ ls -A /exporters/
+ '[' '' ']'
No exporters available.
+ echo 'No exporters available.'
+ exec /usr/local/zeebe/bin/broker
Picked up JAVA_TOOL_OPTIONS: -XX:MaxRAMPercentage=25.0 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/zeebe/data -XX:ErrorFile=/usr/local/zeebe/data/zeebe_error%p.log -XX:+ExitOnOutOfMemoryError
2020-10-23 06:46:05,618 main WARN Error while converting string [] to type [class org.apache.logging.log4j.Level]. Using default value [null]. java.lang.IllegalArgumentException: Unknown level constant [].
        at org.apache.logging.log4j.Level.valueOf(Level.java:320)
        at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters$LevelConverter.convert(TypeConverters.java:288)
        at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters$LevelConverter.convert(TypeConverters.java:284)
        at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters.convert(TypeConverters.java:419)
        at org.apache.logging.log4j.core.config.plugins.visitors.AbstractPluginVisitor.convert(AbstractPluginVisitor.java:149)
        at org.apache.logging.log4j.core.config.plugins.visitors.PluginAttributeVisitor.visit(PluginAttributeVisitor.java:45)
        at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.generateParameters(PluginBuilder.java:258)
        at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.build(PluginBuilder.java:135)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.createPluginObject(AbstractConfiguration.java:1002)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:942)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:934)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.doConfigure(AbstractConfiguration.java:552)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.initialize(AbstractConfiguration.java:241)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.start(AbstractConfiguration.java:288)
        at org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:618)
        at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:691)
        at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:708)
        at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:263)
        at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:153)
        at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:45)
        at org.apache.logging.log4j.LogManager.getContext(LogManager.java:194)
        at org.apache.commons.logging.LogAdapter$Log4jLog.<clinit>(LogAdapter.java:155)
        at org.apache.commons.logging.LogAdapter$Log4jAdapter.createLog(LogAdapter.java:122)
        at org.apache.commons.logging.LogAdapter.createLog(LogAdapter.java:89)
        at org.apache.commons.logging.LogFactoryService.getInstance(LogFactoryService.java:46)
        at org.apache.commons.logging.LogFactoryService.getInstance(LogFactoryService.java:41)
        at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:655)
        at org.springframework.boot.SpringApplication.<clinit>(SpringApplication.java:196)
        at io.zeebe.broker.StandaloneBroker.main(StandaloneBroker.java:52)

2020-10-23 06:46:06,540 main WARN Error while converting string [] to type [class org.apache.logging.log4j.Level]. Using default value [null]. java.lang.IllegalArgumentException: Unknown level constant [].
        at org.apache.logging.log4j.Level.valueOf(Level.java:320)
        at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters$LevelConverter.convert(TypeConverters.java:288)
        at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters$LevelConverter.convert(TypeConverters.java:284)
        at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters.convert(TypeConverters.java:419)
        at org.apache.logging.log4j.core.config.plugins.visitors.AbstractPluginVisitor.convert(AbstractPluginVisitor.java:149)
        at org.apache.logging.log4j.core.config.plugins.visitors.PluginAttributeVisitor.visit(PluginAttributeVisitor.java:45)
        at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.generateParameters(PluginBuilder.java:258)
        at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.build(PluginBuilder.java:135)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.createPluginObject(AbstractConfiguration.java:1002)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:942)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:934)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.doConfigure(AbstractConfiguration.java:552)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.initialize(AbstractConfiguration.java:241)
        at org.apache.logging.log4j.core.config.AbstractConfiguration.start(AbstractConfiguration.java:288)
        at org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:618)
        at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:691)
        at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:708)
        at org.springframework.boot.logging.log4j2.Log4J2LoggingSystem.reinitialize(Log4J2LoggingSystem.java:204)
        at org.springframework.boot.logging.AbstractLoggingSystem.initializeWithConventions(AbstractLoggingSystem.java:73)
        at org.springframework.boot.logging.AbstractLoggingSystem.initialize(AbstractLoggingSystem.java:60)
        at org.springframework.boot.logging.log4j2.Log4J2LoggingSystem.initialize(Log4J2LoggingSystem.java:160)
        at org.springframework.boot.context.logging.LoggingApplicationListener.initializeSystem(LoggingApplicationListener.java:306)
        at org.springframework.boot.context.logging.LoggingApplicationListener.initialize(LoggingApplicationListener.java:281)
        at org.springframework.boot.context.logging.LoggingApplicationListener.onApplicationEnvironmentPreparedEvent(LoggingApplicationListener.java:239)
        at org.springframework.boot.context.logging.LoggingApplicationListener.onApplicationEvent(LoggingApplicationListener.java:216)
        at org.springframework.context.event.SimpleApplicationEventMulticaster.doInvokeListener(SimpleApplicationEventMulticaster.java:172)
        at org.springframework.context.event.SimpleApplicationEventMulticaster.invokeListener(SimpleApplicationEventMulticaster.java:165)
        at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:139)
        at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:127)
        at org.springframework.boot.context.event.EventPublishingRunListener.environmentPrepared(EventPublishingRunListener.java:80)
        at org.springframework.boot.SpringApplicationRunListeners.environmentPrepared(SpringApplicationRunListeners.java:53)
        at org.springframework.boot.SpringApplication.prepareEnvironment(SpringApplication.java:345)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:308)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:1237)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:1226)
        at io.zeebe.broker.StandaloneBroker.main(StandaloneBroker.java:52)

  ______  ______   ______   ____    ______     ____    _____     ____    _  __  ______   _____
 |___  / |  ____| |  ____| |  _ \  |  ____|   |  _ \  |  __ \   / __ \  | |/ / |  ____| |  __ \
    / /  | |__    | |__    | |_) | | |__      | |_) | | |__) | | |  | | | ' /  | |__    | |__) |
   / /   |  __|   |  __|   |  _ <  |  __|     |  _ <  |  _  /  | |  | | |  <   |  __|   |  _  /
  / /__  | |____  | |____  | |_) | | |____    | |_) | | | \ \  | |__| | | . \  | |____  | | \ \
 /_____| |______| |______| |____/  |______|   |____/  |_|  \_\  \____/  |_|\_\ |______| |_|  \_\

2020-10-23 06:46:06.744 [] [main] INFO  io.zeebe.broker.StandaloneBroker - Starting StandaloneBroker v0.24.2 on test-core-zeebe-0 with PID 6 (/usr/local/zeebe/lib/zeebe-distribution-0.24.2.jar started by root in /usr/local/zeebe)
2020-10-23 06:46:06.758 [] [main] INFO  io.zeebe.broker.StandaloneBroker - No active profile set, falling back to default profiles: default
2020-10-23 06:46:09.460 [] [main] INFO  org.springframework.boot.web.embedded.tomcat.TomcatWebServer - Tomcat initialized with port(s): 9600 (http)
2020-10-23 06:46:09.492 [] [main] INFO  org.apache.coyote.http11.Http11NioProtocol - Initializing ProtocolHandler ["http-nio-0.0.0.0-9600"]
2020-10-23 06:46:09.494 [] [main] INFO  org.apache.catalina.core.StandardService - Starting service [Tomcat]
2020-10-23 06:46:09.494 [] [main] INFO  org.apache.catalina.core.StandardEngine - Starting Servlet engine: [Apache Tomcat/9.0.36]
2020-10-23 06:46:09.652 [] [main] INFO  org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/] - Initializing Spring embedded WebApplicationContext
2020-10-23 06:46:09.653 [] [main] INFO  org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext - Root WebApplicationContext: initialization completed in 2791 ms
2020-10-23 06:46:10.225 [] [main] INFO  org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor - Initializing ExecutorService 'applicationTaskExecutor'
2020-10-23 06:46:10.680 [] [main] INFO  org.springframework.boot.actuate.endpoint.web.EndpointLinksResolver - Exposing 2 endpoint(s) beneath base path '/actuator'
2020-10-23 06:46:10.728 [] [main] INFO  org.apache.coyote.http11.Http11NioProtocol - Starting ProtocolHandler ["http-nio-0.0.0.0-9600"]
2020-10-23 06:46:10.765 [] [main] INFO  org.springframework.boot.web.embedded.tomcat.TomcatWebServer - Tomcat started on port(s): 9600 (http) with context path ''
2020-10-23 06:46:10.785 [] [main] INFO  io.zeebe.broker.StandaloneBroker - Started StandaloneBroker in 4.854 seconds (JVM running for 6.941)
2020-10-23 06:46:10.912 [] [main] INFO  io.zeebe.broker.system - Version: 0.24.2
2020-10-23 06:46:10.997 [] [main] INFO  io.zeebe.broker.system - Starting broker 0 with configuration {
  "network" : {
    "host" : "0.0.0.0",
    "portOffset" : 0,
    "maxMessageSize" : "4MB",
    "advertisedHost" : "test-core-zeebe-0.test-core-zeebe.default.svc.cluster.local",
    "commandApi" : {
      "host" : "0.0.0.0",
      "port" : 26501,
      "advertisedHost" : "test-core-zeebe-0.test-core-zeebe.default.svc.cluster.local",
      "advertisedPort" : 26501,
      "address" : "0.0.0.0:26501",
      "advertisedAddress" : "test-core-zeebe-0.test-core-zeebe.default.svc.cluster.local:26501"
    },
    "internalApi" : {
      "host" : "0.0.0.0",
      "port" : 26502,
      "advertisedHost" : "test-core-zeebe-0.test-core-zeebe.default.svc.cluster.local",
      "advertisedPort" : 26502,
      "address" : "0.0.0.0:26502",
      "advertisedAddress" : "test-core-zeebe-0.test-core-zeebe.default.svc.cluster.local:26502"
    },
    "monitoringApi" : {
      "host" : "0.0.0.0",
      "port" : 9600,
      "advertisedHost" : "test-core-zeebe-0.test-core-zeebe.default.svc.cluster.local",
      "advertisedPort" : 9600,
      "address" : "0.0.0.0:9600",
      "advertisedAddress" : "test-core-zeebe-0.test-core-zeebe.default.svc.cluster.local:9600"
    },
    "maxMessageSizeInBytes" : 4194304
  },
  "cluster" : {
    "initialContactPoints" : [ "test-core-zeebe-0.test-core-zeebe.default.svc.cluster.local:26502", "test-core-zeebe-1.test-core-zeebe.default.svc.cluster.local:26502", "test-core-zeebe-2.test-core-zeebe.default.svc.cluster.local:26502" ],
    "partitionIds" : [ 1, 2, 3 ],
    "nodeId" : 0,
    "partitionsCount" : 3,
    "replicationFactor" : 3,
    "clusterSize" : 3,
    "clusterName" : "test-core-zeebe",
    "membership" : {
      "broadcastUpdates" : false,
      "broadcastDisputes" : true,
      "notifySuspect" : false,
      "gossipInterval" : "PT0.25S",
      "gossipFanout" : 2,
      "probeInterval" : "PT1S",
      "probeTimeout" : "PT2S",
      "suspectProbes" : 3,
      "failureTimeout" : "PT10S",
      "syncInterval" : "PT10S"
    }
  },
  "threads" : {
    "cpuThreadCount" : 2,
    "ioThreadCount" : 2
  },
  "data" : {
    "directories" : [ "/usr/local/zeebe/data" ],
    "logSegmentSize" : "512MB",
    "snapshotPeriod" : "PT15M",
    "logIndexDensity" : 100,
    "logSegmentSizeInBytes" : 536870912,
    "atomixStorageLevel" : "DISK"
  },
  "exporters" : {
    "elasticsearch" : {
      "jarPath" : null,
      "className" : "io.zeebe.exporter.ElasticsearchExporter",
      "args" : {
        "url" : "http://elasticsearch-master:9200"
      },
      "external" : false
    }
  },
  "gateway" : {
    "network" : {
      "host" : "0.0.0.0",
      "port" : 26500,
      "minKeepAliveInterval" : "PT30S"
    },
    "cluster" : {
      "contactPoint" : "0.0.0.0:26502",
      "requestTimeout" : "PT15S",
      "clusterName" : "zeebe-cluster",
      "memberId" : "gateway",
      "host" : "0.0.0.0",
      "port" : 26502,
      "membership" : {
        "broadcastUpdates" : false,
        "broadcastDisputes" : true,
        "notifySuspect" : false,
        "gossipInterval" : "PT0.25S",
        "gossipFanout" : 2,
        "probeInterval" : "PT1S",
        "probeTimeout" : "PT2S",
        "suspectProbes" : 3,
        "failureTimeout" : "PT10S",
        "syncInterval" : "PT10S"
      }
    },
    "threads" : {
      "managementThreads" : 1
    },
    "monitoring" : {
      "enabled" : false,
      "host" : "0.0.0.0",
      "port" : 9600
    },
    "security" : {
      "enabled" : false,
      "certificateChainPath" : null,
      "privateKeyPath" : null
    },
    "longPolling" : {
      "enabled" : true
    },
    "initialized" : true,
    "enable" : false
  },
  "backpressure" : {
    "enabled" : true,
    "algorithm" : "VEGAS",
    "aimd" : {
      "requestTimeout" : "PT1S",
      "initialLimit" : 100,
      "minLimit" : 1,
      "maxLimit" : 1000,
      "backoffRatio" : 0.9
    },
    "fixedLimit" : {
      "limit" : 20
    },
    "vegas" : {
      "alpha" : 3,
      "beta" : 6,
      "initialLimit" : 20
    },
    "gradient" : {
      "minLimit" : 10,
      "initialLimit" : 20,
      "rttTolerance" : 2.0
    },
    "gradient2" : {
      "minLimit" : 10,
      "initialLimit" : 20,
      "rttTolerance" : 2.0,
      "longWindow" : 600
    }
  },
  "stepTimeout" : "PT5M",
  "executionMetricsExporterEnabled" : false
}
2020-10-23 06:46:11.025 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [1/10]: actor scheduler
2020-10-23 06:46:11.028 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [2/10]: membership and replication protocol
2020-10-23 06:46:12.014 [] [http-nio-0.0.0.0-9600-exec-1] INFO  org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/] - Initializing Spring DispatcherServlet 'dispatcherServlet'
2020-10-23 06:46:12.020 [] [http-nio-0.0.0.0-9600-exec-1] INFO  org.springframework.web.servlet.DispatcherServlet - Initializing Servlet 'dispatcherServlet'
2020-10-23 06:46:12.051 [] [http-nio-0.0.0.0-9600-exec-1] INFO  org.springframework.web.servlet.DispatcherServlet - Completed initialization in 30 ms
2020-10-23 06:46:14.746 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [3/10]: command api transport
2020-10-23 06:46:15.002 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [4/10]: command api handler
2020-10-23 06:46:15.053 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [5/10]: subscription api
2020-10-23 06:46:15.097 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [6/10]: cluster services
2020-10-23 06:46:16.736 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [7/10]: topology manager
2020-10-23 06:46:16.737 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [8/10]: monitoring services
2020-10-23 06:46:16.742 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [9/10]: leader management request handler
2020-10-23 06:46:16.746 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [10/10]: zeebe partitions
2020-10-23 06:46:16.750 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 partitions [1/3]: partition 3
2020-10-23 06:46:17.027 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 partitions [2/3]: partition 2
2020-10-23 06:46:17.046 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 partitions [3/3]: partition 1
2020-10-23 06:46:17.066 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 partitions succeeded. Started 3 steps in 316 ms.
2020-10-23 06:46:17.067 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 succeeded. Started 10 steps in 6044 ms.
salaboy commented 4 years ago

@BigeYoung thanks for trying all of this out. The mistake in the yaml file was mine; you can change that file to use it with the full chart, and I will update it so it doesn't cause problems. I notice that you are still requesting 2 GB of memory, so even if the limits are higher the JVM might be choking on memory (which would explain why you are still getting OOM with exit code 137). Can you try

resources:
  limits:
    cpu: 2
    memory: 6Gi
  requests:
    cpu: 1
    memory: 3Gi

? I will create a cluster today on GKE to double check and see if I can reproduce your error. I just need to finish some stuff first
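
For context on the exit code: by the usual Unix convention, a status above 128 means the process was terminated by signal number `status - 128`, so 137 corresponds to signal 9 (SIGKILL). The kernel OOM killer is one common sender of SIGKILL, but not the only one. A minimal Python sketch of that decoding follows; the helper name `describe_exit_code` and the small signal table are illustrative only and are not part of Zeebe or Kubernetes.

```python
# Common Linux signal numbers; this table is deliberately small and illustrative.
SIGNAL_NAMES = {6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}


def describe_exit_code(code: int) -> str:
    """Interpret a container exit code using the 128 + signal convention."""
    if code > 128:
        signum = code - 128
        name = SIGNAL_NAMES.get(signum, f"signal {signum}")
        return f"killed by {name}"
    return f"exited with status {code}"


print(describe_exit_code(137))
```

Kubernetes usually reports `OOMKilled` as the termination reason when the container's own memory limit triggered the kill, so a plain `Reason: Error` next to exit code 137 suggests that memory is a plausible suspect but not a confirmed one.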

salaboy commented 4 years ago

@BigeYoung can you share your cluster details? Kubernetes version and size (type of nodes and amount)? That is something that might help me to replicate the issue faster, to make sure that we have similar setups.

BigeYoung commented 4 years ago

I have 4 nodes, named server1, server2, server3 and server4. All of them run CentOS 7 and Kubernetes v1.19.2, and server1 is the master node. server1, server2 and server3 are Dell PowerEdge R240 machines with a 4-core Intel(R) Xeon(R) E-2224 CPU @ 3.40GHz; each has 32GB of memory and a 2TB hard drive. server4 is an Inspur server with 16GB of memory, a 2TB hard disk, and 8 CPUs (Intel(R) Xeon(R) CPU E5-2609 v2 @ 2.50GHz).

The full report is as follows:

➜  ~ kubectl describe nodes
Name:               server1
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=server1
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/master=
Annotations:        flannel.alpha.coreos.com/backend-data: {"VtepMAC":"de:34:67:a5:c3:e4"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 116.57.98.121
                    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sun, 27 Sep 2020 11:32:48 +0800
Taints:             node-role.kubernetes.io/master:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  server1
  AcquireTime:     <unset>
  RenewTime:       Sat, 24 Oct 2020 15:44:19 +0800
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Sun, 18 Oct 2020 21:21:21 +0800   Sun, 18 Oct 2020 21:21:21 +0800   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False   Sat, 24 Oct 2020 15:43:59 +0800   Sun, 27 Sep 2020 11:32:46 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Sat, 24 Oct 2020 15:43:59 +0800   Sun, 27 Sep 2020 11:32:46 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Sat, 24 Oct 2020 15:43:59 +0800   Sun, 27 Sep 2020 11:32:46 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Sat, 24 Oct 2020 15:43:59 +0800   Sun, 27 Sep 2020 14:40:54 +0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.137.121
  Hostname:    server1
Capacity:
  cpu:                4
  ephemeral-storage:  51175Mi
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32623028Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  48294789041
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32520628Ki
  pods:               110
System Info:
  Machine ID:                 00ceaf5e24814d1cacd6469737795c28
  System UUID:                4C4C4544-005A-3910-8043-B9C04F313433
  Boot ID:                    498a7d84-cd76-46e9-bac1-e90ace4974d2
  Kernel Version:             3.10.0-1127.19.1.el7.x86_64
  OS Image:                   CentOS Linux 7 (Core)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://19.3.13
  Kubelet Version:            v1.19.2
  Kube-Proxy Version:         v1.19.2
PodCIDR:                      10.244.0.0/24
PodCIDRs:                     10.244.0.0/24
Non-terminated Pods:          (6 in total)
  Namespace                   Name                               CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----                               ------------  ----------  ---------------  -------------  ---
  kube-system                 etcd-server1                       0 (0%)        0 (0%)      0 (0%)           0 (0%)         27d
  kube-system                 kube-apiserver-server1             250m (6%)     0 (0%)      0 (0%)           0 (0%)         27d
  kube-system                 kube-controller-manager-server1    200m (5%)     0 (0%)      0 (0%)           0 (0%)         27d
  kube-system                 kube-flannel-ds-hz74p              100m (2%)     100m (2%)   50Mi (0%)        50Mi (0%)      27d
  kube-system                 kube-proxy-mb4bm                   0 (0%)        0 (0%)      0 (0%)           0 (0%)         27d
  kube-system                 kube-scheduler-server1             100m (2%)     0 (0%)      0 (0%)           0 (0%)         27d
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                650m (16%)  100m (2%)
  memory             50Mi (0%)   50Mi (0%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:              <none>

Name:               server2
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=server2
                    kubernetes.io/os=linux
Annotations:        flannel.alpha.coreos.com/backend-data: {"VtepMAC":"da:18:2a:3c:2c:62"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 116.57.98.122
                    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sun, 27 Sep 2020 11:35:05 +0800
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  server2
  AcquireTime:     <unset>
  RenewTime:       Sat, 24 Oct 2020 15:44:26 +0800
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Sun, 27 Sep 2020 19:33:47 +0800   Sun, 27 Sep 2020 19:33:47 +0800   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False   Sat, 24 Oct 2020 15:40:07 +0800   Sun, 27 Sep 2020 19:33:30 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Sat, 24 Oct 2020 15:40:07 +0800   Sun, 27 Sep 2020 19:33:30 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Sat, 24 Oct 2020 15:40:07 +0800   Sun, 27 Sep 2020 19:33:30 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Sat, 24 Oct 2020 15:40:07 +0800   Sun, 27 Sep 2020 19:33:30 +0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.137.122
  Hostname:    server2
Capacity:
  cpu:                4
  ephemeral-storage:  51175Mi
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32623028Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  48294789041
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32520628Ki
  pods:               110
System Info:
  Machine ID:                 67e1b7e8e01544c9a79128266c5a155d
  System UUID:                4C4C4544-0043-3010-8042-B1C04F313433
  Boot ID:                    53219857-228a-45d8-b711-db8e0b967b10
  Kernel Version:             3.10.0-1127.el7.x86_64
  OS Image:                   CentOS Linux 7 (Core)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://19.3.13
  Kubelet Version:            v1.19.2
  Kube-Proxy Version:         v1.19.2
PodCIDR:                      10.244.2.0/24
PodCIDRs:                     10.244.2.0/24
Non-terminated Pods:          (8 in total)
  Namespace                   Name                                      CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----                                      ------------  ----------  ---------------  -------------  ---
  default                     consul-consul-server-1                    100m (2%)     100m (2%)   100Mi (0%)       100Mi (0%)     6d2h
  default                     consul-consul-zcsjv                       100m (2%)     100m (2%)   100Mi (0%)       100Mi (0%)     6d2h
  default                     cpps-product-client-1-9cccddfc6-w5zjl     0 (0%)        0 (0%)      0 (0%)           0 (0%)         18d
  default                     cpps-product-client-3-6c87b55479-jbkz8    0 (0%)        0 (0%)      0 (0%)           0 (0%)         17d
  default                     mqtt-mosquitto-5f6dc7c898-7vq9j           0 (0%)        0 (0%)      0 (0%)           0 (0%)         27d
  kube-system                 coredns-f9fd979d6-ktpcj                   100m (2%)     0 (0%)      70Mi (0%)        170Mi (0%)     27d
  kube-system                 kube-flannel-ds-4ff2d                     100m (2%)     100m (2%)   50Mi (0%)        50Mi (0%)      27d
  kube-system                 kube-proxy-l865g                          0 (0%)        0 (0%)      0 (0%)           0 (0%)         27d
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                400m (10%)  300m (7%)
  memory             320Mi (1%)  420Mi (1%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:              <none>

Name:               server3
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=server3
                    kubernetes.io/os=linux
Annotations:        flannel.alpha.coreos.com/backend-data: {"VtepMAC":"1e:8c:80:2b:ed:bb"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 116.57.98.123
                    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sun, 27 Sep 2020 11:35:10 +0800
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  server3
  AcquireTime:     <unset>
  RenewTime:       Sat, 24 Oct 2020 15:44:28 +0800
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Mon, 28 Sep 2020 14:04:39 +0800   Mon, 28 Sep 2020 14:04:39 +0800   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False   Sat, 24 Oct 2020 15:41:20 +0800   Mon, 28 Sep 2020 14:04:26 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Sat, 24 Oct 2020 15:41:20 +0800   Mon, 28 Sep 2020 14:04:26 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Sat, 24 Oct 2020 15:41:20 +0800   Mon, 28 Sep 2020 14:04:26 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Sat, 24 Oct 2020 15:41:20 +0800   Mon, 28 Sep 2020 14:04:26 +0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.137.123
  Hostname:    server3
Capacity:
  cpu:                4
  ephemeral-storage:  51175Mi
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32623028Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  48294789041
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32520628Ki
  pods:               110
System Info:
  Machine ID:                 49a477c8d4f1499ab9141a053ea40bca
  System UUID:                4C4C4544-005A-3710-8048-B9C04F313433
  Boot ID:                    bdd799e9-ff66-4ba8-8b21-450de318541c
  Kernel Version:             3.10.0-1127.19.1.el7.x86_64
  OS Image:                   CentOS Linux 7 (Core)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://19.3.13
  Kubelet Version:            v1.19.2
  Kube-Proxy Version:         v1.19.2
PodCIDR:                      10.244.3.0/24
PodCIDRs:                     10.244.3.0/24
Non-terminated Pods:          (10 in total)
  Namespace                   Name                                                               CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----                                                               ------------  ----------  ---------------  -------------  ---
  default                     consul-consul-connect-injector-webhook-deployment-5c589b88xp22t    50m (1%)      50m (1%)    50Mi (0%)        50Mi (0%)      6d2h
  default                     consul-consul-server-0                                             100m (2%)     100m (2%)   100Mi (0%)       100Mi (0%)     6d2h
  default                     consul-consul-sync-catalog-68b75f5cf-27zwk                         50m (1%)      50m (1%)    50Mi (0%)        50Mi (0%)      6d2h
  default                     consul-consul-w97pz                                                100m (2%)     100m (2%)   100Mi (0%)       100Mi (0%)     6d2h
  default                     mariadb-0                                                          0 (0%)        0 (0%)      0 (0%)           0 (0%)         5d17h
  default                     minio-7cffb45794-gr4h5                                             0 (0%)        0 (0%)      4Gi (12%)        0 (0%)         26d
  default                     minio-zeebe-bd556b68f-mmbn5                                        0 (0%)        0 (0%)      0 (0%)           0 (0%)         26d
  kube-system                 kube-flannel-ds-6s9nl                                              100m (2%)     100m (2%)   50Mi (0%)        50Mi (0%)      27d
  kube-system                 kube-proxy-zj7fh                                                   0 (0%)        0 (0%)      0 (0%)           0 (0%)         27d
  ua-client                   ua-client-launcher-7b67f56558-h4s72                                0 (0%)        0 (0%)      0 (0%)           0 (0%)         19d
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                400m (10%)    400m (10%)
  memory             4446Mi (13%)  350Mi (1%)
  ephemeral-storage  0 (0%)        0 (0%)
  hugepages-1Gi      0 (0%)        0 (0%)
  hugepages-2Mi      0 (0%)        0 (0%)
Events:              <none>

Name:               server4
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=server4
                    kubernetes.io/os=linux
Annotations:        flannel.alpha.coreos.com/backend-data: {"VtepMAC":"86:0d:d9:f4:f7:3d"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 116.57.98.124
                    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sun, 27 Sep 2020 11:35:03 +0800
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  server4
  AcquireTime:     <unset>
  RenewTime:       Sat, 24 Oct 2020 15:44:29 +0800
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Sat, 24 Oct 2020 02:58:22 +0800   Sat, 24 Oct 2020 02:58:22 +0800   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False   Sat, 24 Oct 2020 15:40:55 +0800   Tue, 20 Oct 2020 16:05:27 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Sat, 24 Oct 2020 15:40:55 +0800   Tue, 20 Oct 2020 16:05:27 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Sat, 24 Oct 2020 15:40:55 +0800   Tue, 20 Oct 2020 16:05:27 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Sat, 24 Oct 2020 15:40:55 +0800   Tue, 20 Oct 2020 16:05:27 +0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.137.124
  Hostname:    server4
Capacity:
  cpu:                8
  ephemeral-storage:  51175Mi
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16235012Ki
  pods:               110
Allocatable:
  cpu:                8
  ephemeral-storage:  48294789041
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16132612Ki
  pods:               110
System Info:
  Machine ID:                 472219fb3dcf4a31be3d3913de539489
  System UUID:                472219fb3dcf4a31be3d3913de539489
  Boot ID:                    15ad69df-abd5-45e8-ae22-e54a6bd37661
  Kernel Version:             3.10.0-1127.19.1.el7.x86_64
  OS Image:                   CentOS Linux 7 (Core)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://19.3.13
  Kubelet Version:            v1.19.2
  Kube-Proxy Version:         v1.19.2
PodCIDR:                      10.244.1.0/24
PodCIDRs:                     10.244.1.0/24
Non-terminated Pods:          (11 in total)
  Namespace                   Name                                              CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----                                              ------------  ----------  ---------------  -------------  ---
  default                     aml2owl-f5966bd64-7z22g                           0 (0%)        0 (0%)      0 (0%)           0 (0%)         26d
  default                     consul-consul-svhkq                               100m (1%)     100m (1%)   100Mi (0%)       100Mi (0%)     6d2h
  default                     dashboard-kubernetes-dashboard-b5944fc7c-2v8zd    100m (1%)     2 (25%)     200Mi (1%)       200Mi (1%)     27d
  default                     nfs-nfs-client-provisioner-74bbbb5bf8-nd8cz       0 (0%)        0 (0%)      0 (0%)           0 (0%)         24d
  default                     orderpage-backend-669b57b7fc-fcbbm                0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d1h
  default                     orderpage-frontend-655cdbd6c-x7cxf                0 (0%)        0 (0%)      0 (0%)           0 (0%)         5d1h
  default                     scut100-9d6d68669-vp2kc                           0 (0%)        0 (0%)      0 (0%)           0 (0%)         26d
  ingress-nginx               ingress-nginx-controller-d7f9d68cf-np9xk          100m (1%)     0 (0%)      90Mi (0%)        0 (0%)         5d1h
  kube-system                 coredns-f9fd979d6-8qtjk                           100m (1%)     0 (0%)      70Mi (0%)        170Mi (1%)     27d
  kube-system                 kube-flannel-ds-tw9b5                             100m (1%)     100m (1%)   50Mi (0%)        50Mi (0%)      27d
  kube-system                 kube-proxy-9nkbd                                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         27d
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                500m (6%)   2200m (27%)
  memory             510Mi (3%)  520Mi (3%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:              <none>
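
As a rough cross-check of the node report above, the sketch below converts Kubernetes memory quantities to bytes and tests whether a new memory request would still fit on a node. It assumes the scheduler compares the sum of requests (not limits) against the node's allocatable memory, and it ignores CPU, taints, and volume constraints. The function names `to_bytes` and `fits` are hypothetical helpers written for this example.

```python
# Binary suffixes used in the kubectl output above.
UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def to_bytes(quantity: str) -> int:
    """Convert a quantity such as '16132612Ki' or '3Gi' to bytes."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain byte count, e.g. '48294789041'


def fits(allocatable: str, already_requested: str, new_request: str) -> bool:
    """Simplified memory-only check: do existing plus new requests fit?"""
    total = to_bytes(already_requested) + to_bytes(new_request)
    return total <= to_bytes(allocatable)


# Values taken from the server4 section: allocatable memory and current requests.
print(fits("16132612Ki", "510Mi", "3Gi"))
```

With the server4 figures this check passes, which is consistent with the pod being scheduled there; it says nothing about how much memory is actually free on the host at runtime.
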
BigeYoung commented 4 years ago

> @BigeYoung thanks for trying all of this out. The mistake in the yaml file was mine; you can change that file to use it with the full chart, and I will update it so it doesn't cause problems. I notice that you are still requesting 2 GB of memory, so even if the limits are higher the JVM might be choking on memory (which would explain why you are still getting OOM with exit code 137). Can you try
>
> resources:
>   limits:
>     cpu: 2
>     memory: 6Gi
>   requests:
>     cpu: 1
>     memory: 3Gi
>
> ? I will create a cluster today on GKE to double check and see if I can reproduce your error. I just need to finish some stuff first

Yes, and here's the result.

➜  ~ kubectl describe pods test-core-zeebe-0
Name:         test-core-zeebe-0
Namespace:    default
Priority:     0
Node:         server4/192.168.137.124
Start Time:   Sun, 25 Oct 2020 13:52:41 +0800
Labels:       app.kubernetes.io/component=broker
              app.kubernetes.io/instance=test-core
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=zeebe-cluster
              controller-revision-hash=test-core-zeebe-7c68646d8f
              statefulset.kubernetes.io/pod-name=test-core-zeebe-0
Annotations:  <none>
Status:       Running
IP:           10.244.1.13
IPs:
  IP:           10.244.1.13
Controlled By:  StatefulSet/test-core-zeebe
Containers:
  zeebe-cluster:
    Container ID:   docker://7aa201fa9d321354411a2b37e45f4b59265c029c7257e6cc93c28aad8036b276
    Image:          camunda/zeebe:0.24.2
    Image ID:       docker-pullable://camunda/zeebe@sha256:795ace31c498ad4bc37b7b0fab612307c34852f4187766e3f777a509821c9fb3
    Ports:          9600/TCP, 26501/TCP, 26502/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Sun, 25 Oct 2020 13:54:14 +0800
      Finished:     Sun, 25 Oct 2020 13:55:02 +0800
    Ready:          False
    Restart Count:  2
    Limits:
      cpu:     3
      memory:  6Gi
    Requests:
      cpu:      2
      memory:   3Gi
    Readiness:  http-get http://:9600/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ZEEBE_BROKER_CLUSTER_CLUSTERNAME:                test-core-zeebe
      ZEEBE_LOG_LEVEL:
      ZEEBE_BROKER_CLUSTER_PARTITIONSCOUNT:            3
      ZEEBE_BROKER_CLUSTER_CLUSTERSIZE:                3
      ZEEBE_BROKER_CLUSTER_REPLICATIONFACTOR:          3
      ZEEBE_BROKER_THREADS_CPUTHREADCOUNT:             2
      ZEEBE_BROKER_THREADS_IOTHREADCOUNT:              2
      ZEEBE_BROKER_GATEWAY_ENABLE:                     false
      ZEEBE_BROKER_EXPORTERS_ELASTICSEARCH_CLASSNAME:  io.zeebe.exporter.ElasticsearchExporter
      ZEEBE_BROKER_EXPORTERS_ELASTICSEARCH_ARGS_URL:   http://elasticsearch-master:9200
      ZEEBE_BROKER_NETWORK_COMMANDAPI_PORT:            26501
      ZEEBE_BROKER_NETWORK_INTERNALAPI_PORT:           26502
      ZEEBE_BROKER_NETWORK_MONITORINGAPI_PORT:         9600
      K8S_POD_NAME:                                    test-core-zeebe-0 (v1:metadata.name)
      JAVA_TOOL_OPTIONS:                               -XX:MaxRAMPercentage=25.0 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/zeebe/data -XX:ErrorFile=/usr/local/zeebe/data/zeebe_error%p.log -XX:+ExitOnOutOfMemoryError
    Mounts:
      /exporters from exporters (rw)
      /usr/local/bin/startup.sh from config (rw,path="startup.sh")
      /usr/local/zeebe/config/application.yaml from config (rw,path="application.yaml")
      /usr/local/zeebe/data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-mt74h (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-test-core-zeebe-0
    ReadOnly:   false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      test-core-zeebe-cluster
    Optional:  false
  exporters:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  default-token-mt74h:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-mt74h
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  2m43s                default-scheduler  0/4 nodes are available: 4 pod has unbound immediate PersistentVolumeClaims.
  Normal   Scheduled         2m43s                default-scheduler  Successfully assigned default/test-core-zeebe-0 to server4
  Warning  Unhealthy         2m26s                kubelet            Readiness probe failed: HTTP probe failed with statuscode: 503
  Normal   Pulled            70s (x3 over 2m41s)  kubelet            Container image "camunda/zeebe:0.24.2" already present on machine
  Normal   Created           70s (x3 over 2m41s)  kubelet            Created container zeebe-cluster
  Normal   Started           70s (x3 over 2m41s)  kubelet            Started container zeebe-cluster
  Warning  Unhealthy         66s (x3 over 2m36s)  kubelet            Readiness probe failed: Get "http://10.244.1.13:9600/ready": dial tcp 10.244.1.13:9600: connect: connection refused
  Warning  BackOff           10s (x3 over 82s)    kubelet            Back-off restarting failed container
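
The `JAVA_TOOL_OPTIONS` line in the output above sets `-XX:MaxRAMPercentage=25.0`. Assuming the JVM in this image is container-aware and sizes its heap from the container memory limit, the maximum heap can be estimated as sketched below. The function names are hypothetical, and the result is only an upper bound on the Java heap; off-heap memory used by the broker is not covered.

```python
import re


def max_ram_percentage(java_tool_options: str, default: float = 25.0) -> float:
    """Extract -XX:MaxRAMPercentage from a JVM options string (JVM default is 25)."""
    match = re.search(r"-XX:MaxRAMPercentage=([0-9.]+)", java_tool_options)
    return float(match.group(1)) if match else default


def max_heap_bytes(limit_bytes: int, percentage: float) -> int:
    """Estimated maximum heap: the given percentage of the container memory limit."""
    return int(limit_bytes * percentage / 100)


options = "-XX:MaxRAMPercentage=25.0 -XX:+HeapDumpOnOutOfMemoryError -XX:+ExitOnOutOfMemoryError"
limit = 6 * 1024**3  # the 6Gi memory limit shown in the pod description
heap = max_heap_bytes(limit, max_ram_percentage(options))
print(f"{heap / 1024**2:.0f} MiB")
```

Under these assumptions the heap is capped at roughly a quarter of the 6Gi limit, so raising the limit mostly adds room outside the heap.
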
salaboy commented 4 years ago

@BigeYoung thanks a lot for all these details. A couple more questions from my side: 1) Is this GKE or hosted on-prem? 2) Are you using Istio or any other service mesh?

BigeYoung commented 4 years ago

> @BigeYoung thanks a lot for all these details. A couple more questions from my side:
>
>   1. Is this GKE or hosted on-prem?
>   2. Are you using Istio or any other service mesh?

  1. This is not GKE or "Prem" (this is the first time I have heard of these terms); it is just a normal cluster running on local servers.
  2. I installed Consul but haven't done any actual work with it yet.

salaboy commented 4 years ago

@BigeYoung that is what I was afraid of. Because it is an "on-premise" cluster, it is pretty difficult for us to reproduce. A bunch of things could be going wrong, like networking or storage. I can see that you are using flannel there, so again, it is pretty difficult for me to replicate.

Can you run other workloads in that cluster? You mentioned Consul; why is that failing?

BigeYoung commented 4 years ago

> @BigeYoung that is what I was afraid of. Because it is an "on-premise" cluster, it is pretty difficult for us to reproduce. A number of things could be going wrong, such as networking or storage. I can see that you are using flannel there, so again, it is pretty difficult for me to replicate.
>
> Can you run other workloads in that cluster? You mention Consul; why is that failing?

I'm sorry that I didn't make myself clear. All the other workloads on the cluster are running very well, including Consul. I meant that I only put Consul on the cluster for testing and haven't used it to connect to other workloads yet.

BigeYoung commented 4 years ago

I tried to start it directly using Docker. Strangely, it doesn't print any errors, but it stops running after the boot step.

USING DOCKER

[bige@server4 ~]$ sudo docker run camunda/zeebe:0.24.2
++ hostname -i
+ export ZEEBE_HOST=172.17.0.2
+ ZEEBE_HOST=172.17.0.2
+ '[' false = true ']'
+ export ZEEBE_BROKER_NETWORK_HOST=172.17.0.2
+ ZEEBE_BROKER_NETWORK_HOST=172.17.0.2
+ export ZEEBE_BROKER_GATEWAY_CLUSTER_HOST=172.17.0.2
+ ZEEBE_BROKER_GATEWAY_CLUSTER_HOST=172.17.0.2
+ exec /usr/local/zeebe/bin/broker
  ______  ______   ______   ____    ______     ____    _____     ____    _  __  ______   _____
 |___  / |  ____| |  ____| |  _ \  |  ____|   |  _ \  |  __ \   / __ \  | |/ / |  ____| |  __ \
    / /  | |__    | |__    | |_) | | |__      | |_) | | |__) | | |  | | | ' /  | |__    | |__) |
   / /   |  __|   |  __|   |  _ <  |  __|     |  _ <  |  _  /  | |  | | |  <   |  __|   |  _  /
  / /__  | |____  | |____  | |_) | | |____    | |_) | | | \ \  | |__| | | . \  | |____  | | \ \
 /_____| |______| |______| |____/  |______|   |____/  |_|  \_\  \____/  |_|\_\ |______| |_|  \_\

2020-10-27 01:28:13.566 [] [main] INFO  io.zeebe.broker.StandaloneBroker - Starting StandaloneBroker v0.24.2 on 66e2d4590f6f with PID 6 (/usr/local/zeebe/lib/zeebe-distribution-0.24.2.jar started by root in /usr/local/zeebe)
2020-10-27 01:28:13.584 [] [main] INFO  io.zeebe.broker.StandaloneBroker - No active profile set, falling back to default profiles: default
2020-10-27 01:28:19.038 [] [main] INFO  org.springframework.boot.web.embedded.tomcat.TomcatWebServer - Tomcat initialized with port(s): 9600 (http)
2020-10-27 01:28:19.093 [] [main] INFO  org.apache.coyote.http11.Http11NioProtocol - Initializing ProtocolHandler ["http-nio-172.17.0.2-9600"]
2020-10-27 01:28:19.095 [] [main] INFO  org.apache.catalina.core.StandardService - Starting service [Tomcat]
2020-10-27 01:28:19.096 [] [main] INFO  org.apache.catalina.core.StandardEngine - Starting Servlet engine: [Apache Tomcat/9.0.36]
2020-10-27 01:28:19.379 [] [main] INFO  org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/] - Initializing Spring embedded WebApplicationContext
2020-10-27 01:28:19.380 [] [main] INFO  org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext - Root WebApplicationContext: initialization completed in 5645 ms
2020-10-27 01:28:20.558 [] [main] INFO  org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor - Initializing ExecutorService 'applicationTaskExecutor'
2020-10-27 01:28:21.282 [] [main] INFO  org.springframework.boot.actuate.endpoint.web.EndpointLinksResolver - Exposing 2 endpoint(s) beneath base path '/actuator'
2020-10-27 01:28:21.373 [] [main] INFO  org.apache.coyote.http11.Http11NioProtocol - Starting ProtocolHandler ["http-nio-172.17.0.2-9600"]
2020-10-27 01:28:21.565 [] [main] INFO  org.springframework.boot.web.embedded.tomcat.TomcatWebServer - Tomcat started on port(s): 9600 (http) with context path ''
2020-10-27 01:28:21.624 [] [main] INFO  io.zeebe.broker.StandaloneBroker - Started StandaloneBroker in 9.67 seconds (JVM running for 13.319)
2020-10-27 01:28:21.813 [] [main] INFO  io.zeebe.broker.system - Version: 0.24.2
2020-10-27 01:28:21.940 [] [main] INFO  io.zeebe.broker.system - Starting broker 0 with configuration {
  "network" : {
    "host" : "172.17.0.2",
    "portOffset" : 0,
    "maxMessageSize" : "4MB",
    "advertisedHost" : "172.17.0.2",
    "commandApi" : {
      "host" : "172.17.0.2",
      "port" : 26501,
      "advertisedHost" : "172.17.0.2",
      "advertisedPort" : 26501,
      "advertisedAddress" : "172.17.0.2:26501",
      "address" : "172.17.0.2:26501"
    },
    "internalApi" : {
      "host" : "172.17.0.2",
      "port" : 26502,
      "advertisedHost" : "172.17.0.2",
      "advertisedPort" : 26502,
      "advertisedAddress" : "172.17.0.2:26502",
      "address" : "172.17.0.2:26502"
    },
    "monitoringApi" : {
      "host" : "172.17.0.2",
      "port" : 9600,
      "advertisedHost" : "172.17.0.2",
      "advertisedPort" : 9600,
      "advertisedAddress" : "172.17.0.2:9600",
      "address" : "172.17.0.2:9600"
    },
    "maxMessageSizeInBytes" : 4194304
  },
  "cluster" : {
    "initialContactPoints" : [ ],
    "partitionIds" : [ 1 ],
    "nodeId" : 0,
    "partitionsCount" : 1,
    "replicationFactor" : 1,
    "clusterSize" : 1,
    "clusterName" : "zeebe-cluster",
    "membership" : {
      "broadcastUpdates" : false,
      "broadcastDisputes" : true,
      "notifySuspect" : false,
      "gossipInterval" : "PT0.25S",
      "gossipFanout" : 2,
      "probeInterval" : "PT1S",
      "probeTimeout" : "PT2S",
      "suspectProbes" : 3,
      "failureTimeout" : "PT10S",
      "syncInterval" : "PT10S"
    }
  },
  "threads" : {
    "cpuThreadCount" : 2,
    "ioThreadCount" : 2
  },
  "data" : {
    "directories" : [ "/usr/local/zeebe/data" ],
    "logSegmentSize" : "512MB",
    "snapshotPeriod" : "PT15M",
    "logIndexDensity" : 100,
    "logSegmentSizeInBytes" : 536870912,
    "atomixStorageLevel" : "DISK"
  },
  "exporters" : { },
  "gateway" : {
    "network" : {
      "host" : "0.0.0.0",
      "port" : 26500,
      "minKeepAliveInterval" : "PT30S"
    },
    "cluster" : {
      "contactPoint" : "172.17.0.2:26502",
      "requestTimeout" : "PT15S",
      "clusterName" : "zeebe-cluster",
      "memberId" : "gateway",
      "host" : "172.17.0.2",
      "port" : 26502,
      "membership" : {
        "broadcastUpdates" : false,
        "broadcastDisputes" : true,
        "notifySuspect" : false,
        "gossipInterval" : "PT0.25S",
        "gossipFanout" : 2,
        "probeInterval" : "PT1S",
        "probeTimeout" : "PT2S",
        "suspectProbes" : 3,
        "failureTimeout" : "PT10S",
        "syncInterval" : "PT10S"
      }
    },
    "threads" : {
      "managementThreads" : 1
    },
    "monitoring" : {
      "enabled" : false,
      "host" : "172.17.0.2",
      "port" : 9600
    },
    "security" : {
      "enabled" : false,
      "certificateChainPath" : null,
      "privateKeyPath" : null
    },
    "longPolling" : {
      "enabled" : true
    },
    "initialized" : true,
    "enable" : true
  },
  "backpressure" : {
    "enabled" : true,
    "algorithm" : "VEGAS",
    "aimd" : {
      "requestTimeout" : "PT1S",
      "initialLimit" : 100,
      "minLimit" : 1,
      "maxLimit" : 1000,
      "backoffRatio" : 0.9
    },
    "fixedLimit" : {
      "limit" : 20
    },
    "vegas" : {
      "alpha" : 3,
      "beta" : 6,
      "initialLimit" : 20
    },
    "gradient" : {
      "minLimit" : 10,
      "initialLimit" : 20,
      "rttTolerance" : 2.0
    },
    "gradient2" : {
      "minLimit" : 10,
      "initialLimit" : 20,
      "rttTolerance" : 2.0,
      "longWindow" : 600
    }
  },
  "stepTimeout" : "PT5M",
  "executionMetricsExporterEnabled" : false
}
2020-10-27 01:28:21.992 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [1/11]: actor scheduler
2020-10-27 01:28:22.028 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [2/11]: membership and replication protocol
2020-10-27 01:28:26.489 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [3/11]: command api transport
2020-10-27 01:28:27.022 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [4/11]: command api handler
2020-10-27 01:28:27.136 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [5/11]: subscription api
2020-10-27 01:28:27.221 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [6/11]: embedded gateway
2020-10-27 01:28:27.233 [] [main] INFO  io.zeebe.gateway - Version: 0.24.2
2020-10-27 01:28:27.239 [] [main] INFO  io.zeebe.gateway - Starting gateway with configuration {
  "network" : {
    "host" : "0.0.0.0",
    "port" : 26500,
    "minKeepAliveInterval" : "PT30S"
  },
  "cluster" : {
    "contactPoint" : "172.17.0.2:26502",
    "requestTimeout" : "PT15S",
    "clusterName" : "zeebe-cluster",
    "memberId" : "gateway",
    "host" : "172.17.0.2",
    "port" : 26502,
    "membership" : {
      "broadcastUpdates" : false,
      "broadcastDisputes" : true,
      "notifySuspect" : false,
      "gossipInterval" : "PT0.25S",
      "gossipFanout" : 2,
      "probeInterval" : "PT1S",
      "probeTimeout" : "PT2S",
      "suspectProbes" : 3,
      "failureTimeout" : "PT10S",
      "syncInterval" : "PT10S"
    }
  },
  "threads" : {
    "managementThreads" : 1
  },
  "monitoring" : {
    "enabled" : false,
    "host" : "172.17.0.2",
    "port" : 9600
  },
  "security" : {
    "enabled" : false,
    "certificateChainPath" : null,
    "privateKeyPath" : null
  },
  "longPolling" : {
    "enabled" : true
  },
  "initialized" : true,
  "enable" : true
}
2020-10-27 01:28:27.629 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [7/11]: cluster services
2020-10-27 01:28:28.195 [] [raft-server-0-raft-partition-partition-1] WARN  io.atomix.utils.event.ListenerRegistry - Listener io.atomix.raft.roles.FollowerRole$$Lambda$946/0x000000080079c840@764b03b6 not registered
2020-10-27 01:28:28.428 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [8/11]: topology manager
2020-10-27 01:28:28.432 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [9/11]: monitoring services
2020-10-27 01:28:28.442 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [10/11]: leader management request handler
2020-10-27 01:28:28.448 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 [11/11]: zeebe partitions
2020-10-27 01:28:28.453 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 partitions [1/1]: partition 1
2020-10-27 01:28:30.836 [Broker-0-StreamProcessor-1] [Broker-0-zb-actors-0] INFO  org.camunda.feel.FeelEngine - Engine created. [value-mapper: CompositeValueMapper(List(io.zeebe.el.impl.feel.MessagePackValueMapper@732f28bf)), function-provider: io.zeebe.el.impl.feel.FeelFunctionProvider@1fcba24f, configuration: Configuration(false)]
2020-10-27 01:28:31.152 [Broker-0-StreamProcessor-1] [Broker-0-zb-actors-0] INFO  io.zeebe.logstreams - Recovered state of partition 1 from snapshot at position -1
2020-10-27 01:28:31.195 [Broker-0-StreamProcessor-1] [Broker-0-zb-actors-0] INFO  org.camunda.feel.FeelEngine - Engine created. [value-mapper: CompositeValueMapper(List(io.zeebe.el.impl.feel.MessagePackValueMapper@3cda798c)), function-provider: io.zeebe.el.impl.feel.FeelFunctionProvider@28b889ef, configuration: Configuration(false)]
2020-10-27 01:28:31.228 [Broker-0-StreamProcessor-1] [Broker-0-zb-actors-0] INFO  org.camunda.feel.FeelEngine - Engine created. [value-mapper: CompositeValueMapper(List(io.zeebe.el.impl.feel.MessagePackValueMapper@cdd70a)), function-provider: io.zeebe.el.impl.feel.FeelFunctionProvider@19d449ed, configuration: Configuration(false)]
2020-10-27 01:28:31.582 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 partitions succeeded. Started 1 steps in 3128 ms.
2020-10-27 01:28:31.583 [] [main] INFO  io.zeebe.broker.system - Bootstrap Broker-0 succeeded. Started 11 steps in 9593 ms.

And then the docker container stopped.
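
One way to narrow down a silent stop like this is to look at the container's exit code. The `kubectl describe` output at the top of this issue reports `Exit Code: 137`; by the usual convention, an exit code above 128 means the process was terminated by signal `code - 128`, so 137 corresponds to signal 9 (SIGKILL). The sketch below decodes such codes. The helper name `describe_exit_code` is hypothetical and not part of Docker, Kubernetes, or Zeebe, and the code alone does not say who sent the signal (the kernel OOM killer, the kubelet, or another process).

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a container exit code using the 128 + signal-number convention."""
    if code == 0:
        return "exited normally"
    if code > 128:
        signum = code - 128
        try:
            # Map the number to a symbolic name where the platform knows it.
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"application error (exit status {code})"


# The exit code reported for the zeebe container in this issue.
print(describe_exit_code(137))
```

For a container started directly with Docker, `docker inspect --format '{{.State.ExitCode}} {{.State.OOMKilled}}' <container>` shows the exit code and whether Docker recorded an out-of-memory kill.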

BigeYoung commented 4 years ago

I found the problem. My server was infected with cryptocurrency-mining malware, which caused extremely high CPU usage. Sorry for wasting your time.
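
For anyone hitting similar symptoms, a quick check for this kind of CPU saturation on a node is to compare the load average with the number of CPUs. This is only a minimal sketch, assuming a Linux or other Unix host; the function name `is_cpu_saturated` and the threshold of 1.0 are illustrative choices, not something taken from Zeebe or Kubernetes.

```python
import os


def is_cpu_saturated(load_1min: float, cpu_count: int, threshold: float = 1.0) -> bool:
    """Return True when the 1-minute load per CPU exceeds the threshold."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_1min / cpu_count > threshold


if __name__ == "__main__":
    # os.getloadavg() is only available on Unix-like systems.
    load_1min, _, _ = os.getloadavg()
    cpus = os.cpu_count() or 1
    print(f"load per CPU: {load_1min / cpus:.2f}, "
          f"saturated: {is_cpu_saturated(load_1min, cpus)}")
```

A sustained load per CPU well above 1.0 on an otherwise idle node is a hint to look at what is running on the host itself (for example with `top`) before digging into the workload.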

salaboy commented 4 years ago

@BigeYoung oh wow.. so it was running out of memory.. that is crazy, I am sorry to hear that.. can you confirm that now it is working for you?

BigeYoung commented 4 years ago

> @BigeYoung oh wow.. so it was running out of memory.. that is crazy, I am sorry to hear that.. can you confirm that now it is working for you?

Yes, it works well. Sorry for wasting your time, and thank you very much for your patience!

salaboy commented 4 years ago

@BigeYoung no worries, I am here to help. Feel free to reach out if you find more issues with Zeebe or if you want to provide feedback about it.