camunda / camunda

Process Orchestration Framework
https://camunda.com/platform/

ZEEBE_ADVERTISED_HOST doesn't work #3283

Closed IcyEagle closed 4 years ago

IcyEagle commented 4 years ago

Hi there!

I'm deploying a 3-node Zeebe cluster in k8s. We use Istio (an Envoy sidecar is injected into each pod), which is why the advertised host feature is required to make it work.

Here is my zeebe.cfg.toml:

   # For more information about this configuration visit:
    [threads]
    cpuThreadCount = 1

    [network]
    host = "0.0.0.0"

    [gateway.monitoring]
    enabled = true

    [...exporters configuration omitted...]

My startup.sh script:

    #!/bin/bash -xeu

    configFile=/usr/local/zeebe/conf/zeebe.cfg.toml
    export ZEEBE_NODE_ID="${HOSTNAME##*-}"
    export ZEEBE_ADVERTISED_HOST=$(hostname -f)

    # We need to specify all brokers as contact points for partition healing to work correctly
    # https://github.com/zeebe-io/zeebe/issues/2684
    ZEEBE_CONTACT_POINTS=${HOSTNAME::-1}0.$(hostname -d):26502
    for (( i=1; i<$ZEEBE_CLUSTER_SIZE; i++ ))
    do
        ZEEBE_CONTACT_POINTS="${ZEEBE_CONTACT_POINTS},${HOSTNAME::-1}$i.$(hostname -d):26502"
    done
    export ZEEBE_CONTACT_POINTS="${ZEEBE_CONTACT_POINTS}"

    exec /usr/local/zeebe/bin/broker
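
For a 3-node cluster, the loop above expands ZEEBE_CONTACT_POINTS to something like this (the DNS suffix comes from my namespace):

    zeebe-cluster-0.zeebe-cluster.dev.svc.cluster.local:26502,zeebe-cluster-1.zeebe-cluster.dev.svc.cluster.local:26502,zeebe-cluster-2.zeebe-cluster.dev.svc.cluster.local:26502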

Here is what I have:

➜  ~ k exec -it zeebe-cluster-0 -c zeebe-cluster ./bin/zbctl status -- --insecure
Cluster size: 3
Partitions count: 3
Replication factor: 3
Brokers:
  Broker 0 - 0.0.0.0:26501
  Broker 1 - 0.0.0.0:26501
  Broker 2 - 0.0.0.0:26501

My statefulset descriptor:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: {{ include "zeebe-cluster.fullname" . }}
  namespace: {{ include "zeebe-cluster.namespace" . }}
  labels:
{{ include "zeebe-cluster.labels" . | indent 4 }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app.kubernetes.io/name: {{ include "zeebe-cluster.name" . }}
      app.kubernetes.io/instance: {{ .Release.Name }}
  serviceName: {{ include "zeebe-cluster.fullname" . }}
  updateStrategy:
    type: RollingUpdate
  podManagementPolicy: Parallel
  template:
    metadata:
      annotations:
        prometheus.io/port: "9600"
        prometheus.io/scrape: "true"
        prometheus.io/path: "/metrics"
      labels:
        app.kubernetes.io/name: {{ include "zeebe-cluster.name" . }}
        app.kubernetes.io/instance: {{ .Release.Name }}
        json_logs: "true"
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.url }}:{{ .Values.version }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          env:
            - name: ZEEBE_LOG_LEVEL
              value: info
            - name: ZEEBE_PARTITIONS_COUNT
              value: "3"
            - name: ZEEBE_CLUSTER_SIZE
              value: "3"
            - name: ZEEBE_REPLICATION_FACTOR
              value: "3"
            - name: JAVA_TOOL_OPTIONS
              value:
            {{- toYaml .Values.JavaOpts | nindent 16}}
          ports:
            - containerPort: {{ .Values.service.http.port }}
              name: http
            - containerPort: {{ .Values.service.gateway.port }}
              name: gateway
            - containerPort: {{ .Values.service.command.port }}
              name: command
            - containerPort: {{ .Values.service.internal.port }}
              name: internal
          readinessProbe:
            httpGet:
              path: {{ .Values.service.http.healthcheckPath }}
              port: http
            initialDelaySeconds: 60
            periodSeconds: 50
          livenessProbe:
            httpGet:
              path: {{ .Values.service.http.healthcheckPath }}
              port: http
            initialDelaySeconds: 120
            periodSeconds: 30
            failureThreshold: 10
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          volumeMounts:
            - name: config
              mountPath: /usr/local/zeebe/conf/zeebe.cfg.toml
              subPath: zeebe.cfg.toml
            - name: config
              mountPath: /usr/local/zeebe/conf/log4j2.xml
              subPath: log4j2.xml
            - name: config
              mountPath: /usr/local/bin/startup.sh
              subPath: startup.sh
            - name: data
              mountPath: /usr/local/zeebe/data
      volumes:
        - name: config
          configMap:
            name: {{ include "zeebe-cluster.fullname" . }}
            defaultMode: 0744
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 5Gi

I use your image camunda/zeebe:0.21.1.

It's worth noting that it works "better" when I bind port 26501 to hostname -f (the default) instead of 0.0.0.0, but then the cluster only assembles after I flush all persistent volumes. Maybe that helps.

npepinpe commented 4 years ago

Hey @IcyEagle, thanks for reporting this. I've been using Zeebe with linkerd, so I'm surprised it doesn't work with Istio; admittedly I haven't tried it. But in your case the brokers aren't advertising the right address anyway, as the topology shows.

Can you launch your brokers with log level INFO? You can just add the env var ZEEBE_LOG_LEVEL="info". The broker prints the parsed configuration at startup, so we can see what Zeebe thinks it should be using.
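
For example, something along these lines should dump it (pod and container names taken from your StatefulSet, and the grep window chosen generously):

kubectl logs zeebe-cluster-0 -c zeebe-cluster | grep -A 100 "Starting broker with configuration"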

It should be something like:

2019-10-30 15:06:58.467 [] [main] INFO  io.zeebe.broker.system - Starting broker with configuration {
  "network": {
    "host": "0.0.0.0",
    "portOffset": 0,
    "maxMessageSize": "4M",
    "advertisedHost": "bench-broker-0.bench-broker.npepinpe.svc.cluster.local",
    "commandApi": {
      "defaultPort": 26501,
      "host": "0.0.0.0",
      "port": 26501,
      "advertisedHost": "bench-broker-0.bench-broker.npepinpe.svc.cluster.local",
      "advertisedPort": 26501
    },
    "internalApi": {
      "defaultPort": 26502,
      "host": "0.0.0.0",
      "port": 26502,
      "advertisedHost": "bench-broker-0.bench-broker.npepinpe.svc.cluster.local",
      "advertisedPort": 26502
    },
    "monitoringApi": {
      "defaultPort": 9600,
      "host": "0.0.0.0",
      "port": 9600,
      "advertisedHost": "bench-broker-0.bench-broker.npepinpe.svc.cluster.local",
      "advertisedPort": 9600
    }
  },
  "cluster": {
    "initialContactPoints": [
      "bench-broker-0.bench-broker.npepinpe.svc.cluster.local:26502",
      "bench-broker-1.bench-broker.npepinpe.svc.cluster.local:26502",
      "bench-broker-2.bench-broker.npepinpe.svc.cluster.local:26502",
      "bench-broker-3.bench-broker.npepinpe.svc.cluster.local:26502",
      "bench-broker-4.bench-broker.npepinpe.svc.cluster.local:26502"
    ],
    "partitionIds": [
      1,
      2,
      3,
      4,
      5,
      6,
      7,
      8
    ],
    "nodeId": 0,
    "partitionsCount": 8,
    "replicationFactor": 3,
    "clusterSize": 5,
    "clusterName": "zeebe-cluster"
  },
  "threads": {
    "cpuThreadCount": 8,
    "ioThreadCount": 8
  },
  "data": {
    "directories": [
      "/usr/local/zeebe/data"
    ],
    "logSegmentSize": "512M",
    "snapshotPeriod": "3m",
    "raftSegmentSize": "512M",
    "maxSnapshots": 3
  },
...

And the topology would be something like:

Cluster size: 5
Partitions count: 8
Replication factor: 3
Brokers:
  Broker 0 - bench-broker-0.bench-broker.npepinpe.svc.cluster.local:26501
    Partition 1 : Follower
    Partition 4 : Follower
    Partition 5 : Leader
    Partition 6 : Follower
  Broker 1 - bench-broker-1.bench-broker.npepinpe.svc.cluster.local:26501
    Partition 1 : Follower
    Partition 2 : Follower
    Partition 5 : Follower
    Partition 6 : Leader
    Partition 7 : Follower
  Broker 2 - bench-broker-2.bench-broker.npepinpe.svc.cluster.local:26501
    Partition 1 : Leader
    Partition 2 : Leader
    Partition 3 : Leader
    Partition 6 : Follower
    Partition 7 : Leader
    Partition 8 : Follower
  Broker 3 - bench-broker-3.bench-broker.npepinpe.svc.cluster.local:26501
    Partition 2 : Follower
    Partition 3 : Follower
    Partition 4 : Leader
    Partition 7 : Follower
    Partition 8 : Leader
  Broker 4 - bench-broker-4.bench-broker.npepinpe.svc.cluster.local:26501
    Partition 3 : Follower
    Partition 4 : Follower
    Partition 5 : Follower
    Partition 8 : Follower
npepinpe commented 4 years ago

I can also give you the configuration I use, maybe that can help.

Init script:

#!/usr/bin/env bash
set -eux -o pipefail

export ZEEBE_ADVERTISED_HOST=$(hostname -f)
export ZEEBE_NODE_ID=${ZEEBE_NODE_NAME##*-}

# Since neither the number of replicas nor the DNS domain is obtainable from the
# downward API yet, derive them here based on naming conventions
replicaHost=${ZEEBE_NODE_NAME%-*}
export ZEEBE_CONTACT_POINTS=$(for ((i=0; i<${ZEEBE_CLUSTER_SIZE}; i++)); do echo -n "${replicaHost}-$i.$(hostname -d):26502,"; done)

exec /usr/local/bin/startup.sh

Stateful Set:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: broker
spec:
  serviceName: broker
  replicas: 1
  podManagementPolicy: Parallel
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
       accessModes:
         - ReadWriteOnce
       storageClassName: ssd
       resources:
         requests:
           storage: 50Gi
  template:
    spec:
      volumes:
        - name: configuration
          configMap:
            name: broker-config
            defaultMode: 0744
        - name: scripts
          configMap:
            name: broker-scripts
            defaultMode: 0755
      containers:
        - name: broker
          image: camunda/zeebe:latest
          ports:
            - containerPort: 26500
              name: gateway
            - containerPort: 26501
              name: command
            - containerPort: 26502
              name: cluster
            - containerPort: 9600
              name: monitoring
          env:
            - name: ZEEBE_CLUSTER_SIZE
              value: "1"
            - name: ZEEBE_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          volumeMounts:
            - name: data
              mountPath: /usr/local/zeebe/data
            - name: configuration
              mountPath: /usr/local/zeebe/conf/zeebe.cfg.toml
              subPath: zeebe.cfg.toml
            - name: configuration
              mountPath: /usr/local/zeebe/conf/log4j2.xml
              subPath: log4j2.xml
            - name: scripts
              mountPath: /usr/local/bin/init.sh
              subPath: init.sh
          command: ["/bin/sh"]
          args: ["-c", "exec tini -- /usr/local/bin/init.sh"]
          securityContext:
            capabilities:
                add: ["NET_ADMIN", "SYS_ADMIN"]
          readinessProbe:
            httpGet:
              path: /ready
              port: monitoring
            initialDelaySeconds: 20
            periodSeconds: 5

This is a partial config since I'm using kustomize to fill in some blanks, but it works with linkerd injected, so I would assume it also works with Istio.

IcyEagle commented 4 years ago

@npepinpe thanks for your fast response :)

Here is the runtime configuration:

2019-10-30 15:14:25.406 [] [main] INFO  io.zeebe.broker.system - Version: 0.21.1
2019-10-30 15:14:25.415 [] [main] INFO  io.zeebe.broker.system - Starting broker with configuration {
  "network": {
    "host": "0.0.0.0",
    "portOffset": 0,
    "maxMessageSize": "4M",
    "commandApi": {
      "host": "0.0.0.0",
      "port": 26501
    },
    "internalApi": {
      "host": "0.0.0.0",
      "port": 26502
    },
    "monitoringApi": {
      "host": "0.0.0.0",
      "port": 9600
    }
  },
  "cluster": {
    "initialContactPoints": [
      "zeebe-cluster-0.zeebe-cluster.dev.svc.cluster.local:26502",
      "zeebe-cluster-1.zeebe-cluster.dev.svc.cluster.local:26502",
      "zeebe-cluster-2.zeebe-cluster.dev.svc.cluster.local:26502"
    ],
    "partitionIds": [
      1,
      2,
      3
    ],
    "nodeId": 2,
    "partitionsCount": 3,
    "replicationFactor": 3,
    "clusterSize": 3,
    "clusterName": "zeebe-cluster"
  },
  "threads": {
    "cpuThreadCount": 1,
    "ioThreadCount": 2
  },
  "data": {
    "directories": [
      "/usr/local/zeebe/data"
    ],
    "logSegmentSize": "512M",
    "snapshotPeriod": "15m",
    "raftSegmentSize": "512M",
    "maxSnapshots": 3
  },
"exporters": [
    {
      "id": "hazelcast",
      "className": "org.project.HazelcastExporter",
      "args": {
        "host": "hazelcast.dev.svc.cluster.local",
        "enabledValueTypes": "JOB,WORKFLOW_INSTANCE,DEPLOYMENT,INCIDENT,TIMER,VARIABLE,MESSAGE,MESSAGE_SUBSCRIPTION,MESSAGE_START_EVENT_SUBSCRIPTION"
      }
    },
    {
      "id": "log",
      "className": "org.project.LogExporter",
      "args": {
        "enabledValueTypes": "JOB,WORKFLOW_INSTANCE,DEPLOYMENT,INCIDENT,TIMER,VARIABLE,MESSAGE,MESSAGE_SUBSCRIPTION,MESSAGE_START_EVENT_SUBSCRIPTION"
      }
    }
  ],
  "gateway": {
    "enable": true,
    "network": {
      "host": "0.0.0.0",
      "port": 26500
    },
    "cluster": {
      "contactPoint": "0.0.0.0:26502",
      "maxMessageSize": "4M",
      "requestTimeout": "15s",
      "clusterName": "zeebe-cluster",
      "memberId": "gateway",
      "host": "0.0.0.0",
      "port": 26502
    },
    "threads": {
      "managementThreads": 1
    },
    "monitoring": {
      "enabled": true,
      "host": "0.0.0.0",
      "port": 9600
    },
    "security": {
      "enabled": false
    }
  },
  "backpressure": {
    "enabled": true,
    "useWindowed": true,
    "algorithm": "vegas"
  }
}

It seems okay, indeed. Let me compare your config with mine; maybe I'll find some meaningful differences...

npepinpe commented 4 years ago

It seems the version you're using doesn't include the advertised host feature, otherwise it would be printed in the configuration. I thought it was included in 0.21.1, but I guess not. Can you try with a SNAPSHOT image?
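
With your chart that should just be a matter of overriding the image values, roughly like this (assuming a SNAPSHOT tag is published on Docker Hub, and matching the image.url / version keys your StatefulSet template uses):

image:
  url: camunda/zeebe
version: SNAPSHOT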

IcyEagle commented 4 years ago

Well, it definitely looks better now:

2019-10-30 15:45:13.820 [] [main] INFO  io.zeebe.broker.system - Starting broker with configuration {
  "network": {
    "host": "0.0.0.0",
    "portOffset": 0,
    "maxMessageSize": "4M",
    "advertisedHost": "zeebe-cluster-2.zeebe-cluster.dev.svc.cluster.local",
    "commandApi": {
      "defaultPort": 26501,
      "host": "0.0.0.0",
      "port": 26501,
      "advertisedHost": "zeebe-cluster-2.zeebe-cluster.dev.svc.cluster.local",
      "advertisedPort": 26501
    },
    "internalApi": {
      "defaultPort": 26502,
      "host": "0.0.0.0",
      "port": 26502,
      "advertisedHost": "zeebe-cluster-2.zeebe-cluster.dev.svc.cluster.local",
      "advertisedPort": 26502
    },
    "monitoringApi": {
      "defaultPort": 9600,
      "host": "0.0.0.0",
      "port": 9600,
      "advertisedHost": "zeebe-cluster-2.zeebe-cluster.dev.svc.cluster.local",
      "advertisedPort": 9600
    }
  },
  "cluster": {
    "initialContactPoints": [
      "zeebe-cluster-0.zeebe-cluster.dev.svc.cluster.local:26502",
      "zeebe-cluster-1.zeebe-cluster.dev.svc.cluster.local:26502",
      "zeebe-cluster-2.zeebe-cluster.dev.svc.cluster.local:26502"
    ],
    "partitionIds": [
      1,
      2,
      3
    ],
    "nodeId": 2,
    "partitionsCount": 3,
    "replicationFactor": 3,
    "clusterSize": 3,
    "clusterName": "zeebe-cluster"
  },
  "threads": {
    "cpuThreadCount": 1,
    "ioThreadCount": 2
  },
  "data": {
    "directories": [
      "/usr/local/zeebe/data"
    ],
    "logSegmentSize": "512M",
    "snapshotPeriod": "15m",
    "raftSegmentSize": "512M",
    "maxSnapshots": 3
  },
  "exporters": [
    {
      "id": "hazelcast",
      "className": "org.project.HazelcastExporter",
      "args": {
        "host": "hazelcast.dev.svc.cluster.local",
        "enabledValueTypes": "JOB,WORKFLOW_INSTANCE,DEPLOYMENT,INCIDENT,TIMER,VARIABLE,MESSAGE,MESSAGE_SUBSCRIPTION,MESSAGE_START_EVENT_SUBSCRIPTION"
      }
    },
    {
      "id": "log",
      "className": "org.project.LogExporter",
      "args": {
        "enabledValueTypes": "JOB,WORKFLOW_INSTANCE,DEPLOYMENT,INCIDENT,TIMER,VARIABLE,MESSAGE,MESSAGE_SUBSCRIPTION,MESSAGE_START_EVENT_SUBSCRIPTION"
      }
    }
  ],
  "gateway": {
    "enable": true,
    "network": {
      "host": "0.0.0.0",
      "port": 26500
    },
    "cluster": {
      "contactPoint": "0.0.0.0:26502",
      "maxMessageSize": "4M",
      "requestTimeout": "15s",
      "clusterName": "zeebe-cluster",
      "memberId": "gateway",
      "host": "0.0.0.0",
      "port": 26502
    },
    "threads": {
      "managementThreads": 1
    },
    "monitoring": {
      "enabled": true,
      "host": "0.0.0.0",
      "port": 9600
    },
    "security": {
      "enabled": false
    }
  },
  "backpressure": {
    "enabled": true,
    "useWindowed": true,
    "algorithm": "vegas"
  }
}

Gateway config:

2019-10-30 15:45:19.715 [] [zb-blocking-task-runner-1-0.0.0.0:26501] INFO  io.zeebe.gateway - Starting gateway with configuration {
  "enable": true,
  "network": {
    "host": "0.0.0.0",
    "port": 26500
  },
  "cluster": {
    "contactPoint": "0.0.0.0:26502",
    "maxMessageSize": "4M",
    "requestTimeout": "15s",
    "clusterName": "zeebe-cluster",
    "memberId": "gateway",
    "host": "0.0.0.0",
    "port": 26502
  },
  "threads": {
    "managementThreads": 1
  },
  "monitoring": {
    "enabled": true,
    "host": "0.0.0.0",
    "port": 9600
  },
  "security": {
    "enabled": false
  }
}

And the rest of the logs:

2019-10-30 15:45:20.089 [GatewayTopologyManager] [0.0.0.0:26501-zb-actors-0] INFO  io.zeebe.transport.endpoint - Registering endpoint for node '2' with address 'zeebe-cluster-2.zeebe-cluster.core-dev.svc.cluster.local:26501' on transport 'gateway-broker-client'
2019-10-30 15:45:24.486 [] [raft-server-system-partition-1] WARN  io.atomix.protocols.raft.roles.FollowerRole - RaftServer{system-partition-1}{role=FOLLOWER} - java.net.ConnectException
2019-10-30 15:45:24.487 [] [raft-server-system-partition-1] WARN  io.atomix.protocols.raft.roles.FollowerRole - RaftServer{system-partition-1}{role=FOLLOWER} - java.net.ConnectException
2019-10-30 15:45:28.244 [] [raft-server-system-partition-1] WARN  io.atomix.protocols.raft.roles.FollowerRole - RaftServer{system-partition-1}{role=FOLLOWER} - java.net.ConnectException
2019-10-30 15:45:28.244 [] [raft-server-system-partition-1] WARN  io.atomix.protocols.raft.roles.FollowerRole - RaftServer{system-partition-1}{role=FOLLOWER} - java.net.ConnectException

Unfortunately, the nodes don't see each other. Any ideas?

npepinpe commented 4 years ago

If you open a shell in one of the containers, can you ping the other containers? Next up would be to test whether the port is open remotely, just to rule out a problem with Istio itself rather than a Zeebe issue.

I usually just use telnet to quickly check whether a port is open at L4, but maybe you know a better way.
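
Something like this from inside the cluster would already tell us a lot (assuming ping and nc are available in the image; adjust the DNS names to your namespace):

kubectl exec -it zeebe-cluster-0 -c zeebe-cluster -- ping -c 3 zeebe-cluster-1.zeebe-cluster.dev.svc.cluster.local
kubectl exec -it zeebe-cluster-0 -c zeebe-cluster -- nc -zv zeebe-cluster-1.zeebe-cluster.dev.svc.cluster.local 26502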

IcyEagle commented 4 years ago

I forgot to add publishNotReadyAddresses: true to my service, which is needed to make k8s publish the pod hostnames.
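
For reference, the fix is roughly this on the headless service backing the StatefulSet (a sketch; the selector and ports are abbreviated to match my chart):

apiVersion: v1
kind: Service
metadata:
  name: zeebe-cluster
spec:
  clusterIP: None
  publishNotReadyAddresses: true
  selector:
    app.kubernetes.io/name: zeebe-cluster
  ports:
    - name: internal
      port: 26502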

I still have some problems with the cluster's behavior, but those are out of the scope of this issue.

Conclusion: it seems there is a problem with the camunda/zeebe:0.21.1 image (the advertised host support didn't make it into that release).

Thanks @npepinpe :)