emqx / emqx-operator

A Kubernetes Operator for EMQX
https://www.emqx.com
Apache License 2.0

New core nodes never become ready when replicant replicas set to 0 #1002

Closed rouke-broersma closed 5 months ago

rouke-broersma commented 5 months ago

Describe the bug

When replicant replicas are set to 0 and the core template changes, a new core StatefulSet is created but never becomes ready.

To Reproduce

Create an EMQX cluster. Scale the replicant nodes to 0. Then change the core template, for example by adding a volume claim template. A new StatefulSet is created, but its readiness gate is never fulfilled. Now delete the ReplicaSet for the replicant nodes: the new StatefulSet becomes ready.

https://github.com/broersma-forslund/homelab/blob/6c39851ef8c32f2ddc1aafa76069fe75f0060fec/apps/emqx/templates/cluster.yaml#L10
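
The reproduction steps can be sketched as a minimal manifest. This is an illustrative sketch, not the reporter's actual manifest (that is the linked file). The name, namespace, API version, and image are taken from the ReplicaSet shown later in this report; the core replica count and the volume claim values are assumptions, and the field layout follows the v2beta1 EMQX API as far as I understand it.

```yaml
# Hypothetical minimal EMQX resource for reproducing the issue.
apiVersion: apps.emqx.io/v2beta1
kind: EMQX
metadata:
  name: emqx
  namespace: emqx
spec:
  image: emqx/emqx:5.4.1
  coreTemplate:
    spec:
      replicas: 2              # assumed value
      # Add this block after the cluster is up and replicants are at 0.
      # Changing the core template is what triggers a new core StatefulSet.
      volumeClaimTemplates:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi       # assumed value
  replicantTemplate:
    spec:
      replicas: 0              # replicants scaled to 0, as in the report
```

Applying the resource first without `volumeClaimTemplates`, with replicants scaled from a non-zero count down to 0, and then applying it again with the block added should match the sequence described above.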

Expected behavior

Core nodes can be updated while replicant nodes are set to 0 replicas.

Anything else we need to know?:

EMQX resource status contains:

  replicantNodes:
    - controllerUID: 35e43808-a546-497f-8808-c64c170e4b05
      edition: Opensource
      node: emqx@10.244.2.69
      node_status: running
      otp_release: 25.3.2-2/13.2.2
      podUID: ccf3ed92-6e07-4ef0-bee3-474dbdf5599a
      role: replicant
      uptime: 211000
      version: 5.4.1
    - controllerUID: 35e43808-a546-497f-8808-c64c170e4b05
      edition: Opensource
      node: emqx@10.244.0.17
      node_status: running
      otp_release: 25.3.2-2/13.2.2
      podUID: d404d92a-640d-49f7-876f-dd0e2ec5a0a1
      role: replicant
      uptime: 211809
      version: 5.4.1
  replicantNodesStatus:
    currentReplicas: 2
    currentRevision: 6cdf77ff7
    readyReplicas: 2
    updateReplicas: 2
    updateRevision: 6cdf77ff7

The status still lists two running replicant nodes even though replicant replicas is set to 0 and no replicant nodes exist.

EMQX replicant ReplicaSet:

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  annotations:
    apps.emqx.io/last-applied: >-
      UEsDBBQACAAIAAAAAAAAAAAAAAAAAAAAAAAIAAAAb3JpZ2luYWzUVktv2zgQ/i9zlvxIm7rVzbHdxEhieWW12GAbCDQ1solQJEtSbo1A/31ByQ/5lexhD7sny5z55vlxhq9AFPuO2jApIACilGmvuuDBCxMpBBCh4oySGVrwIEdLUmIJBK/AyRy5cV8O08L85+8Wk+107mvJEQLQNVI44IEKE8YSQZ2OOzoW50SQBab+fL1R8KVCTazUx5pKpr7FXHFi0V8Ss4QAPtE06/WyrAelB4LkWy/+Lhx/r1JrGEWawchfAnWEGWoUFA0Ef52p0S6G1dUcLXEFm3NJX0IHHiJHW+laXaAHVAqrJeeotyeb4o4e//gTDsIEDwrmRN3P2PvyuZf6Xcyo/7F71fUJ7WT+ddoj11fZHHsfv0D5XHpgFFLXhk2CBoKOBwY5Uley4BVyYuny4T/frtKDrbSK+n/FtUYfXLcJE6hr6qBYVb+bJruWJ8P+7O4m7EfDJHkYz+LRZBTNkuQujqdJcjOeDMGDFeEFVjzofP4ApXdoYPDwbRaPoiQZjmeD8PsoekpmcdSPR7dPDWwqzBvIySxJotEgjIZJ/DQdNXBGr97BTfqPTUBVuSWSlKPZlMysaIvywljULS4p4ScW78JZvLXxVcvc1S5jyNMIs933lFhXaWOJLUxLyXQ8hbI8NjUJh6MkGfbjfjIcR80KOAKd147Ch2YKewadVx+E4f14dByvQarR3uN6E/MLOiIJmWJCpXxheHi7fSfxN5LTNPrTcXI/ekqSmzCMXT+nydfxQZQ/oC2VbTtbbZdbey6lNVYTlRDFkhdc/wAonz1gOVlsvVbqwXXrY8uNKc5WKNCYqZbz6qJlhPFCY7zUaJaSpxB88GBprbpF6+SqbkG77gF4oKS2VW3Nci6JTt2kZYJZRvgQOVnPkEqRGgg+dTxQqJlMd0cfOuXxvHPm6quyuznTykNF/Z323p0HSksrqeQQQDyYVglrJCl7J6/u1b+UWPc0sevSxWBkoaud8eoGAtJCM7seSGHxd+VSF6JvbrUslDPS6Xj1yUSKSEq73Q7V2Tfj1oVTKj1YSV7k+CgLsSlV7j43l+M9Suw5uBP5RDG/FrnChYKvt85NMd/YPTXkCHveM1pafbSoFNk5h+6cLd70t8df9sPlAi7tdSe7jKxGwSXobk4cYKs/mLYpansEtdy47dsk3i2xuONxytwDIF4rPH4wSOEb1CsmFhVvz5AkO2TI5u9gScQCp5Iz6mZMn/8ia0faC5Ta08eVWCmOOQpLeKXpwnSi5x21THNHnaNJPen2M2/SKMapfjXa6pY/EuVQzeLVAtN4oZ3wxOExV3Y9ZLq+TW/3/J/oHi+DXSMvZpf/tNZp+I4BLGPUvU3K8rms9n09Ng5fXmX5dwAAAP//UEsHCI2HplsGBAAAUQsAAFBLAQIUABQACAAIAAAAAACNh6ZbBgQAAFELAAAIAAAAAAAAAAAAAAAAAAAAAABvcmlnaW5hbFBLBQYAAAAAAQABADYAAAA8BAAAAAA=
  creationTimestamp: '2024-01-09T19:13:36Z'
  generation: 1
  labels:
    apps.emqx.io/db-role: replicant
    apps.emqx.io/instance: emqx
    apps.emqx.io/managed-by: emqx-operator
    apps.emqx.io/pod-template-hash: 6cdf77ff7
  name: emqx-replicant-6cdf77ff7
  namespace: emqx
  ownerReferences:
    - apiVersion: apps.emqx.io/v2beta1
      blockOwnerDeletion: true
      controller: true
      kind: EMQX
      name: emqx
      uid: 18e7987d-1efc-4121-ac0f-5d7a52fbe749
  resourceVersion: '62140479'
  uid: f31e31de-16c7-4d12-b82c-8c1637afd962
spec:
  replicas: 0
  selector:
    matchLabels:
      apps.emqx.io/db-role: replicant
      apps.emqx.io/instance: emqx
      apps.emqx.io/managed-by: emqx-operator
      apps.emqx.io/pod-template-hash: 6cdf77ff7
  template:
    metadata:
      creationTimestamp: null
      labels:
        apps.emqx.io/db-role: replicant
        apps.emqx.io/instance: emqx
        apps.emqx.io/managed-by: emqx-operator
        apps.emqx.io/pod-template-hash: 6cdf77ff7
    spec:
      containers:
        - env:
            - name: EMQX_DASHBOARD__LISTENERS__HTTP__BIND
              value: '18083'
            - name: EMQX_CLUSTER__DISCOVERY_STRATEGY
              value: dns
            - name: EMQX_CLUSTER__DNS__RECORD_TYPE
              value: srv
            - name: EMQX_CLUSTER__DNS__NAME
              value: emqx-headless.emqx.svc.cluster.local
            - name: EMQX_HOST
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
            - name: EMQX_NODE__DATA_DIR
              value: data
            - name: EMQX_NODE__ROLE
              value: replicant
            - name: EMQX_NODE__COOKIE
              valueFrom:
                secretKeyRef:
                  key: node_cookie
                  name: emqx-node-cookie
            - name: EMQX_API_KEY__BOOTSTRAP_FILE
              value: '"/opt/emqx/data/bootstrap_api_key"'
          image: 'emqx/emqx:5.4.1'
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /status
              port: dashboard
              scheme: HTTP
            initialDelaySeconds: 60
            periodSeconds: 30
            successThreshold: 1
            timeoutSeconds: 1
          name: emqx
          ports:
            - containerPort: 18083
              name: dashboard
              protocol: TCP
          readinessProbe:
            failureThreshold: 12
            httpGet:
              path: /status
              port: dashboard
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 5
            successThreshold: 1
            timeoutSeconds: 1
          resources: {}
          securityContext:
            runAsGroup: 1000
            runAsNonRoot: true
            runAsUser: 1000
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /opt/emqx/data/bootstrap_api_key
              name: bootstrap-api-key
              readOnly: true
              subPath: bootstrap_api_key
            - mountPath: /opt/emqx/etc/emqx.conf
              name: bootstrap-config
              readOnly: true
              subPath: emqx.conf
            - mountPath: /opt/emqx/log
              name: emqx-replicant-log
            - mountPath: /opt/emqx/data
              name: emqx-replicant-data
            - mountPath: /mounted/cert
              name: emqx-tls
      dnsPolicy: ClusterFirst
      readinessGates:
        - conditionType: apps.emqx.io/on-serving
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 1000
        fsGroupChangePolicy: Always
        runAsGroup: 1000
        runAsUser: 1000
        supplementalGroups:
          - 1000
      terminationGracePeriodSeconds: 30
      volumes:
        - name: bootstrap-api-key
          secret:
            defaultMode: 420
            secretName: emqx-bootstrap-api-key
        - configMap:
            defaultMode: 420
            name: emqx-configs
          name: bootstrap-config
        - emptyDir: {}
          name: emqx-replicant-log
        - emptyDir: {}
          name: emqx-replicant-data
        - name: emqx-tls
          secret:
            defaultMode: 420
            secretName: mqtt-tls-certificate
status:
  observedGeneration: 1
  replicas: 0