emqx / emqx-operator

A Kubernetes Operator for EMQX
https://www.emqx.com
Apache License 2.0

EMQX Config patches take precedence over statefulset changes #1027

Closed ddellarocca closed 2 months ago

ddellarocca commented 3 months ago

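Describe the bug: If the EMQX CRD is updated with changes that affect both the statefulset and the EMQX config, the operator applies the config first and updates the statefulset last. When the new config references something that only the statefulset change introduces (for example, a file in a newly mounted volume), the config update fails and blocks the whole update process. See the following example.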

To Reproduce:

Preconditions: the EMQX Operator is up and running in a Kubernetes cluster.

  1. Apply the following manifest and wait for the cluster to be ready

    
    apiVersion: apps.emqx.io/v2beta1
    kind: EMQX
    metadata:
      name: emqx
      namespace: emqx-operated
    spec:
      image: emqx:5.5.1
      config:
        data: |-
          log {
            file_handlers {
              enable = false
            }

            console_handler {
              enable = true
              level = debug
              formatter = json
            }
          }

          cluster {
            autoclean = "5m"
          }

      coreTemplate:
        spec:
          replicas: 2
          resources:
            limits:
              cpu: 1
              memory: 4Gi
            requests:
              cpu: 1
              memory: 4Gi
          ports:
            - containerPort: 8883
              name: mqttssl
              protocol: TCP
            - containerPort: 1883
              name: mqtt
              protocol: TCP
      listenersServiceTemplate:
        spec:
          type: LoadBalancer
      dashboardServiceTemplate:
        spec:
          type: LoadBalancer
      updateStrategy:
        initialDelaySeconds: 10
        type: Recreate

  2. Add an extra volume mount and change the config to use it (in this case, adding a file-based ACL authorization source):
```yaml
apiVersion: apps.emqx.io/v2beta1
kind: EMQX
metadata:
  name: emqx
  namespace: emqx-operated
spec:
  image: emqx:5.5.1
  config:
    data: |-
      log {
        file_handlers {
          enable = false
        }

        console_handler {
          enable = true
          level = debug
          formatter = json
        }
      }

      cluster {
        autoclean = "5m"
      }

      authorization {
        cache {
          enable = true
          ttl = "5m"
        }
        deny_action = "ignore"
        no_match = "allow"
        sources = [
          {
            type = "file"
            enable = true

            path = "/opt/emqx/data/authz/acl/acl.conf"
          }
        ]
      }

  coreTemplate:
    spec:
      replicas: 2
      resources:
        limits:
          cpu: 1
          memory: 4Gi
        requests:
          cpu: 1
          memory: 4Gi
      ports:
        - containerPort: 8883
          name: mqttssl
          protocol: TCP
        - containerPort: 1883
          name: mqtt
          protocol: TCP
      extraVolumeMounts:
        - name: authz-acl-file
          mountPath: /opt/emqx/data/authz/acl
      extraVolumes:
        - name: authz-acl-file
          configMap:
            name: authz-acl-file
  listenersServiceTemplate:
    spec:
      type: LoadBalancer
  dashboardServiceTemplate:
    spec:
      type: LoadBalancer
  updateStrategy:
    initialDelaySeconds: 10
    type: Recreate
```

  3. Apply the updated manifest
  4. EMQX and the EMQX Operator both report an error about the missing file:

    {"time":1711446652610981,"level":"alert","msg":"failed_to_read_acl_file","mfa":"emqx_authz_file:validate/1(99)","explain":"No such file or directory","path":"/opt/emqx/data/authz/acl/acl.conf","pid":"<0.4476.0>"}
    {"level":"error","ts":"2024-03-26T09:52:14Z","msg":"Reconciler error","controller":"emqx","controllerGroup":"apps.emqx.io","controllerKind":"EMQX","eMQX":{"name":"emqx","namespace":"emqx-operated"},"namespace":"emqx-operated","name":"emqx","reconcileID":"71da2fe2-2e7e-4fda-b1be-5083c61ea4ba","error":"failed to put emqx config: failed to put API http://10.244.2.15:18083/api/v5/configs?mode=merge, status : 400 Bad Request, body: {\"authorization\":{\"reason\":\"failed_to_read_acl_file\",\"value\":\"/opt/emqx/data/authz/acl/acl.conf\",\"path\":\"authorization.sources.1.path\",\"kind\":\"validation_error\",\"matched_type\":\"authz:file\"}}","errorVerbose":"failed to put API http://10.244.2.15:18083/api/v5/configs?mode=merge, status : 400 Bad Request, body: {\"authorization\":{\"reason\":\"failed_to_read_acl_file\",\"value\":\"/opt/emqx/data/authz/acl/acl.conf\",\"path\":\"authorization.sources.1.path\",\"kind\":\"validation_error\",\"matched_type\":\"authz:file\"}}\ngithub.com/emqx/emqx-operator/controllers/apps/v2beta1.putEMQXConfigsByAPI\n\t/workspace/controllers/apps/v2beta1/sync_emqx_config.go:135\ngithub.com/emqx/emqx-operator/controllers/apps/v2beta1.(*syncConfig).reconcile\n\t/workspace/controllers/apps/v2beta1/sync_emqx_config.go:77\ngithub.com/emqx/emqx-operator/controllers/apps/v2beta1.(*EMQXReconciler).Reconcile\n\t/workspace/controllers/apps/v2beta1/emqx_controller.go:134\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:234\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598\nfailed to put emqx config\ngithub.com/emqx/emqx-operator/controllers/apps/v2beta1.(*syncConfig).reconcile\n\t/workspace/controllers/apps/v2beta1/sync_emqx_config.go:78\ngithub.com/emqx/emqx-operator/controllers/apps/v2beta1.(*EMQXReconciler).Reconcile\n\t/workspace/controllers/apps/v2beta1/emqx_controller.go:134\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:234\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:234"}
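
The manifest in step 2 mounts a ConfigMap named `authz-acl-file` at `/opt/emqx/data/authz/acl` and points the authorization source at `acl.conf` inside that directory, but the ConfigMap itself is not shown in this issue. A minimal sketch of what it could look like, assuming a single `acl.conf` key; the rule in it is a placeholder, not the reporter's actual ACL:

```yaml
# Hypothetical ConfigMap backing the extra volume from step 2.
# Only the name, namespace, and file name are taken from the manifest above.
apiVersion: v1
kind: ConfigMap
metadata:
  name: authz-acl-file
  namespace: emqx-operated
data:
  # Each key becomes a file under the mount path, so this key is exposed
  # as /opt/emqx/data/authz/acl/acl.conf inside the pod.
  acl.conf: |
    %% Placeholder rule: allow every client to publish and subscribe.
    {allow, all}.
```

The ConfigMap has to exist before the pods that mount it are created, otherwise they stay pending on the missing volume.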

Expected behavior: The operator should update the statefulset first when it needs to be redeployed and only then apply the EMQX config. Alternatively, it should change the EMQX config configmap first and then update the statefulset.

Anything else we need to know?: If the EMQX resource is deleted and then applied again, it starts successfully with the ACL configured, so the problem is not related to the CRD itself.
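
A possible alternative to deleting the resource is to split the change into two applies: first add only the extra volume and mount, wait for the statefulset to roll out, and only then apply the step 2 manifest with the `authorization` block. This is an untested sketch that follows from the ordering described above, not a confirmed workaround. It assumes the `authz-acl-file` ConfigMap already exists and contains an `acl.conf` key. The intermediate manifest would be the step 1 manifest plus the volume:

```yaml
# Intermediate manifest (assumption): same EMQX config as step 1, so the
# config sync has nothing new to validate, plus the extra volume and mount.
apiVersion: apps.emqx.io/v2beta1
kind: EMQX
metadata:
  name: emqx
  namespace: emqx-operated
spec:
  image: emqx:5.5.1
  config:
    data: |-
      log {
        file_handlers {
          enable = false
        }

        console_handler {
          enable = true
          level = debug
          formatter = json
        }
      }

      cluster {
        autoclean = "5m"
      }

  coreTemplate:
    spec:
      replicas: 2
      resources:
        limits:
          cpu: 1
          memory: 4Gi
        requests:
          cpu: 1
          memory: 4Gi
      ports:
        - containerPort: 8883
          name: mqttssl
          protocol: TCP
        - containerPort: 1883
          name: mqtt
          protocol: TCP
      # Only these two entries differ from the step 1 manifest.
      extraVolumeMounts:
        - name: authz-acl-file
          mountPath: /opt/emqx/data/authz/acl
      extraVolumes:
        - name: authz-acl-file
          configMap:
            name: authz-acl-file
  listenersServiceTemplate:
    spec:
      type: LoadBalancer
  dashboardServiceTemplate:
    spec:
      type: LoadBalancer
  updateStrategy:
    initialDelaySeconds: 10
    type: Recreate
```

Once the core pods are ready with the volume mounted, applying the full step 2 manifest should let the config update pass the `failed_to_read_acl_file` validation, because the file already exists when the operator calls the config API.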

Environment details:

Rory-Z commented 3 months ago

Yes, this is a defect. Thanks for the feedback, I will fix it.