emqx / emqx-operator

A Kubernetes Operator for EMQX
https://www.emqx.com
Apache License 2.0

Blue-green deployment and scale-up not working if ACL authorization file comes from an extra volume mount #1028

Closed by ddellarocca 2 months ago

ddellarocca commented 3 months ago

Describe the bug

If the ACL authorization source in the EMQX CRD is configured to read its file from an extra volume mount, new nodes are unable to join the cluster during a scale-up or blue-green upgrade.

To Reproduce

Preconditions: EMQX operator up and running in a Kubernetes cluster.

  1. Apply the following manifest and wait for the cluster to be ready

    apiVersion: apps.emqx.io/v2beta1
    kind: EMQX
    metadata:
      name: emqx
      namespace: emqx-operated
    spec:
      image: emqx:5.5.1
      config:
        data: |-
          log {
            file_handlers {
              enable = false
            }

            console_handler {
              enable = true
              level = debug
              formatter = json
            }
          }

          cluster {
            autoclean = "5m"
          }

          authorization {
            cache {
              enable = true
              ttl = "5m"
            }
            deny_action = "ignore"
            no_match = "allow"
            sources = [
              {
                type = "file"
                enable = true

                path = "/opt/emqx/data/authz/acl/acl.conf"
              }
            ]
          }

      coreTemplate:
        spec:
          replicas: 1
          resources:
            limits:
              cpu: 1
              memory: 4Gi
            requests:
              cpu: 1
              memory: 4Gi
          ports:
            - containerPort: 8883
              name: mqttssl
              protocol: TCP
            - containerPort: 1883
              name: mqtt
              protocol: TCP
          extraVolumeMounts:
            - name: authz-acl-file
              mountPath: /opt/emqx/data/authz/acl
          extraVolumes:
            - name: authz-acl-file
              configMap:
                name: authz-acl-file
      listenersServiceTemplate:
        spec:
          type: LoadBalancer
      dashboardServiceTemplate:
        spec:
          type: LoadBalancer
      updateStrategy:
        initialDelaySeconds: 10
        type: Recreate
  2. Increase the replicas from 1 to 2
  3. The new node logs an error and goes into a crash loop (see the attached logs.txt)
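The manifest in step 1 references a ConfigMap named authz-acl-file that the report does not show. A minimal sketch of what it might look like follows, assuming the ACL file is stored under the key acl.conf (which matches the path used in the authorization source). The rules themselves are illustrative examples in EMQX's file-based ACL syntax, not the reporter's actual rules.

```yaml
# Hypothetical ConfigMap for reproducing the issue.
# The name and namespace come from the manifest above; the key name and
# the rules are assumptions made for illustration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: authz-acl-file
  namespace: emqx-operated
data:
  acl.conf: |-
    %% Allow local clients to access system topics and everything else.
    {allow, {ipaddr, "127.0.0.1"}, all, ["$SYS/#", "#"]}.
    %% Deny all other clients from subscribing to system topics or "#".
    {deny, all, subscribe, ["$SYS/#", {eq, "#"}]}.
    %% Allow anything not matched above.
    {allow, all}.
```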

Expected behavior

The node should join the cluster with the correct ACL authorization configuration.

Anything else we need to know?

If the EMQX resource is deleted and then applied again, the cluster starts successfully with the ACL configured and the desired number of nodes.


Rory-Z commented 3 months ago

@yanzhiemq please check this

yanzhiemq commented 3 months ago

@ddellarocca When a node joins the cluster, it will synchronize the ACL file from the existing nodes. However, if the ACL file is mounted as read-only, the synchronization operation will fail, preventing the node from starting.

ddellarocca commented 3 months ago

@yanzhiemq Yes, but a ConfigMap can only be mounted read-only. Is there a way to tell EMQX not to do that? For example, could I mount the ConfigMap at the default path and let EMQX sync to a different path?

yanzhiemq commented 3 months ago

@Rory-Z Is there a workaround for configuring the ACL file in EMQX operator?

ddellarocca commented 3 months ago

I was thinking of creating an InitContainer to copy the ACL file into a directory that EMQX can write to, but I don't like this approach.
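For reference, the init container idea could be sketched roughly as below. This is untested and rests on several assumptions: that coreTemplate.spec accepts an initContainers field in the v2beta1 CRD, that the ConfigMap key is acl.conf, and that an emptyDir is an acceptable writable location. The container name, image, volume name authz-acl-writable, and the /acl-src and /acl-dst paths are all hypothetical.

```yaml
# Sketch only: copy the read-only ConfigMap file into a writable emptyDir
# before EMQX starts. Names marked "hypothetical" are not from the issue.
spec:
  coreTemplate:
    spec:
      initContainers:                  # assumed to be supported by the CRD
        - name: copy-acl               # hypothetical
          image: busybox:1.36          # hypothetical helper image
          command: ["sh", "-c", "cp /acl-src/acl.conf /acl-dst/acl.conf"]
          volumeMounts:
            - name: authz-acl-file     # read-only ConfigMap volume
              mountPath: /acl-src
            - name: authz-acl-writable # writable copy target
              mountPath: /acl-dst
      extraVolumeMounts:
        - name: authz-acl-writable
          mountPath: /opt/emqx/data/authz/acl
      extraVolumes:
        - name: authz-acl-file
          configMap:
            name: authz-acl-file
        - name: authz-acl-writable     # hypothetical
          emptyDir: {}
```

With this layout the path in the authorization source stays unchanged, but the copied file's ownership and permissions may need adjusting so that the EMQX process can overwrite it.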

Rory-Z commented 3 months ago

> @Rory-Z Is there a workaround for configuring the ACL file in EMQX operator?

No. A ConfigMap volume in Kubernetes is mounted read-only, so the application cannot write to it.

Rory-Z commented 3 months ago

> I was thinking of creating an InitContainer to copy the ACL file into a directory that EMQX can write to, but I don't like this approach.

Sorry for the delay. Could you please try mounting the acl.conf ConfigMap at /opt/emqx/etc/acl.conf in the EMQX container? I think EMQX does not write to the etc path: it reads from the etc path and writes to the data path.

ddellarocca commented 3 months ago

Putting the ConfigMap under /opt/emqx/etc seems to have solved the problem, thanks.
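The thread does not show the final manifest, so the following is only a sketch of how the suggested workaround might be expressed. It assumes the ConfigMap key is acl.conf and that the authorization source path was changed to match the new mount location. A subPath mount is used here so that only the single file is placed in /opt/emqx/etc and the other files in that directory stay visible.

```yaml
# Sketch of the workaround: mount the ConfigMap file under etc, not data.
# Only the fields that differ from the original manifest are shown.
spec:
  config:
    data: |-
      authorization {
        sources = [
          {
            type = "file"
            enable = true
            path = "/opt/emqx/etc/acl.conf"
          }
        ]
      }
  coreTemplate:
    spec:
      extraVolumeMounts:
        - name: authz-acl-file
          mountPath: /opt/emqx/etc/acl.conf
          subPath: acl.conf            # assumed ConfigMap key
      extraVolumes:
        - name: authz-acl-file
          configMap:
            name: authz-acl-file
```

One trade-off of subPath mounts is that Kubernetes does not propagate later ConfigMap updates into the container, so changing the ACL rules would require restarting the pods.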