Blue-green deployment and scale-up not working if ACL authorization file comes from an extra volume mount

ddellarocca commented 3 months ago

Describe the bug

If the ACL authorization source in the EMQX custom resource is configured to read its file from an extra volume mount, new nodes cannot join the cluster during a scale-up or a blue-green upgrade.

To Reproduce

Preconditions: EMQX Operator up and running in a Kubernetes cluster.

Apply the following manifest and wait for the cluster to be ready:

apiVersion: apps.emqx.io/v2beta1
kind: EMQX
metadata:
  name: emqx
  namespace: emqx-operated
spec:
  image: emqx:5.5.1
  config:
    data: |-
      log {
        file_handlers {
          enable = false
        }

        console_handler {
          enable = true
          level = debug
          formatter = json
        }
      }

      cluster {
        autoclean = "5m"
      }

      authorization {
        cache {
          enable = true
          ttl = "5m"
        }
        deny_action = "ignore"
        no_match = "allow"
        sources = [
          {
            type = "file"
            enable = true

            path = "/opt/emqx/data/authz/acl/acl.conf"
          }
        ]
      }

  coreTemplate:
    spec:
      replicas: 1
      resources:
        limits:
          cpu: 1
          memory: 4Gi
        requests:
          cpu: 1
          memory: 4Gi
      ports:
        - containerPort: 8883
          name: mqttssl
          protocol: TCP
        - containerPort: 1883
          name: mqtt
          protocol: TCP
      extraVolumeMounts:
        - name: authz-acl-file
          mountPath: /opt/emqx/data/authz/acl
      extraVolumes:
        - name: authz-acl-file
          configMap:
            name: authz-acl-file
  listenersServiceTemplate:
    spec:
      type: LoadBalancer
  dashboardServiceTemplate:
    spec:
      type: LoadBalancer
  updateStrategy:
    initialDelaySeconds: 10
    type: Recreate

Increase the replicas from 1 to 2.
The new node logs an error and goes into a crash loop (logs attached as logs.txt).

Expected behavior

The new node should join the cluster with the correct ACL authorization configuration.

Anything else we need to know?

If the EMQX resource is deleted and applied again, the cluster starts successfully with the ACL configured and the desired number of nodes.

Environment details:

Kubernetes version: 1.21.14
Cloud-provider/provisioner: local kind
emqx-operator version: 2.2.14
Install method: helm

Rory-Z commented 3 months ago

@yanzhiemq please check this

yanzhiemq commented 3 months ago

@ddellarocca When a node joins the cluster, it will synchronize the ACL file from the existing nodes. However, if the ACL file is mounted as read-only, the synchronization operation will fail, preventing the node from starting.

ddellarocca commented 3 months ago

@yanzhiemq Yes, but a ConfigMap can only be mounted read-only. Is there a way to tell EMQX not to do that? For example, mount the ConfigMap at the default path and let EMQX sync to another path?

yanzhiemq commented 3 months ago

@Rory-Z Is there a workaround way to configure ACL file in EMQX operator?

ddellarocca commented 3 months ago

I was thinking of creating an InitContainer to copy the ACL file into a directory that EMQX can write to, but I don't like this approach.
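For comparison, a rough sketch of the init-container idea. It assumes the operator version in use accepts an initContainers list under coreTemplate.spec (check the CRD reference for your release), and the volume names acl-src and acl-rw are hypothetical. The ConfigMap is copied into an emptyDir, which is writable, so EMQX can overwrite the copy at the path configured in the manifest above. Running the init container from the EMQX image is intended to create the copy as the same non-root user EMQX runs as.

```yaml
# Sketch only: "initContainers" support and the volume names "acl-src"
# and "acl-rw" are assumptions, not taken from the original report.
spec:
  coreTemplate:
    spec:
      initContainers:
        - name: copy-acl
          # Same image as the EMQX container, so the copy is made by the
          # same (non-root) user that EMQX runs as.
          image: emqx:5.5.1
          command: ["sh", "-c", "cp /acl-src/acl.conf /acl-rw/acl.conf"]
          volumeMounts:
            - name: acl-src
              mountPath: /acl-src
              readOnly: true
            - name: acl-rw
              mountPath: /acl-rw
      extraVolumeMounts:
        # EMQX sees a writable directory at the path used in the manifest.
        - name: acl-rw
          mountPath: /opt/emqx/data/authz/acl
      extraVolumes:
        - name: acl-src
          configMap:
            name: authz-acl-file
        - name: acl-rw
          emptyDir: {}
```

The copy happens only when a pod starts, so later ConfigMap edits are not picked up until the pods are recreated.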

Rory-Z commented 3 months ago

> @Rory-Z Is there a workaround way to configure ACL file in EMQX operator?

No. Kubernetes mounts a ConfigMap volume read-only, so the application cannot write to it.
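The report never shows the authz-acl-file ConfigMap itself. A hypothetical version, assuming the rules live under a key named acl.conf and using rules modeled on EMQX's default acl.conf rather than the reporter's actual rules, could look like this. Each key is projected as a file in the read-only volume, which is why the join-time sync described by @yanzhiemq cannot overwrite it.

```yaml
# Hypothetical ConfigMap: the key name "acl.conf" and the rules below are
# illustrative assumptions, not taken from the original report.
apiVersion: v1
kind: ConfigMap
metadata:
  name: authz-acl-file
  namespace: emqx-operated
data:
  # Each key becomes a file in the mounted volume, e.g.
  # /opt/emqx/data/authz/acl/acl.conf with the mount from the report.
  acl.conf: |
    {allow, {username, {re, "^dashboard$"}}, subscribe, ["$SYS/#"]}.
    {allow, {ipaddr, "127.0.0.1"}, all, ["$SYS/#", "#"]}.
    {deny, all, subscribe, ["$SYS/#", {eq, "#"}]}.
    {allow, all}.
```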

Rory-Z commented 3 months ago

> I was thinking of creating an InitContainer to copy the ACL file in a directory that can be written by EMQX but I don't like

Sorry for the delay. Could you try mounting the acl.conf ConfigMap at /opt/emqx/etc/acl.conf in the EMQX container? I think EMQX does not write to the etc path: it reads from the etc path and writes to the data path.
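A sketch of how this suggestion might look in the manifest from the report, under two assumptions: the ConfigMap stores the rules under a key named acl.conf, and the file authorization source is pointed at the new location. The subPath field mounts that single key as a file; mounting the ConfigMap over /opt/emqx/etc as a directory would hide the other files EMQX keeps there. Only the changed parts are shown; the log, cluster and authorization cache sections of config.data would stay as in the report.

```yaml
# Sketch only: the "acl.conf" key name is an assumption; unchanged
# config.data sections from the report are omitted for brevity.
spec:
  config:
    data: |-
      authorization {
        sources = [
          {
            type = "file"
            enable = true
            # Read the rules from the etc directory instead of data
            path = "/opt/emqx/etc/acl.conf"
          }
        ]
      }
  coreTemplate:
    spec:
      extraVolumeMounts:
        - name: authz-acl-file
          # subPath projects only the acl.conf key as a single file,
          # leaving the rest of /opt/emqx/etc untouched.
          mountPath: /opt/emqx/etc/acl.conf
          subPath: acl.conf
      extraVolumes:
        - name: authz-acl-file
          configMap:
            name: authz-acl-file
```

One trade-off: Kubernetes does not propagate ConfigMap updates to files mounted with subPath, so rule changes only take effect after the pods are recreated.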

ddellarocca commented 3 months ago

Putting the ConfigMap under /opt/emqx/etc seems to have solved the problem, thanks.

emqx / emqx-operator, issue #1028