astarte-platform / astarte-kubernetes-operator

Astarte Kubernetes Operator
http://astarte-platform.org
Apache License 2.0

VerneMQ is failing to start #388

Closed IvaskevychYuriy closed 2 weeks ago

IvaskevychYuriy commented 2 weeks ago

Hello, I have an issue running the Astarte operator: the VerneMQ pod is continuously crashing.

Setup

Infra:

Existing components:

For reference, here are the values overrides for the Cassandra chart:

dbUser:
  user: test
  password: test
  forcePassword: true

cluster:
  name: cassandra
  seedCount: 1
  numTokens: 256
  datacenter: dc1
  rack: rack1
  endpointSnitch: SimpleSnitch

And for the RabbitMQ chart:

auth:
  username: test
  password: test
  erlangCookie: test

Install the Astarte operator via:

helm upgrade -i astarte-operator astarte/astarte-operator --version 24.5.0

Install the certificate by applying the following (note: I already have a root-ca ClusterIssuer):

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: tls2
  namespace: default
spec:
  privateKey:
    algorithm: RSA
    encoding: PKCS1
    size: 2048
  isCA: false
  usages:
    - server auth
    - client auth
  commonName: Astarte
  dnsNames:
    - astarte-broker.local.coom
  secretName: tls2
  issuerRef:
    name: root-ca
    kind: ClusterIssuer

Install the Astarte CR, pointing it to the existing Cassandra and RabbitMQ instances:

apiVersion: api.astarte-platform.org/v1alpha3
kind: Astarte
metadata:
  name: astarte
  namespace: default
spec:
  version: 1.2.0
  api:
    host: astarte-api.local.com
  cassandra:
    deploy: false
    nodes: cassandra:9042
    connection:
      username: test
      password: test
  cfssl:
    resources:
      limits:
        cpu: 100m
        memory: 256Mi
      requests:
        cpu: 20m
        memory: 128Mi
    storage:
      size: 1Gi
  rabbitmq:
    deploy: false
    connection:
      host: astarte-broker-rabbitmq
      username: test
      password: test
      port: 5672
      virtualHost: /
  vernemq:
    host: astarte-broker.local.com
    sslListener: true
    sslListenerCertSecretName: tls2
    resources:
      limits:
        cpu: 300m
        memory: 512Mi
      requests:
        cpu: 100m
        memory: 256Mi

Issue

Eventually all the pods run fine except the VerneMQ one.

Logs reveal the following:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 31820    0 31820    0     0  1942k      0 --:--:-- --:--:-- --:--:-- 1942k
/opt/vernemq/bin/vernemq.sh: line 29: warning: command substitution: ignored null byte in input
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 31820    0 31820    0     0   327k      0 --:--:-- --:--:-- --:--:--  323k
Error generating config with cuttlefish
  run `vernemq config generate -l debug` for more information.
/opt/vernemq/bin/vernemq.sh: line 179: ps: command not found

kubectl describe shows the following container image version (in case this helps):

    Container ID:  docker://4092584bd2ffb412ca6994c21d5338d56ad9c24bab67bcea98b1845dab67247e
    Image:         astarte/vernemq:1.2.0

What I've tried

The error seems to originate from here

So I edited the StatefulSet to keep the pod sleeping, exec'd into it, and installed the missing ps utility with apt-get update && apt-get install -y procps. Then I ran /opt/vernemq/bin/vernemq.sh manually and got the same error, this time without the last line (the "ps: command not found" message).

I've also tried reinstalling it along with the operator and deleting the volume, with the same result.

Any ideas?

Annopaolo commented 2 weeks ago

Hi @IvaskevychYuriy! Could you please also provide the VerneMQ StatefulSet?

kubectl get statefulset astarte-vernemq -o yaml

IvaskevychYuriy commented 2 weeks ago

@Annopaolo Hello, sure. Here it is:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  creationTimestamp: "2024-11-13T11:47:06Z"
  generation: 3
  labels:
    component: astarte
  name: astarte-vernemq
  namespace: default
  ownerReferences:
  - apiVersion: api.astarte-platform.org/v1alpha2
    blockOwnerDeletion: true
    controller: true
    kind: Astarte
    name: astarte
    uid: 0428a635-544e-4308-b404-298f6644af97
  resourceVersion: "656309"
  uid: f697c0ec-07ec-4d97-bf4e-52edf13c37e8
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain
    whenScaled: Retain
  podManagementPolicy: OrderedReady
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: astarte-vernemq
  serviceName: astarte-vernemq
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: astarte-vernemq
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - astarte-vernemq
            topologyKey: kubernetes.io/hostname
      containers:
      - env:
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: MY_POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: DOCKER_VERNEMQ_DISCOVERY_KUBERNETES
          value: "1"
        - name: DOCKER_VERNEMQ_KUBERNETES_LABEL_SELECTOR
          value: app=astarte-vernemq
        - name: DOCKER_VERNEMQ_ASTARTE_VMQ_PLUGIN__AMQP__VIRTUAL_HOST
          value: /
        - name: DOCKER_VERNEMQ_ASTARTE_VMQ_PLUGIN__AMQP__HOST
          value: astarte-broker-rabbitmq
        - name: DOCKER_VERNEMQ_ASTARTE_VMQ_PLUGIN__AMQP__PORT
          value: "5672"
        - name: DOCKER_VERNEMQ_ASTARTE_VMQ_PLUGIN__AMQP__VIRTUAL_HOST
          value: /
        - name: DOCKER_VERNEMQ_ASTARTE_VMQ_PLUGIN__AMQP__USERNAME
          valueFrom:
            secretKeyRef:
              key: admin-username
              name: astarte-rabbitmq-user-credentials
        - name: DOCKER_VERNEMQ_ASTARTE_VMQ_PLUGIN__AMQP__PASSWORD
          valueFrom:
            secretKeyRef:
              key: admin-password
              name: astarte-rabbitmq-user-credentials
        - name: RPC_AMQP_CONNECTION_VIRTUAL_HOST
          value: /
        - name: RPC_AMQP_CONNECTION_HOST
          value: astarte-broker-rabbitmq
        - name: RPC_AMQP_CONNECTION_PORT
          value: "5672"
        - name: RPC_AMQP_CONNECTION_VIRTUAL_HOST
          value: /
        - name: RPC_AMQP_CONNECTION_USERNAME
          valueFrom:
            secretKeyRef:
              key: admin-username
              name: astarte-rabbitmq-user-credentials
        - name: RPC_AMQP_CONNECTION_PASSWORD
          valueFrom:
            secretKeyRef:
              key: admin-password
              name: astarte-rabbitmq-user-credentials
        - name: DOCKER_VERNEMQ_ASTARTE_VMQ_PLUGIN__AMQP__DATA_QUEUE_COUNT
          value: "128"
        - name: VERNEMQ_ENABLE_SSL_LISTENER
          value: "true"
        - name: DOCKER_VERNEMQ_LISTENER__SSL__DEFAULT__CAFILE
          value: /opt/vernemq/etc/ca.pem
        - name: DOCKER_VERNEMQ_LISTENER__SSL__DEFAULT__CERTFILE
          value: /opt/vernemq/etc/cert.pem
        - name: DOCKER_VERNEMQ_LISTENER__SSL__DEFAULT__KEYFILE
          value: /opt/vernemq/etc/privkey.pem
        - name: CFSSL_URL
          value: http://astarte-cfssl.default.svc.cluster.local/
        - name: DOCKER_VERNEMQ_PERSISTENT_CLIENT_EXPIRATION
          value: 1y
        - name: DOCKER_VERNEMQ_MAX_OFFLINE_MESSAGES
          value: "1000000"
        - name: DOCKER_VERNEMQ_ASTARTE_VMQ_PLUGIN__CASSANDRA__USERNAME
        - name: DOCKER_VERNEMQ_ASTARTE_VMQ_PLUGIN__CASSANDRA__PASSWORD
        - name: DOCKER_VERNEMQ_ASTARTE_VMQ_PLUGIN__CASSANDRA__NODES
          value: cassandra:9042
        image: astarte/vernemq:1.2.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /metrics
            port: 8888
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 20
          successThreshold: 1
          timeoutSeconds: 10
        name: vernemq
        ports:
        - containerPort: 8883
          name: mqtt-ssl
          protocol: TCP
        - containerPort: 80
          name: acme-verify
          protocol: TCP
        - containerPort: 1883
          name: mqtt
          protocol: TCP
        - containerPort: 1885
          name: mqtt-reverse
          protocol: TCP
        - containerPort: 44053
          name: vmq-msg-dist
          protocol: TCP
        - containerPort: 4369
          name: epmd
          protocol: TCP
        - containerPort: 8888
          name: metrics
          protocol: TCP
        - containerPort: 9100
          protocol: TCP
        - containerPort: 9101
          protocol: TCP
        - containerPort: 9102
          protocol: TCP
        - containerPort: 9103
          protocol: TCP
        - containerPort: 9104
          protocol: TCP
        - containerPort: 9105
          protocol: TCP
        - containerPort: 9106
          protocol: TCP
        - containerPort: 9107
          protocol: TCP
        - containerPort: 9108
          protocol: TCP
        - containerPort: 9109
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /metrics
            port: 8888
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 20
          successThreshold: 1
          timeoutSeconds: 10
        resources:
          limits:
            cpu: 300m
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 256Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /opt/vernemq/data
          name: astarte-vernemq-data
        - mountPath: /etc/ssl/vernemq-certs
          name: astarte-tls
          readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: astarte-vernemq
      serviceAccountName: astarte-vernemq
      terminationGracePeriodSeconds: 30
      volumes:
      - name: tls2
        secret:
          defaultMode: 420
          items:
          - key: tls.crt
            path: cert
          - key: tls.key
            path: privkey
          secretName: tls2
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: astarte-vernemq-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 4G
      volumeMode: Filesystem
    status:
      phase: Pending
status:
  availableReplicas: 0
  collisionCount: 0
  currentReplicas: 1
  currentRevision: astarte-vernemq-67f8db74c9
  observedGeneration: 3
  replicas: 1
  updateRevision: astarte-vernemq-67f8db74c9
  updatedReplicas: 1
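
A manifest this long is easy to misread, so here is a minimal Python sketch of one way to scan it for env entries that carry neither a value nor a valueFrom field. The function name find_valueless_env and the trimmed-down dictionary are illustrative assumptions, not part of Astarte or the operator; the dictionary keeps only a few of the env entries from the StatefulSet above.

```python
from typing import Any


def find_valueless_env(manifest: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (container, variable) pairs whose env entry has neither
    a `value` nor a `valueFrom` field."""
    pod_spec = manifest["spec"]["template"]["spec"]
    missing = []
    for container in pod_spec.get("containers", []):
        for env in container.get("env", []):
            if "value" not in env and "valueFrom" not in env:
                missing.append((container["name"], env["name"]))
    return missing


# Hypothetical, trimmed-down stand-in for the StatefulSet above.
statefulset = {
    "spec": {"template": {"spec": {"containers": [{
        "name": "vernemq",
        "env": [
            {"name": "DOCKER_VERNEMQ_ASTARTE_VMQ_PLUGIN__AMQP__PORT",
             "value": "5672"},
            {"name": "DOCKER_VERNEMQ_ASTARTE_VMQ_PLUGIN__CASSANDRA__USERNAME"},
            {"name": "DOCKER_VERNEMQ_ASTARTE_VMQ_PLUGIN__CASSANDRA__PASSWORD"},
            {"name": "DOCKER_VERNEMQ_ASTARTE_VMQ_PLUGIN__CASSANDRA__NODES",
             "value": "cassandra:9042"},
        ],
    }]}}}
}

for container, name in find_valueless_env(statefulset):
    print(f"{container}: {name} has no value")
```

To run it against a live cluster, the dictionary could instead be loaded with json.load from the output of kubectl get statefulset astarte-vernemq -o json.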

Annopaolo commented 2 weeks ago

There was an issue with passing down the DOCKER_VERNEMQ_ASTARTE_VMQ_PLUGIN__CASSANDRA__USERNAME and DOCKER_VERNEMQ_ASTARTE_VMQ_PLUGIN__CASSANDRA__PASSWORD env variables from the Astarte CR to the statefulset (as you can see, they have no value). It has been fixed in #386. However, we have not yet released a new patch version of the operator, so I would suggest using the 24.5-snapshot Astarte Kubernetes Operator image until the 24.5.1 release is made.
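
To illustrate why valueless variables might matter, the Python sketch below shows how a DOCKER_VERNEMQ_* variable could end up as a configuration line. It assumes the image follows the common VerneMQ container convention of dropping the prefix, lowercasing the name, and turning double underscores into dots; this thread does not confirm that the Astarte image does exactly this, and the helper name env_to_conf_line is hypothetical.

```python
PREFIX = "DOCKER_VERNEMQ_"


def env_to_conf_line(name: str, value: str) -> str:
    """Render one DOCKER_VERNEMQ_* variable as a `key = value` line.

    Assumed convention (not confirmed by this thread): drop the prefix,
    lowercase the rest, and turn each double underscore into a dot.
    """
    key = name[len(PREFIX):].lower().replace("__", ".")
    return f"{key} = {value}"


# Two entries from the StatefulSet: one with a value, one without.
# An env entry with no value reaches the container as an empty string.
env = {
    "DOCKER_VERNEMQ_ASTARTE_VMQ_PLUGIN__CASSANDRA__NODES": "cassandra:9042",
    "DOCKER_VERNEMQ_ASTARTE_VMQ_PLUGIN__CASSANDRA__USERNAME": "",
}

for name, value in env.items():
    line = env_to_conf_line(name, value)
    # Flag lines that have nothing after the equals sign.
    marker = "  <-- empty value" if value == "" else ""
    print(line + marker)
```

Under that assumption, the username variable would become a setting with nothing after the equals sign, which is the kind of entry a configuration generator could plausibly reject. The exact failure mode inside cuttlefish is not stated in this thread.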

IvaskevychYuriy commented 2 weeks ago

@Annopaolo Yup, that did the trick: the VerneMQ broker is up and running. Many thanks! The issue can be closed then.