Closed razvanphp closed 4 months ago
Hi!
In this kind of scenario it may be necessary to perform some manual intervention. Could you try running the chart with diagnosticMode.enabled=true and perform the initialization steps manually? You can run kubectl exec
to enter the container and then run this command:
/opt/bitnami/scripts/rabbitmq/entrypoint.sh /opt/bitnami/scripts/rabbitmq/run.sh
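The diagnostic-mode suggestion could look like this as a values sketch (the release name and exec command below are illustrative assumptions):

```yaml
# values.yaml -- sketch: diagnostic mode keeps the container running
# without starting RabbitMQ, so you can intervene manually
diagnosticMode:
  enabled: true
```

With diagnostic mode on, the pod idles instead of launching the broker, so something like `kubectl exec -it rabbitmq-0 -- bash` gives you a shell to run the entrypoint by hand.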
Well, I know how to fix it manually, but what I'm suggesting is to find a better way to handle this in the chart. As it stands, I would not trust deploying this to production; one cannot expect that the nodes will always shut down in a specific order.
root@truenas[~]# kubectl patch statefulset rabbitmq -n vampirebyte --type='json' -p='[{"op": "remove", "path": "/spec/template/spec/containers/0/livenessProbe"}]'
statefulset.apps/rabbitmq patched
root@truenas[~]# kubectl patch statefulset rabbitmq -n vampirebyte --type='json' -p='[{"op": "remove", "path": "/spec/template/spec/containers/0/readinessProbe"}]'
statefulset.apps/rabbitmq patched
root@truenas[~]# kubectl get pods -n vampirebyte -l app.kubernetes.io/instance=rabbitmq
NAME READY STATUS RESTARTS AGE
rabbitmq-0 0/1 Running 217 (2m45s ago) 25h
root@truenas[~]# kubectl scale statefulset rabbitmq --replicas=3 -n vampirebyte
statefulset.apps/rabbitmq scaled
root@truenas[~]# kubectl get pods -n vampirebyte -l app.kubernetes.io/instance=rabbitmq
NAME READY STATUS RESTARTS AGE
rabbitmq-0 0/1 Running 217 (3m21s ago) 25h
root@truenas[~]#
root@truenas[~]#
root@truenas[~]# kubectl delete pod rabbitmq-0 -n vampirebyte
pod "rabbitmq-0" deleted
root@truenas[~]# kubectl get pods -n vampirebyte -l app.kubernetes.io/instance=rabbitmq
NAME READY STATUS RESTARTS AGE
rabbitmq-0 1/1 Running 0 83s
rabbitmq-1 1/1 Running 0 79s
rabbitmq-2 1/1 Running 0 75s
root@truenas[~]#
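For reference, the manual patches in the transcript above correspond roughly to these chart values (a sketch; key names follow the bitnami/rabbitmq values schema, so double-check them against your chart version):

```yaml
# values.yaml -- sketch: equivalent of removing the probes and scaling to 3
replicaCount: 3
livenessProbe:
  enabled: false
readinessProbe:
  enabled: false
```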
I think the main issue to solve is to make sure all 3 pods are started all the time, no matter if the probes fail; otherwise the cluster will never recover with just 1 node.
Hi!
We plan to change the podManagementPolicy to Parallel to avoid this kind of issue. In the meantime, you can set it in your values.yaml; we plan to make it the default. This was recommended by the upstream RabbitMQ team, you can see it here: https://github.com/bitnami/charts/issues/16081#issuecomment-2106462797
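Until the default changes, opting in is a one-line values override (a sketch; the top-level key is the one the bitnami/rabbitmq chart exposes):

```yaml
# values.yaml -- start all pods at once instead of one at a time
podManagementPolicy: Parallel
```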
@razvanphp RabbitMQ nodes do not expect any specific startup or shutdown sequence starting with 3.7.0. They do expect all peers to come online within 5 minutes by default.
Specifically for Kubernetes: with OrderedReady, Kubernetes and similar tools can run into a deployment deadlock, which has been documented in various ways for a while.
Using forceBoot is a completely unnecessary and dangerous way of "fixing" the problem. You are not fixing anything; you are using a specialized mechanism designed to be used when a portion of the cluster is permanently lost.
The easiest option by far is to use the Cluster Operator, which has also been around for a while and is maintained by the RabbitMQ core team. When that's not possible, using rabbitmq-diagnostics ping for the readiness probe and podManagementPolicy: "Parallel" for the stateful set should be enough. That's what the Cluster Operator does, specifically the latter part.
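A values sketch that follows this advice, assuming the chart's customReadinessProbe override (probe timings are illustrative, not a recommendation):

```yaml
# values.yaml -- sketch: Parallel startup plus a ping-based readiness probe
podManagementPolicy: Parallel
readinessProbe:
  enabled: false   # disable the default probe so the custom one applies
customReadinessProbe:
  exec:
    command:
      - /bin/bash
      - -ec
      - rabbitmq-diagnostics -q ping
  initialDelaySeconds: 10
  periodSeconds: 30
  timeoutSeconds: 20
```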
@javsalgar given that this question keeps coming up and, one way or another, the (completely wrong, as stated many times earlier) recommendation of using forceBoot: true keeps resurfacing, I guess https://github.com/bitnami/charts/issues/16081#issuecomment-2106462797 should be a top priority for the RabbitMQ chart.
RabbitMQ can log an extra message when it runs out of attempts to contact cluster peers, but we can tell from experience that virtually no one reads logs until told to do so explicitly.
And since the core team does not have much influence over the "DIY" (Operator-less) installations on Kubernetes, this long understood and solved problem keeps popping up.
I just want to mention that the error logs are not displayed by default; one must also set
image:
  debug: true
to see what actually happens (the Mnesia tables error).
I would suggest we go back to basics and make things easy again: remove forceBoot completely, even from suggestions, and align with what @michaelklishin is saying.
@javsalgar here's a PR to get the ball rolling: https://github.com/bitnami/charts/pull/25873. Hopefully it will stop the bleeding (this kind of question) and direct folks towards understanding what's going on with their deployments and what the two options they have are :)
As for whether forceBoot should be removed, I don't have an opinion. With the right defaults and documentation in place, it won't be used much (by new deployments, anyway).
If @javsalgar and his team decide to remove forceBoot, I'm not going to complain, because the options of running rabbitmqctl force_boot or setting the env variable would both still be around.
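For completeness, the chart-level escape hatch referred to here is a values flag (a sketch; only relevant when part of the cluster is permanently lost, per the warnings above, and the key name should be verified against your chart version):

```yaml
# values.yaml -- last resort only; see michaelklishin's warnings above
clustering:
  forceBoot: true
```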
@javsalgar is the logging behavior mentioned above intentional? RabbitMQ community Docker image does not suppress nodes by default, so I'm curious why this is the case. I'd personally always want more users to have easy access to RabbitMQ logs since that's the very first thing we ask for, both on GitHub and in response to commercial tickets.
Regarding logging, if I don't set image.debug: true, the logs stop here and never output anything:
rabbitmq 08:33:51.13 INFO ==> ** Starting RabbitMQ **
2024-05-13 08:33:56.761574+00:00 [notice] <0.44.0> Application syslog exited with reason: stopped
2024-05-13 08:33:56.769160+00:00 [notice] <0.235.0> Logging: switching to configured handler(s); following messages may not be visible in this log output
2024-05-13 08:33:56.770500+00:00 [notice] <0.235.0> Logging: configured log handlers are now ACTIVE
I thought it uses syslog, but RABBITMQ_LOGS=- means it's using stdout, so syslog should not be detected inside Docker, right?
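If the goal is more verbose broker-side logging regardless of the bash wrapper, one option is passing raw rabbitmq.conf directives through the chart (a sketch, assuming the chart's extraConfiguration passthrough):

```yaml
# values.yaml -- sketch: force console logging at debug level
extraConfiguration: |
  log.console = true
  log.console.level = debug
```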
Regarding forceBoot, I think it should be removed, especially because, besides the confusion, it does not solve the problem described in this issue. I've tried it and still had to manually intervene.
@javsalgar is the logging behavior mentioned above intentional? RabbitMQ community Docker image does not suppress nodes by default, so I'm curious why this is the case. I'd personally always want more users to have easy access to RabbitMQ logs since that's the very first thing we ask for, both on GitHub and in response to commercial tickets.
We show the application and error logs on stdout by default. The one that we suppress has to do with the bash initialization logic, to avoid adding unnecessary noise to the initialization logs unless it fails. However, it makes sense to revisit the logging of that specific part of the initialization to make it easier to spot any error.
This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.
Name and Version
bitnami/rabbitmq 12.15.0
What architecture are you using?
amd64
What steps will reproduce the bug?
We run this chart on a TrueNAS server, deployed with FluxCD. With 3 pods, restart the k3s node and the cluster will not recover.
Are you using any custom parameters or values?
What is the expected behavior?
Cluster should be able to recover; seems that … does not help.
What do you see instead?
Cluster (of 3 nodes) is not able to recover after server shutdown.
Additional information
So the readiness and liveness probes fail with:
Checking the logs I see those, only one pod is up instead of 3: