
Unable to cleanly restart or recover a mariadb-galera cluster #8721

Closed jfillatre closed 2 years ago

jfillatre commented 2 years ago

Which chart:

mariadb-galera-6.0.6 (appVersion 10.6.5). MariaDB container image: docker.io/bitnami/mariadb-galera:10.4.22-debian-10-r20

Describe the bug

Previously, with the 4.3.3 chart and the 10.1.46-debian-10-r17 image tag, I was able to cleanly restart a cluster by scaling the StatefulSet down and up. I was also able to recover from an unclean cluster failure by applying the documented procedure, and to re-apply the initial chart configuration after a graceful shutdown.

This can't be done with the 10.4.22-debian-10-r20 container image; the first node is never able to start:

k logs mariadb-galera-0 mariadb-galera -f
mariadb 16:13:47.63 
mariadb 16:13:47.63 Welcome to the Bitnami mariadb-galera container
mariadb 16:13:47.63 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-mariadb-galera
mariadb 16:13:47.63 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-mariadb-galera/issues
mariadb 16:13:47.63 
mariadb 16:13:47.64 INFO  ==> ** Starting MariaDB setup **
mariadb 16:13:47.66 INFO  ==> Validating settings in MYSQL_*/MARIADB_* env vars
mariadb 16:13:47.67 INFO  ==> Initializing mariadb database
mariadb 16:13:47.68 WARN  ==> The mariadb configuration file '/opt/bitnami/mariadb/conf/my.cnf' is not writable or does not exist. Configurations based on environment variables will not be applied for this file.
mariadb 16:13:47.68 INFO  ==> Persisted data detected. Restoring
mariadb 16:13:47.69 INFO  ==> ** MariaDB setup finished! **

mariadb 16:13:47.74 INFO  ==> ** Starting MariaDB **
mariadb 16:13:47.74 INFO  ==> Setting previous boot
2022-01-18 16:13:47 0 [Note] /opt/bitnami/mariadb/sbin/mysqld (mysqld 10.4.22-MariaDB-log) starting as process 1 ...
2022-01-18 16:13:47 0 [Note] WSREP: Loading provider /opt/bitnami/mariadb/lib/libgalera_smm.so initial position: 00000000-0000-0000-0000-000000000000:-1
2022-01-18 16:13:47 0 [Note] WSREP: wsrep_load(): loading provider library '/opt/bitnami/mariadb/lib/libgalera_smm.so'
2022-01-18 16:13:47 0 [Note] WSREP: wsrep_load(): Galera 4.9(rXXXX) by Codership Oy <info@codership.com> loaded successfully.
2022-01-18 16:13:47 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration.
2022-01-18 16:13:47 0 [Note] WSREP: Found saved state: 7cf4e391-7878-11ec-abf9-cbf61c56b744:-1, safe_to_bootstrap: 1
2022-01-18 16:13:47 0 [Note] WSREP: GCache DEBUG: opened preamble:
Version: 2
UUID: 7cf4e391-7878-11ec-abf9-cbf61c56b744
Seqno: 1 - 17
Offset: 1280
Synced: 1
2022-01-18 16:13:47 0 [Note] WSREP: Recovering GCache ring buffer: version: 2, UUID: 7cf4e391-7878-11ec-abf9-cbf61c56b744, offset: 1280
2022-01-18 16:13:47 0 [Note] WSREP: GCache::RingBuffer initial scan...  0.0% (        0/134217752 bytes) complete.
2022-01-18 16:13:47 0 [Note] WSREP: GCache::RingBuffer initial scan...100.0% (134217752/134217752 bytes) complete.
2022-01-18 16:13:47 0 [Note] WSREP: Recovering GCache ring buffer: found gapless sequence 1-17
2022-01-18 16:13:47 0 [Note] WSREP: GCache::RingBuffer unused buffers scan...  0.0% (   0/7392 bytes) complete.
2022-01-18 16:13:47 0 [Note] WSREP: GCache::RingBuffer unused buffers scan...100.0% (7392/7392 bytes) complete.
2022-01-18 16:13:47 0 [Note] WSREP: GCache DEBUG: RingBuffer::recover(): found 4/21 locked buffers
2022-01-18 16:13:47 0 [Note] WSREP: GCache DEBUG: RingBuffer::recover(): free space: 134210824/134217728
2022-01-18 16:13:47 0 [Note] WSREP: Passing config to GCS: base_dir = /bitnami/mariadb/data/; base_host = 10.240.0.124; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /bitnami/mariadb/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; 
2022-01-18 16:13:47 0 [Note] WSREP: Start replication
2022-01-18 16:13:47 0 [Note] WSREP: Connecting with bootstrap option: 0
2022-01-18 16:13:47 0 [Note] WSREP: Setting GCS initial position to 00000000-0000-0000-0000-000000000000:-1
2022-01-18 16:13:47 0 [Note] WSREP: protonet asio version 0
2022-01-18 16:13:47 0 [Note] WSREP: Using CRC-32C for message checksums.
2022-01-18 16:13:47 0 [Note] WSREP: backend: asio
2022-01-18 16:13:47 0 [Note] WSREP: gcomm thread scheduling priority set to other:0 
2022-01-18 16:13:47 0 [Warning] WSREP: access file(/bitnami/mariadb/data//gvwstate.dat) failed(No such file or directory)
2022-01-18 16:13:47 0 [Note] WSREP: restore pc from disk failed
2022-01-18 16:13:47 0 [Note] WSREP: GMCast version 0
2022-01-18 16:13:47 0 [Warning] WSREP: Failed to resolve tcp://mariadb-galera-headless.tpe-dev.svc.cluster.local:4567
2022-01-18 16:13:47 0 [Note] WSREP: (9df5b66f-9bce, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2022-01-18 16:13:47 0 [Note] WSREP: (9df5b66f-9bce, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2022-01-18 16:13:47 0 [Note] WSREP: EVS version 1
2022-01-18 16:13:47 0 [Note] WSREP: gcomm: connecting to group 'galera', peer 'mariadb-galera-headless.tpe-dev.svc.cluster.local:'
2022-01-18 16:13:47 0 [ERROR] WSREP: failed to open gcomm backend connection: 131: No address to connect (FATAL)
     at /bitnami/blacksmith-sandox/libgalera-26.4.9/gcomm/src/gmcast.cpp:connect_precheck():317
2022-01-18 16:13:47 0 [ERROR] WSREP: /bitnami/blacksmith-sandox/libgalera-26.4.9/gcs/src/gcs_core.cpp:gcs_core_open():220: Failed to open backend connection: -131 (State not recoverable)
2022-01-18 16:13:47 0 [ERROR] WSREP: /bitnami/blacksmith-sandox/libgalera-26.4.9/gcs/src/gcs.cpp:gcs_open():1633: Failed to open channel 'galera' at 'gcomm://mariadb-galera-headless.tpe-dev.svc.cluster.local': -131 (State not recoverable)
2022-01-18 16:13:47 0 [ERROR] WSREP: gcs connect failed: State not recoverable
2022-01-18 16:13:47 0 [ERROR] WSREP: wsrep::connect(gcomm://mariadb-galera-headless.tpe-dev.svc.cluster.local) failed: 7
2022-01-18 16:13:47 0 [ERROR] Aborting

I can also reproduce this with the latest 6.2.0 chart and 10.6.5-debian-10-r35. However, 10.6.4-debian-10-r30 is not affected by the issue. I've patched the 10.4.22-debian-10-r20 scripts as follows to fix it:

# Overlay the init scripts from the unaffected image onto the affected one
FROM docker.io/bitnami/mariadb-galera:10.6.4-debian-10-r30 AS scripts
FROM docker.io/bitnami/mariadb-galera:10.4.22-debian-10-r20

COPY --from=scripts /opt/bitnami/scripts /opt/bitnami/scripts
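
One might then build and point the chart at the patched image roughly like this (the registry name and patched tag are hypothetical placeholders, not from the report):

docker build -t my-registry/mariadb-galera:10.4.22-debian-10-r20-patched .
helm upgrade mariadb-galera bitnami/mariadb-galera --reuse-values \
  --set image.registry=my-registry \
  --set image.repository=mariadb-galera \
  --set image.tag=10.4.22-debian-10-r20-patched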

To Reproduce

Steps to reproduce the behavior:

  1. Deploy the chart, overriding the image tag used by the StatefulSet with 10.4.22-debian-10-r20 (see the sketch after this list)
  2. k scale statefulset mariadb-galera --replicas=0
  3. k scale statefulset mariadb-galera --replicas=3
  4. See the error with k logs mariadb-galera-0 mariadb-galera -f
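
A minimal sketch of step 1, assuming the bitnami repo is configured and using the release name from the commands above:

helm install mariadb-galera bitnami/mariadb-galera \
  --set image.tag=10.4.22-debian-10-r20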

Expected behavior

I must be able to cold-restart a gracefully shut down cluster, or re-apply the initial configuration to a manually repaired cluster.

Additional context

carrodher commented 2 years ago

Thanks for reporting this issue; indeed, it seems to be the same issue described at https://github.com/bitnami/charts/issues/8560. We already have a task to investigate it, and I'll raise its priority.

github-actions[bot] commented 2 years ago

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

rafariossaa commented 2 years ago

We have released a new version of the chart that should fix the issues related to nodes not being part of the cluster and the cluster not bootstrapping correctly.

Chart version: 7.0.1. Galera Docker image: 10.6.5-debian-10-r66.

Could you give it a try?

jfillatre commented 2 years ago

Hello, thanks for the update. I actually have to use the MariaDB 10.4 branch for my product. Is there a plan to backport the fix to it and produce an image?

carrodher commented 2 years ago

In order to use a different branch you need to modify the image tag, either by using --set image.tag=10.4 or by modifying the values.yaml:

image:
  tag: 10.4

Here you can see the different branches supported by the bitnami/mariadb-galera container image. Please note the Helm chart is only tested with the branch used by default.

jfillatre commented 2 years ago

Yes, my question was only whether the latest 10.4 build benefits from the patch (10.4.22-debian-10-r81 as of this morning). According to a quick test, it seems not to be the case...

rafariossaa commented 2 years ago

Hi, 10.4 also got the changes in the logic needed to work with chart version 7.0.1. Just to double-check, could you verify that you are using that combination (chart 7.0.1 and container 10.4.22-debian-10-r81) by using kubectl describe pod xxxx?

jfillatre commented 2 years ago

Same issue with the latest chart version:

$ kubectl describe pod mariadb-galera-0
Name:                 mariadb-galera-0
Namespace:            devops-tpe-ocp-02
Priority:             999999
Priority Class Name:  thingpark-enterprise-data
Node:                 aks-default-15968217-vmss000001/10.240.0.5
Start Time:           Tue, 08 Feb 2022 11:07:18 +0100
Labels:               app.kubernetes.io/instance=tpe-data
                      app.kubernetes.io/name=mariadb-galera
                      controller-revision-hash=mariadb-galera-fb7ddf8d8
                      statefulset.kubernetes.io/pod-name=mariadb-galera-0
Annotations:          cni.projectcalico.org/containerID: 3daa3b6a88d1205531fc81c407311985f3266e836dab56b10be2a4cad76bc69b
                      cni.projectcalico.org/podIP: 172.20.2.184/32
                      cni.projectcalico.org/podIPs: 172.20.2.184/32
Status:               Running
IP:                   172.20.2.184
IPs:
  IP:           172.20.2.184
Controlled By:  StatefulSet/mariadb-galera
Containers:
  mariadb-galera:
    Container ID:  containerd://e023b7a0380c15261beb9bd6ab5cc853482c642a15494f4623c902b2c2daaed9
    Image:         docker.io/bitnami/mariadb-galera:10.4.22-debian-10-r81
    Image ID:      docker.io/bitnami/mariadb-galera@sha256:cdbdbf8ad7d301f079ba0c2c6355ef5401eec7f60a1c953dbf29f1b73bdd183d
    Ports:         3306/TCP, 4567/TCP, 4568/TCP, 4444/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      bash
      -ec
      exec /opt/bitnami/scripts/mariadb-galera/entrypoint.sh /opt/bitnami/scripts/mariadb-galera/run.sh

    State:          Running
      Started:      Tue, 08 Feb 2022 11:16:54 +0100
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 08 Feb 2022 11:13:31 +0100
      Finished:     Tue, 08 Feb 2022 11:14:04 +0100
    Ready:          False
    Restart Count:  6
    Limits:
      memory:  700Mi
    Requests:
      memory:  700Mi
    Liveness:  exec [bash -ec password_aux="${MARIADB_ROOT_PASSWORD:-}"
if [[ -f "${MARIADB_ROOT_PASSWORD_FILE:-}" ]]; then
    password_aux=$(cat "$MARIADB_ROOT_PASSWORD_FILE")
fi
exec mysql -u"${MARIADB_ROOT_USER}" -p"${password_aux}" -e "select * from mysql.wsrep_cluster_members;"
] delay=120s timeout=1s period=10s #success=1 #failure=3
    Readiness:  exec [bash -ec password_aux="${MARIADB_ROOT_PASSWORD:-}"
if [[ -f "${MARIADB_ROOT_PASSWORD_FILE:-}" ]]; then
    password_aux=$(cat "$MARIADB_ROOT_PASSWORD_FILE")
fi
exec mysqladmin status -u"${MARIADB_ROOT_USER}" -p"${password_aux}"
] delay=30s timeout=1s period=10s #success=1 #failure=3
    Environment:
      MY_POD_NAME:                          mariadb-galera-0 (v1:metadata.name)
      BITNAMI_DEBUG:                        false
      MARIADB_DEFAULT_PORT_NUMBER:          3306
      MARIADB_GALERA_CLUSTER_NAME:          galera
      MARIADB_GALERA_CLUSTER_ADDRESS:       gcomm://mariadb-galera-0.mariadb-galera-headless.devops-tpe-ocp-02.svc.cluster.local,mariadb-galera-1.mariadb-galera-headless.devops-tpe-ocp-02.svc.cluster.local,mariadb-galera-2.mariadb-galera-headless.devops-tpe-ocp-02.svc.cluster.local
      MARIADB_ROOT_USER:                    root
      MARIADB_ROOT_PASSWORD:                <set to the key 'mariadb-root-password' in secret 'mariadb-galera'>  Optional: false
      MARIADB_DATABASE:                     my_database
      MARIADB_GALERA_MARIABACKUP_USER:      mariabackup
      MARIADB_GALERA_MARIABACKUP_PASSWORD:  <set to the key 'mariadb-galera-mariabackup-password' in secret 'mariadb-galera'>  Optional: false
      MARIADB_ENABLE_LDAP:                  no
      MARIADB_ENABLE_TLS:                   no
    Mounts:
      /bitnami/conf/my.cnf from mariadb-galera-config (rw,path="my.cnf")
      /bitnami/mariadb from data (rw)
      /opt/bitnami/mariadb/.bootstrap from previous-boot (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-z76rj (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-mariadb-galera-0
    ReadOnly:   false
  previous-boot:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  mariadb-galera-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      mariadb-galera-configuration
    Optional:  false
  kube-api-access-z76rj:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              thingpark.enterprise.actility.com/nodegroup-name=tpe
Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                    From                     Message
  ----     ------                  ----                   ----                     -------
  Normal   Scheduled               9m57s                  default-scheduler        Successfully assigned devops-tpe-ocp-02/mariadb-galera-0 to aks-default-15968217-vmss000001
  Normal   SuccessfulAttachVolume  9m47s                  attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-9212bdda-44be-4d8a-a54b-c8d6bcc6ed10"
  Normal   Started                 7m10s (x4 over 9m40s)  kubelet                  Started container mariadb-galera
  Warning  Unhealthy               6m38s (x2 over 9m8s)   kubelet                  Readiness probe failed: mysqladmin: connect to server at 'localhost' failed
error: 'Can't connect to local MySQL server through socket '/opt/bitnami/mariadb/tmp/mysql.sock' (2)'
Check that mysqld is running and that the socket: '/opt/bitnami/mariadb/tmp/mysql.sock' exists!
  Normal   Pulled   5m49s (x5 over 9m40s)   kubelet  Container image "docker.io/bitnami/mariadb-galera:10.4.22-debian-10-r81" already present on machine
  Normal   Created  5m49s (x5 over 9m40s)   kubelet  Created container mariadb-galera
  Warning  BackOff  4m40s (x13 over 8m32s)  kubelet  Back-off restarting failed container

jk-pru commented 2 years ago

I'll repeat my post from the other thread, since it's more relevant here:

"I believe the issue is in the image, not the config files.

When I set replicas to 0, edit the statefulset to use 10.6.4-debian-10-r31, and set replicas to 3, everything restarts correctly, every single time I do this. Setting replicas to 0, changing the image to 10.6.4-debian-10-r32 (or newer), and setting replicas to 3 allows the cluster to boot only once. Any subsequent scale-0/scale-3 action results in the same error as the OP. Also, once the pods boot with 10.6.4-debian-10-r32, setting replicas: 0 results in the last terminating pod writing "seqno: -1" to its grastate.dat file. This does not happen when using the r31 image.

~~So it may have something to do with this commit? bitnami/bitnami-docker-mariadb-galera@28e8ed0~~"

To summarize: images released before bitnami/bitnami-docker-mariadb-galera@28e8ed0 (like 10.6.4-debian-10-r31) can recover from scaling replicas to 0 and then back to 3. Any image after that can't.

I installed 7.0.1 with 10.6.4-debian-10-r31 and everything is now working. I can now remove mariadb-0 or run helm upgrade without the entire cluster failing, and I can also scale replicas to 0 and then back to 3.

So please look at bitnami/bitnami-docker-mariadb-galera@28e8ed0, it really seems to be the cause here.

dejwsz commented 2 years ago

Try 10.6.4-debian-10-r43 and it will work, I assume, while the later 10.6.4-debian-10-r44 will not - maybe this is also related to my comment here: https://github.com/bitnami/charts/issues/8331#issuecomment-1029885465

jk-pru commented 2 years ago

Weird... When I was testing with 6.0.8, 10.6.4-debian-10-r32 seemed to be the first one to fail (I checked r32, r38, and r40). But now on 7.0.1, you're right, 10.6.4-debian-10-r31/r32/r43 work, and 10.6.4-debian-10-r44 fails immediately after scaling.

My bad. So I guess we're looking at https://github.com/bitnami/bitnami-docker-mariadb-galera/commit/2f8f18c703676fb339325fe31579ecf9977b72b5 ?

dejwsz commented 2 years ago

Looks like a good candidate to blame?

do0ominik commented 2 years ago

Weird... When I was testing with 6.0.8, 10.6.4-debian-10-r32 seemed to be the first one to fail (I checked r32, r38, and r40). But now on 7.0.1, you're right, 10.6.4-debian-10-r31/r32/r43 work, and 10.6.4-debian-10-r44 fails immediately after scaling.

My bad. So I guess we're looking at [bitnami/bitnami-docker-mariadb-galera@2f8f18c](https://github.com/bitnami/bitnami-docker-mariadb-galera/commit/2f8f18c703676fb339325fe31579ecf9977b72b5) ?

I think you're right!

Looks like a good candidate to blame?

I already blamed this change ;)

Have a look at https://github.com/bitnami/bitnami-docker-mariadb-galera/issues/75#issuecomment-1015396951

rafariossaa commented 2 years ago

Yes, that problem exists in those images. You should be using 10.6.5-debian-10-r66 or newer.

I think the "scaling up from 0" issue is due that now you need to indicate explicitly that you are bootstraping the cluster with the galera.bootstrap.forceBootstrap setting.

Apart from that, in general, I would suggest scaling the chart using Helm and the replicaCount setting, because there may be logic in the chart that is skipped when scaling the StatefulSet up/down directly, and that logic is usually related to bootstrapping.
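
For illustration, scaling through Helm so that the chart logic runs might look like this (the release name is a placeholder):

helm upgrade mariadb-galera bitnami/mariadb-galera --reuse-values \
  --set replicaCount=3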

jk-pru commented 2 years ago

Sorry, forgot to mention, 10.6.5-debian-10-r66 has this problem as well.

Ok, so essentially with the current configuration we'd need to:

  1. Scale down the containers (preferably using helm upgrade with replicaCount=0, though I don't see anything in the chart that would require it over a simple scale --replicas=0)
  2. Bring them back using helm upgrade with forceBootstrap=true and replicaCount=3
  3. Wait for the pods to be ready and then recreate them again using helm upgrade with forceBootstrap=false
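
As a rough sketch of those three steps (the release name is a placeholder; the flag is the galera.bootstrap.forceBootstrap setting mentioned above):

helm upgrade mariadb-galera bitnami/mariadb-galera --reuse-values \
  --set replicaCount=0
helm upgrade mariadb-galera bitnami/mariadb-galera --reuse-values \
  --set replicaCount=3 --set galera.bootstrap.forceBootstrap=true
kubectl rollout status statefulset/mariadb-galera
helm upgrade mariadb-galera bitnami/mariadb-galera --reuse-values \
  --set galera.bootstrap.forceBootstrap=false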

Step 3 is crucial, because if we leave forceBootstrap=true, a failure of mariadb-0 will force the cluster to split in two (just checked: I deleted mariadb-0 to simulate a problem with the pod, and the new copy created its own new cluster without mariadb-1 and mariadb-2).

Requiring three helm upgrades for a restart, especially when a simple scale works on 10.6.4-debian-10-r43 and earlier images, is just not ideal.

dejwsz commented 2 years ago

Maybe that change should be revisited? It would be better to leverage the built-in Kubernetes mechanisms, like it was before.

do0ominik commented 2 years ago

Maybe that change should be revisited? It would be better to leverage the built-in Kubernetes mechanisms, like it was before.

As @dejwsz said, I would also suggest reverting this change:

[screenshot of the change]

I think that could solve this problem.

As described in this comment: https://github.com/bitnami/bitnami-docker-mariadb-galera/issues/75#issuecomment-1015396951

Cryingmouse commented 2 years ago

@rafariossaa, today I tried the latest chart 7.0.1. Everything is good for a fresh installation. But when I tried to restart all the VMs (on which the mariadb-galera pods are running) to simulate a power-off of the whole data center, the cluster could not recover gracefully. After checking the content of the grastate.dat file in each VM, the value of safe_to_bootstrap equals 1 on one of the nodes. Can this be handled by chart 7.0.1 automatically, or do I need to do something manually?

rafariossaa commented 2 years ago

Hi, for this situation you need to force the bootstrap from one of the nodes.

Cryingmouse commented 2 years ago

Hi, for this situation you need to force the bootstrap from one of the nodes.

Here is my environment:

jay@jay-node-01:~$ kubectl get po  -owide
NAME                       READY   STATUS             RESTARTS   AGE     IP            NODE          NOMINATED NODE   READINESS GATES
busybox                    1/1     Running            3772       43d     10.42.2.239   jay-node-03   <none>           <none>
mariadb-mariadb-galera-0   0/1     CrashLoopBackOff   966        3d19h   10.42.1.3     jay-node-02   <none>           <none>
mariadb-mariadb-galera-1   0/1     CrashLoopBackOff   966        3d19h   10.42.2.238   jay-node-03   <none>           <none>
mariadb-mariadb-galera-2   0/1     CrashLoopBackOff   966        3d19h   10.42.0.145   jay-node-01   <none>           <none>

I checked, and mariadb-mariadb-galera-1 is the node I have to force the bootstrap from. How can I do that? An instruction on that would be appreciated. Thanks a lot!

rafariossaa commented 2 years ago

Hi, could you try the instructions provided here?
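
The linked instructions are not reproduced in this thread, but as a hedged sketch, assuming the chart's galera.bootstrap values (the forceBootstrap flag mentioned above, plus a bootstrapFromNode selector), forcing the bootstrap from node 1 would look roughly like:

helm upgrade mariadb bitnami/mariadb-galera --reuse-values \
  --set galera.bootstrap.forceBootstrap=true \
  --set galera.bootstrap.bootstrapFromNode=1
# once the cluster is healthy again, drop the flag
helm upgrade mariadb bitnami/mariadb-galera --reuse-values \
  --set galera.bootstrap.forceBootstrap=false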

Cryingmouse commented 2 years ago

Hi, could you try the instructions provided here?

It looks good. In order to orchestrate this, I have to build the logic somewhere outside the mariadb-galera cluster. Would it be possible for the mariadb-galera built-in scripts to handle it automatically when the cluster restarts? :)

rafariossaa commented 2 years ago

Hi, I am not quite sure what you mean. What would you need to do?

Cryingmouse commented 2 years ago

Hi, I am not quite sure what you mean. What would you need to do?

When the cluster restarts, could the mariadb-galera pods detect the value of safe_to_bootstrap and then automatically decide which pod should start as the bootstrap node? It would be simpler for the user. :)

rafariossaa commented 2 years ago

Hi, I agree that would be simpler for the user, but for that you would need an operator. It would need to check that file in all the PVCs to decide which node the cluster should bootstrap from, and then set the parameters to bootstrap the Galera cluster.

jk-pru commented 2 years ago

Since being stuck on 10.6.4-debian-10-r43 is not great, and the current release, where the database can't handle a restart, is not suitable for any important deployment, I've taken the current libmariadbgalera.sh script, edited the lines @Bananenbieger1234 pointed out, and built the image with:

FROM bitnami/mariadb-galera:10.6.7-debian-10-r2
COPY libmariadbgalera.sh /opt/bitnami/scripts/

And it's working as it should. Not to repeat what has already been said, but when the recommended workarounds involve forcing bootstrapping or running helm uninstall, I think it bears repeating.

Cryingmouse commented 2 years ago

@rafariossaa, thanks for your quick response. Can it happen that multiple pods have safe_to_bootstrap equal to 1?

rafariossaa commented 2 years ago

You should only have one node in that state. You can check how it works here.

do0ominik commented 2 years ago

@jk-pru we did a fork and reverted the changes, as you did. Everything works fine. That's why I created a PR for this. Maybe we can get it merged (@rafariossaa).

rafariossaa commented 2 years ago

Hi, could you give the new chart version (7.1.1) a try? It uses the new image with the PR from @Bananenbieger1234.

cwrau commented 2 years ago

This seems to be working, at least in a couple of tests of mine.

The only downside is that the update is not intervention-free: the serviceName of the StatefulSet changed, which means the user has to delete the StatefulSet before the update.

cwrau commented 2 years ago

Nvm, an update just results in the nodes crashing:

mariadb 13:07:38.23 
mariadb 13:07:38.23 Welcome to the Bitnami mariadb-galera container
mariadb 13:07:38.23 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-mariadb-galera
mariadb 13:07:38.24 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-mariadb-galera/issues
mariadb 13:07:38.24 
mariadb 13:07:38.24 INFO  ==> ** Starting MariaDB setup **
mariadb 13:07:38.31 INFO  ==> Validating settings in MYSQL_*/MARIADB_* env vars
mariadb 13:07:42.38 INFO  ==> Initializing mariadb database
mariadb 13:07:42.41 WARN  ==> This node was previouly booted, you may need to force bootstrapping in one of the nodes.
mariadb 13:07:42.43 INFO  ==> Found mounted configuration directory
mariadb 13:07:42.44 INFO  ==> Updating 'my.cnf' with custom configuration
mariadb 13:07:42.45 INFO  ==> Setting user option
mariadb 13:07:42.51 INFO  ==> Setting wsrep_node_name option
mariadb 13:07:42.53 INFO  ==> Setting wsrep_node_address option
mariadb 13:07:46.63 INFO  ==> Setting wsrep_sst_auth option
mariadb 13:07:46.64 INFO  ==> Persisted data detected. Restoring
mariadb 13:07:46.66 INFO  ==> ** MariaDB setup finished! **

mariadb 13:07:46.73 INFO  ==> ** Starting MariaDB **
mariadb 13:07:46.74 INFO  ==> Setting previous boot
2022-03-09 13:07:46 0 [Note] /opt/bitnami/mariadb/sbin/mysqld (server 10.6.7-MariaDB-log) starting as process 1 ...
2022-03-09 13:07:46 0 [Note] WSREP: Loading provider /opt/bitnami/mariadb/lib/libgalera_smm.so initial position: 00000000-0000-0000-0000-000000000000:-1
2022-03-09 13:07:46 0 [Note] WSREP: wsrep_load(): loading provider library '/opt/bitnami/mariadb/lib/libgalera_smm.so'
2022-03-09 13:07:46 0 [Note] WSREP: wsrep_load(): Galera 4.11(r7b59af7) by Codership Oy <info@codership.com> loaded successfully.
2022-03-09 13:07:46 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration.
2022-03-09 13:07:46 0 [Note] WSREP: Found saved state: 2ef8d5c5-9fa8-11ec-a43b-e65cfc2b2a01:-1, safe_to_bootstrap: 0
2022-03-09 13:07:46 0 [Note] WSREP: GCache DEBUG: opened preamble:
Version: 2
UUID: 2ef8d5c5-9fa8-11ec-a43b-e65cfc2b2a01
Seqno: 143 - 372
Offset: 1520
Synced: 1
2022-03-09 13:07:46 0 [Note] WSREP: Recovering GCache ring buffer: version: 2, UUID: 2ef8d5c5-9fa8-11ec-a43b-e65cfc2b2a01, offset: 1520
2022-03-09 13:07:46 0 [Note] WSREP: GCache::RingBuffer initial scan...  0.0% (        0/134217752 bytes) complete.
2022-03-09 13:07:46 0 [Note] WSREP: GCache::RingBuffer initial scan...100.0% (134217752/134217752 bytes) complete.
2022-03-09 13:07:46 0 [Note] WSREP: Recovering GCache ring buffer: found gapless sequence 143-372
2022-03-09 13:07:46 0 [Note] WSREP: GCache::RingBuffer unused buffers scan...  0.0% (     0/216816 bytes) complete.
2022-03-09 13:07:46 0 [Note] WSREP: GCache::RingBuffer unused buffers scan...100.0% (216816/216816 bytes) complete.
2022-03-09 13:07:46 0 [Note] WSREP: Recovering GCache ring buffer: found 2/232 locked buffers
2022-03-09 13:07:46 0 [Note] WSREP: Recovering GCache ring buffer: free space: 134001160/134217728
2022-03-09 13:07:46 0 [Warning] WSREP: Option 'gcs.fc_master_slave' is deprecated and will be removed in the future versions, please use 'gcs.fc_single_primary' instead. 
2022-03-09 13:07:46 0 [Note] WSREP: Passing config to GCS: base_dir = /bitnami/mariadb/data/; base_host = 10.0.3.147; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /bitnami/mariadb/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.fc_single_primary = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc
2022-03-09 13:07:46 0 [Note] WSREP: Start replication
2022-03-09 13:07:46 0 [Note] WSREP: Connecting with bootstrap option: 1
2022-03-09 13:07:46 0 [Note] WSREP: Setting GCS initial position to 00000000-0000-0000-0000-000000000000:-1
2022-03-09 13:07:46 0 [ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates. To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1 .
2022-03-09 13:07:46 0 [ERROR] WSREP: wsrep::connect(gcomm://) failed: 7
2022-03-09 13:07:46 0 [ERROR] Aborting

Except for node-0, which starts successfully

Also, I'm not entirely sure how gcomm://test-4ap-cra-mariadb-0.test-4ap-cra-mariadb-headless.test-4ap.svc.cluster.local,test-4ap-cra-mariadb-1.test-4ap-cra-mariadb-headless.test-4ap.svc.cluster.local is supposed to work?

I've never seen this usage of a service name; also, these addresses don't resolve in my cluster, which might be the cause of the above crash.

do0ominik commented 2 years ago

@cwrau what do you mean by serviceName? Can you explain what you did when upgrading your cluster (helm upgrade?)

It seems that you are bootstrapping from the wrong node. Or you need to force the bootstrap from node 0, if you don't care about the state of your database...

cwrau commented 2 years ago

@cwrau what do you mean by serviceName?

In the statefulset, you used to use the headless service for cluster communication; gcomm://test-4ap-cra-mariadb-headless.test-4ap.svc.cluster.local

In the new version you use some kind of subdomains of the headless service; gcomm://test-4ap-cra-mariadb-0.test-4ap-cra-mariadb-headless.test-4ap.svc.cluster.local,test-4ap-cra-mariadb-1.test-4ap-cra-mariadb-headless.test-4ap.svc.cluster.local.

I've never seen that, which could just be my lack of knowledge, but these addresses also don't resolve to anything, which explains why the node fails

Can you explain what you did when upgrading your cluster (helm upgrade?)

I just deleted the old StatefulSet with --cascade=orphan, because there are changes to immutable fields in the new StatefulSet, and then ran helm upgrade --reuse-values --version 7.1.1.
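
In concrete terms, roughly (the StatefulSet and release names are placeholders inferred from the addresses above):

kubectl delete statefulset test-4ap-cra-mariadb --cascade=orphan
helm upgrade test-4ap-cra bitnami/mariadb-galera --reuse-values --version 7.1.1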

It seems that you are bootstrapping from the wrong node. Or you need to force the bootstrap from node 0, if you don't care about the state of your database...

Yes and no: node-1 cannot start on its own because it wasn't the last to leave the cluster. But node-1 also cannot reach node-0 (which is still running successfully), as seen in the log.

cwrau commented 2 years ago

These are my values:

replicaCount: 3
podDisruptionBudget:
  create: true
  minAvailable: 2
podAntiAffinityPreset: hard
extraEnvVars:
  - name: MARIADB_EXTRA_FLAGS
    value: --skip-log-bin
metrics:
  enabled: true
  serviceMonitor:
    enabled: true
    selector: false
  resources:
    requests:
      memory: 32Mi
      cpu: 10m
    limits:
      memory: 32Mi
      cpu: 250m
podSecurityContext:
  enabled: true
containerSecurityContext:
  enabled: true
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
  privileged: false
  runAsNonRoot: true
serviceAccount:
  create: false
rbac:
  create: false
priorityClassName: low
rootUser:
  password: toor
db:
  password: toor
galera:
  mariabackup:
    password: toor
persistence:
  enabled: true
  storageClass: longhorn
  size: 8Gi
resources:
  limits:
    memory: 2Gi
    cpu: 1
  requests:
    memory: 1Gi
    cpu: 100m

do0ominik commented 2 years ago

Maybe you can explicitly override your "cluster address" pointing to the headless service.

Like this: MARIADB_GALERA_CLUSTER_ADDRESS: gcomm://test-4ap-cra-mariadb-headless.test-4ap.svc.cluster.local

I don't know why this behavior changed some weeks ago, and I don't know if this is the reason for your errors... [screenshot]

cwrau commented 2 years ago

That seems to work, so that should probably be reverted then

Maybe @rafariossaa can tell us why that change was made and how it was supposed to work? Because:

I've never seen that, which could just be my lack of knowledge, but these addresses also don't resolve to anything, which explains why the node fails

rafariossaa commented 2 years ago

The reason is that you need to get all the node names (IPs) explicitly, so one node is able to find the rest of them. With the previous approach you only get one of the IPs behind the service. We needed to change the service type, so now it is possible to get the individual nodes through that headless service.
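
For context, those per-pod DNS records (pod.service.namespace.svc.cluster.local) are only published while the StatefulSet's spec.serviceName matches the governing headless Service, which is why the rename requires recreating the StatefulSet. A quick resolution check might look like this (names are placeholders taken from the logs above):

kubectl run -it --rm dnstest --image=busybox:1.36 --restart=Never -- \
  nslookup test-4ap-cra-mariadb-0.test-4ap-cra-mariadb-headless.test-4ap.svc.cluster.local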

cwrau commented 2 years ago

But the headless service returns all pods' IPs, that's how it worked before

Or is mariadb only using the first IP? But then how did it work before? 😅

In any case, the new approach doesn't work either, as these DNS records don't exist

jk-pru commented 2 years ago

Yep, everything seems to be working now. Every single restart I did was handled correctly, and there's no regression with regard to deleting selected pods or helm upgrade.

Good work!

jk-pru commented 2 years ago

@cwrau was the version of the chart you were using before the upgrade below 7.0.1? Before that update the cluster had a tendency to split in two. So it's possible that mariadb-1 is in a separate cluster from mariadb-0 and won't be able to join, because it's completely out of sync. Check in the logs of both nodes whether they have the same UUID. If not, make sure mariadb-0 has all the data you need, scale replicas to 1, delete pvc-1 and pvc-2, then go back to 3 replicas.

The new addresses in gcomm might look weird, but they should work.
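
A sketch of that recovery, with placeholder names and using the PVC naming visible earlier in the thread (data-<statefulset>-<ordinal>); only do this after confirming mariadb-0 holds all the data you need:

kubectl scale statefulset mariadb-galera --replicas=1
kubectl delete pvc data-mariadb-galera-1 data-mariadb-galera-2
kubectl scale statefulset mariadb-galera --replicas=3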

cwrau commented 2 years ago

Yes; since the chart has been "broken" since 6.0.4, we've been staying on 6.0.3 waiting for the fix, a.k.a. this ticket here 😅

I'm gonna try that, but can we get around this manual intervention? It would mean that I'd have to migrate all 77 of our mariadb clusters by hand 🙈

rafariossaa commented 2 years ago

@cwrau as @jk-pru mentioned, the issues related to the cluster split were fixed in 7.0.1.

Regarding the serviceName change: it was changed to follow the same approach we have in the rest of the charts. This is to get the names/IPs of the nodes and be able to configure Galera with the names of the rest of the nodes. The previous approach always used the service address, which is not the best approach.

cwrau commented 2 years ago

Ok, if we can't get around the deletion of the StatefulSet, I at least found a nicer workaround than the one above:

  1. Set MARIADB_GALERA_CLUSTER_ADDRESS to the old value, as @Bananenbieger1234 suggested

    extraEnvVars:
      - name: MARIADB_GALERA_CLUSTER_ADDRESS
        value: 'gcomm://{{ template "common.names.fullname" . }}-headless.{{ .Release.Namespace }}.svc.{{ .Values.clusterDomain }}'
  2. Delete the StatefulSet with cascade=orphan

  3. Apply upgrade (the newer nodes join the existing cluster)

  4. Wait until all nodes are recycled

  5. Remove envVar from step 1

  6. Reupgrade (as the serviceName has been changed now for all pods, the new address now works)

    • I had to re-deploy the ($replicaCount - 1) pod after the upgrade; for some reason the pod didn't get any MARIADB_GALERA_CLUSTER_ADDRESS, but after deleting the pod it worked fine 🤷

This is still quite manual, and I don't look forward to upgrading all of our 77 clusters, but at least it works without downtime and without deleting any PVCs.
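
Condensed into shell steps (the release name is a placeholder; pin-address.yaml is assumed to hold the extraEnvVars override from step 1):

# steps 1-2: orphan-delete the StatefulSet, keeping the pods running
kubectl delete statefulset mariadb-galera --cascade=orphan
# step 3: upgrade with the pinned cluster address
helm upgrade mariadb-galera bitnami/mariadb-galera --version 7.1.1 \
  --reuse-values -f pin-address.yaml
# step 4: wait until all nodes are recycled
kubectl rollout status statefulset/mariadb-galera
# steps 5-6: drop the override and upgrade again (with --reuse-values,
# clearing the list may need an explicit --set extraEnvVars=null)
helm upgrade mariadb-galera bitnami/mariadb-galera --version 7.1.1 \
  --reuse-values --set extraEnvVars=null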

rafariossaa commented 2 years ago

Hi, Thanks for providing a workaround for that case.

github-actions[bot] commented 2 years ago

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

github-actions[bot] commented 2 years ago

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.

cwrau commented 2 years ago

This issue seems to still be relevant.

We had a cluster crash, and now we can't get it back up and running again; setting safe_to_bootstrap: 1 doesn't work anymore as it did in 6.x.x:
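
For context, the manual recovery that used to work is editing grastate.dat on the node's volume, roughly like this hedged sketch (with a crash-looping pod the file has to be edited from a debug pod that mounts the PVC instead):

kubectl exec customer-mariadb-0 -- \
  sed -i 's/safe_to_bootstrap: 0/safe_to_bootstrap: 1/' \
  /bitnami/mariadb/data/grastate.dat

With the current chart, the node below already has safe_to_bootstrap: 1 but still connects with bootstrap option 0 and fails to reach a primary view: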

2022-05-03  8:49:32 0 [Note] /opt/bitnami/mariadb/sbin/mysqld (server 10.6.7-MariaDB) starting as process 1 ...
2022-05-03  8:49:32 0 [Note] WSREP: Loading provider /opt/bitnami/mariadb/lib/libgalera_smm.so initial position: 00000000-0000-0000-0000-000000000000:-1
2022-05-03  8:49:32 0 [Note] WSREP: wsrep_load(): loading provider library '/opt/bitnami/mariadb/lib/libgalera_smm.so'
2022-05-03  8:49:32 0 [Note] WSREP: wsrep_load(): Galera 4.11(r7b59af7) by Codership Oy <info@codership.com> loaded successfully.
2022-05-03  8:49:32 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration.
2022-05-03  8:49:32 0 [Note] WSREP: Found saved state: fd477e0b-caae-11ec-947e-5f1468ac0a21:-1, safe_to_bootstrap: 1
2022-05-03  8:49:32 0 [Note] WSREP: GCache DEBUG: opened preamble:
Version: 2
UUID: fd477e0b-caae-11ec-947e-5f1468ac0a21
Seqno: 16 - 752
Offset: 1800
Synced: 1
2022-05-03  8:49:32 0 [Note] WSREP: Recovering GCache ring buffer: version: 2, UUID: fd477e0b-caae-11ec-947e-5f1468ac0a21, offset: 1800
2022-05-03  8:49:32 0 [Note] WSREP: GCache::RingBuffer initial scan...  0.0% (        0/134217752 bytes) complete.
2022-05-03  8:49:32 0 [Note] WSREP: GCache::RingBuffer initial scan...100.0% (134217752/134217752 bytes) complete.
2022-05-03  8:49:32 0 [Note] WSREP: Recovering GCache ring buffer: found gapless sequence 16-752
2022-05-03  8:49:32 0 [Note] WSREP: GCache::RingBuffer unused buffers scan...  0.0% (     0/540080 bytes) complete.
2022-05-03  8:49:32 0 [Note] WSREP: GCache::RingBuffer unused buffers scan...100.0% (540080/540080 bytes) complete.
2022-05-03  8:49:32 0 [Note] WSREP: Recovering GCache ring buffer: found 2/739 locked buffers
2022-05-03  8:49:32 0 [Note] WSREP: Recovering GCache ring buffer: free space: 133677904/134217728
2022-05-03  8:49:32 0 [Warning] WSREP: Option 'gcs.fc_master_slave' is deprecated and will be removed in the future versions, please use 'gcs.fc_single_primary' instead.
2022-05-03  8:49:32 0 [Note] WSREP: Passing config to GCS: base_dir = /bitnami/mariadb/data/; base_host = 192.168.15.233; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /bitnami/mariadb/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.fc_single_primary = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0
2022-05-03  8:49:32 0 [Note] WSREP: Start replication
2022-05-03  8:49:32 0 [Note] WSREP: Connecting with bootstrap option: 0
2022-05-03  8:49:32 0 [Note] WSREP: Setting GCS initial position to 00000000-0000-0000-0000-000000000000:-1
2022-05-03  8:49:32 0 [Note] WSREP: protonet asio version 0
2022-05-03  8:49:32 0 [Note] WSREP: Using CRC-32C for message checksums.
2022-05-03  8:49:32 0 [Note] WSREP: backend: asio
2022-05-03  8:49:32 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2022-05-03  8:49:32 0 [Warning] WSREP: access file(/bitnami/mariadb/data//gvwstate.dat) failed(No such file or directory)
2022-05-03  8:49:32 0 [Note] WSREP: restore PC from disk failed
2022-05-03  8:49:32 0 [Note] WSREP: GMCast version 0
2022-05-03  8:49:32 0 [Note] WSREP: (f3c3b59c-aad1, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2022-05-03  8:49:32 0 [Note] WSREP: (f3c3b59c-aad1, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2022-05-03  8:49:32 0 [Note] WSREP: EVS version 1
2022-05-03  8:49:32 0 [Note] WSREP: gcomm: connecting to group 'galera', peer 'customer-mariadb-0.customer-mariadb-headless.customer.svc.cluster.local:,customer-mariadb-1.customer-mariadb-headless.customer.svc.cluster.local:,customer-mariadb-2.customer-mariadb-headless.customer.svc.cluster.local:'
2022-05-03  8:49:32 0 [Note] WSREP: (f3c3b59c-aad1, 'tcp://0.0.0.0:4567') Found matching local endpoint for a connection, blacklisting address tcp://192.168.15.233:4567
2022-05-03  8:49:35 0 [Note] WSREP: EVS version upgrade 0 -> 1
2022-05-03  8:49:35 0 [Note] WSREP: PC protocol upgrade 0 -> 1
2022-05-03  8:49:35 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2022-05-03  8:49:35 0 [Note] WSREP: view(view_id(NON_PRIM,f3c3b59c-aad1,1) memb {
    f3c3b59c-aad1,0
} joined {
} left {
} partitioned {
})
2022-05-03  8:49:36 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50327S), skipping check
2022-05-03  8:50:05 0 [Note] WSREP: PC protocol downgrade 1 -> 0
2022-05-03  8:50:05 0 [Note] WSREP: view((empty))
2022-05-03  8:50:05 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
     at /bitnami/blacksmith-sandox/libgalera-26.4.11/gcomm/src/pc.cpp:connect():160
2022-05-03  8:50:05 0 [ERROR] WSREP: /bitnami/blacksmith-sandox/libgalera-26.4.11/gcs/src/gcs_core.cpp:gcs_core_open():220: Failed to open backend connection: -110 (Connection timed out)
2022-05-03  8:50:05 0 [ERROR] WSREP: /bitnami/blacksmith-sandox/libgalera-26.4.11/gcs/src/gcs.cpp:gcs_open():1664: Failed to open channel 'galera' at 'gcomm://customer-mariadb-0.customer-mariadb-headless.customer.svc.cluster.local,customer-mariadb-1.customer-mariadb-headless.customer.svc.cluster.local,customer-mariadb-2.customer-mariadb-headless.customer.svc.cluster.local': -110 (Connection timed out)
2022-05-03  8:50:05 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2022-05-03  8:50:05 0 [ERROR] WSREP: wsrep::connect(gcomm://customer-mariadb-0.customer-mariadb-headless.customer.svc.cluster.local,customer-mariadb-1.customer-mariadb-headless.customer.svc.cluster.local,customer-mariadb-2.customer-mariadb-headless.customer.svc.cluster.local) failed: 7
2022-05-03  8:50:05 0 [ERROR] Aborting

rafariossaa commented 2 years ago

Hi @cwrau, do you mind opening a new issue? That way we can handle it properly. Thanks in advance.

helletheone commented 2 years ago

@cwrau is there a solution to your problem? Because I have the same problem now.