bitnami / charts

Bitnami Helm Charts
https://bitnami.com

[bitnami/mariadb-galera] helm upgrade with "tls.enabled=true,tls.autoGenerated=true" makes existing galera nodes fail to communicate #15525

Open ledroide opened 1 year ago

ledroide commented 1 year ago

Name and Version

bitnami/mariadb-galera 7.5.3

What steps will reproduce the bug?

Configuration :

tls:
  enabled: true
  autoGenerated: true

How to reproduce :

A diff taken before upgrading shows that the TLS certificates are regenerated even though they already exist:

--- /tmp/LIVE-2635943621/v1.Secret.galera.database-crt  2023-03-16 14:24:20.740023113 +0100
+++ /tmp/MERGED-1283566051/v1.Secret.galera.database-crt        2023-03-16 14:24:20.740023113 +0100
@@ -1,8 +1,8 @@
 apiVersion: v1
 kind: Secret
 data:
-  ca.crt: '*** (before)'
-  tls.crt: '*** (before)'
-  tls.key: '*** (before)'
+  ca.crt: '*** (after)'
+  tls.crt: '*** (after)'
+  tls.key: '*** (after)'

Problem: starting members are unable to communicate with the other members, so the new pods go into Error and then CrashLoopBackOff.

Here is what I see in the logs of any starting pod while the StatefulSet is restarting the cluster pods:

2023-03-16 09:47:56
mariadb 09:47:56.83 DEBUG ==> Setting wsrep_provider_options to ''socket.ssl_cert=/bitnami/mariadb/certs/tls.crt;socket.ssl_key=/bitnami/mariadb/certs/tls.key;socket.ssl_ca=/bitnami/mariadb/certs/ca.crt'' in mariadb configuration file /opt/bitnami/mariadb/conf/my.cnf
2023-03-16 09:47:56
mariadb 09:47:56.82 DEBUG ==> Setting ssl_key to '/bitnami/mariadb/certs/tls.key' in mariadb configuration file /opt/bitnami/mariadb/conf/my.cnf
2023-03-16 09:47:56
mariadb 09:47:56.81 DEBUG ==> Setting ssl_cert to '/bitnami/mariadb/certs/tls.crt' in mariadb configuration file /opt/bitnami/mariadb/conf/my.cnf
2023-03-16 09:42:51
2023-03-16  9:42:51 0 [Warning] WSREP: Handshake failed: tlsv1 alert unknown ca
2023-03-16 09:42:51
2023-03-16  9:42:51 0 [Warning] WSREP: Handshake failed: tlsv1 alert unknown ca
2023-03-16 09:42:50
2023-03-16  9:42:50 0 [Warning] WSREP: Handshake failed: tlsv1 alert unknown ca
2023-03-16 09:42:50
2023-03-16  9:42:50 0 [Warning] WSREP: Handshake failed: tlsv1 alert unknown ca

That means I need two different values files, one with autoGenerated=true and one with autoGenerated=false, depending on whether the cluster already exists. So this configuration is not immutable, not even idempotent.

The only workaround that I have found is to scale members to 0 and then scale up - and this causes downtime, unfortunately.

Is there an option I missed that would manage this case - and not replace existing certificates, but only generate them if they do not exist?

What architecture are you using?

amd64 Kubernetes 1.26.2 helm 3.11.1

Issues seen before

Maybe related to issues #7071 or #8424

aoterolorenzo commented 1 year ago

Hey @ledroide,

How about using

 tls:
  enabled: true
  autoGenerated: true

at the installation, and:

 tls:
  enabled: true
  autoGenerated: false

for the upgrades?

It is true that regenerating the existing certs on upgrade is not ideal, but I'm not sure whether this is a bug or a design issue.

ledroide commented 1 year ago

@aoterolorenzo: That's why I wrote:

(this values.yaml configuration) is not immutable, not even idempotent

If we apply the same values.yaml twice, the first run works and the second run crashes the whole cluster.

From an ops or CI point of view, this is clearly a bug affecting a common use case. There should be a check before existing certificates are replaced, either automatic or controlled by a boolean value such as tls.replaceTlsCertsIfExist: false
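
To make the idea concrete, here is a rough sketch of what such a check could look like in a Helm template. It is not the chart's actual template. The secret name (release name plus "-crt", guessed from the database-crt secret in the diff above), the CA common name, the 365-day validity and the tls.replaceTlsCertsIfExist value are all assumptions for illustration. It relies on Helm's lookup function together with the genCA and genSignedCert template functions.

```yaml
{{- /* Illustrative sketch only; names and values are hypothetical. */ -}}
{{- $secretName := printf "%s-crt" .Release.Name }}
{{- /* lookup returns an empty map when the Secret does not exist yet. */ -}}
{{- $existing := lookup "v1" "Secret" .Release.Namespace $secretName }}
{{- /* Reuse only if the Secret exists and replacement was not requested. */ -}}
{{- $reuse := and $existing (not .Values.tls.replaceTlsCertsIfExist) }}
apiVersion: v1
kind: Secret
metadata:
  name: {{ $secretName }}
  namespace: {{ .Release.Namespace }}
type: kubernetes.io/tls
data:
{{- if $reuse }}
  # Secret already in the cluster: copy the stored base64 values unchanged.
  ca.crt: {{ index $existing.data "ca.crt" }}
  tls.crt: {{ index $existing.data "tls.crt" }}
  tls.key: {{ index $existing.data "tls.key" }}
{{- else }}
  # First install, or replacement explicitly requested: generate a new CA and cert.
  {{- $ca := genCA "mariadb-galera-ca" 365 }}
  {{- $cert := genSignedCert $secretName nil (list $secretName) 365 $ca }}
  ca.crt: {{ $ca.Cert | b64enc }}
  tls.crt: {{ $cert.Cert | b64enc }}
  tls.key: {{ $cert.Key | b64enc }}
{{- end }}
```

One caveat: lookup needs access to the cluster, so it returns an empty result under helm template or a client-side dry run. A diff produced that way would still show regenerated certificates, even if a real helm upgrade would keep the existing ones.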

aoterolorenzo commented 1 year ago

Yep, completely agree! Let me create an internal task for the team to take a deeper look and address the issue. We will get back to you here as soon as our workload allows us to work on it (no ETA can be provided, I'm afraid).

Thanks for reporting!