bitnami / charts

Bitnami Helm Charts
https://bitnami.com

Error accepting a client connection: error:0A00010B:SSL routines::wrong version number (addr=127.0.0.1:xxxxxx laddr=127.0.0.1:6379) #29616

Open bustersg opened 2 weeks ago

bustersg commented 2 weeks ago

Name and Version

bitnami/redis-cluster 11.0.4

What architecture are you using?

amd64

What steps will reproduce the bug?

For an OpenShift deployment: update the tls section in the vanilla 11.0.3 or 11.0.4 values.yaml, then run helm install -f values.yaml redis-cluster ./ -n redis-dev

The cluster comes up with 3 masters and 3 replicas (6/6 pods), but some pod logs keep generating Error accepting a client connection: error:0A00010B:SSL routines::wrong version number (addr=127.0.0.1:xxxxxx laddr=127.0.0.1:6379) every 1 second.

I tried three approaches, all with the same outcome:

  1. openssl method to generate CA Certificate and Key
  2. autoGenerated: true in values.yaml
  3. deploy own cert, key and CA root

With all three methods the 6 pods come up and run, but some pods always log the SSL wrong version number error every second (a rough sketch of methods 1 and 3 is below).
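Sketch of methods 1 and 3 (the file names and the redis-tls secret name match the values I used below; the exact openssl flags I ran may have differed slightly):

$ openssl genrsa -out ca.key 4096
$ openssl req -x509 -new -nodes -key ca.key -sha256 -days 365 -subj "/CN=redis-ca" -out ca.crt
$ openssl genrsa -out redis.key 2048
$ openssl req -new -key redis.key -subj "/CN=redis-cluster" -out redis.csr
$ openssl x509 -req -in redis.csr -CA ca.crt -CAkey ca.key -CAcreateserial -days 365 -sha256 -out redis.crt
$ kubectl create secret generic redis-tls -n redis-dev --from-file=redis.crt --from-file=redis.key --from-file=ca.crt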

Are you using any custom parameters or values?

# Attempt with my own certificates (methods 1 and 3)
tls:
  enabled: true
  autoGenerated: false
  existingSecret: "redis-tls"
  certFilename: "redis.crt"
  certKeyFilename: "redis.key"
  certCAFilename: "ca.crt"

# Attempt with auto-generated certificates (method 2)
tls:
  enabled: true
  autoGenerated: true
  existingSecret: ""
  certFilename: ""
  certKeyFilename: ""
  certCAFilename: ""

What is the expected behavior?

You should not see this type of error (at least not with autoGenerated: true).

What do you see instead?

1:M 26 Sep 2024 08:36:14.040 # Error accepting a client connection: error:0A00010B:SSL routines::wrong version number (addr=127.0.0.1:53466 laddr=127.0.0.1:6379)
1:M 26 Sep 2024 08:36:15.374 # Error accepting a client connection: error:0A00010B:SSL routines::wrong version number (addr=127.0.0.1:53472 laddr=127.0.0.1:6379)
1:M 26 Sep 2024 08:36:16.037 # Error accepting a client connection: error:0A00010B:SSL routines::wrong version number (addr=127.0.0.1:53484 laddr=127.0.0.1:6379)
1:M 26 Sep 2024 08:36:17.039 # Error accepting a client connection: error:0A00010B:SSL routines::wrong version number (addr=127.0.0.1:53498 laddr=127.0.0.1:6379)
1:M 26 Sep 2024 08:36:18.041 # Error accepting a client connection: error:0A00010B:SSL routines::wrong version number (addr=127.0.0.1:54166 laddr=127.0.0.1:6379)
1:M 26 Sep 2024 08:36:19.043 # Error accepting a client connection: error:0A00010B:SSL routines::wrong version number (addr=127.0.0.1:54192 laddr=127.0.0.1:6379)
...
...

Additional information

I tried adding these TLS options to values.yaml and experimenting with them, but it still did not work.

# DIY
  #minVersion: "TLS1.3"  # or "TLS1.2" as a fallback if 1.3 is not supported
  #clientAuthType: "RequireAndVerifyClientCert"  # Options include "RequireAndVerifyClientCert", etc.
  #clientAuthType: "VerifyClientCertIfGiven"
  #preferServerCiphers: no  # Whether to prefer server cipher suites over the client's preferences
  #tlsReplication: false     # Enable TLS for replication
  #tlsCluster: false     # Enable TLS for cluster communication
  #tlsProtocols:
    #- "TLS1.2"
    #- "TLS1.3"  # Allow only TLS 1.2 and 1.3
  #ciphersuites:
    #- "TLS_AES_128_GCM_SHA256"  # Example of a TLS 1.3 cipher suite
    #- "TLS_AES_256_GCM_SHA384"  # Another TLS 1.3 cipher suite
    #- "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"  # Example of a TLS 1.2 cipher suite
    #- "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384"  # Another TLS 1.2 cipher suite
juan131 commented 1 week ago

Hi @bustersg

I was unable to reproduce the issue. These are the steps I followed:

$ helm install redis-cluster oci://registry-1.docker.io/bitnamicharts/redis-cluster --set tls.enabled=true,tls.autoGenerated=true
(...)
CHART NAME: redis-cluster
CHART VERSION: 11.0.5
APP VERSION: 7.4.0
(...)
$ kubectl get sts redis-cluster -o yaml
(...)
        - name: REDIS_TLS_ENABLED
          value: "yes"
        - name: REDIS_TLS_PORT_NUMBER
          value: "6379"
        - name: REDIS_TLS_AUTH_CLIENTS
          value: "yes"
        - name: REDIS_TLS_CERT_FILE
          value: /opt/bitnami/redis/certs/tls.crt
        - name: REDIS_TLS_KEY_FILE
          value: /opt/bitnami/redis/certs/tls.key
        - name: REDIS_TLS_CA_FILE
          value: /opt/bitnami/redis/certs/ca.crt
(...)
        volumeMounts:
        (...)
        - mountPath: /opt/bitnami/redis/certs
          name: redis-certificates
          readOnly: true
(...)
      volumes:
      (...)
      - name: redis-certificates
        secret:
          defaultMode: 256
          secretName: redis-cluster-crt
(...)
$ kubectl get secret redis-cluster-crt -o json | jq .data
{
  "ca.crt": "XXX",
  "tls.crt": "YYY",
  "tls.key": "ZZZ"
}
$ kubectl logs sts/redis-cluster | grep "Error accepting a client connection"
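If it helps, you can also double-check the TLS listener from inside one of the pods with something along these lines (cert paths taken from the env vars above; a NOAUTH reply is enough to prove the handshake itself works):

$ kubectl exec -it redis-cluster-0 -- redis-cli --tls --cert /opt/bitnami/redis/certs/tls.crt --key /opt/bitnami/redis/certs/tls.key --cacert /opt/bitnami/redis/certs/ca.crt -h 127.0.0.1 -p 6379 ping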
bustersg commented 1 week ago

Let me try your method of installing via oci://registry-1.docker.io/bitnamicharts/redis-cluster. For my errors, I actually did a helm pull, updated the values.yaml and ran helm install -f values.yaml .... I'll get back within 48 hours.

bustersg commented 1 week ago

Hi @juan131

OK, so I started a new project in OpenShift.

$ helm install redis-cluster oci://registry-1.docker.io/bitnamicharts/redis-cluster --set tls.enabled=true,tls.autoGenerated=true
Pulled: registry-1.docker.io/bitnamicharts/redis-cluster:11.0.6
STATUS: deployed
REVISION: 1
CHART NAME: redis-cluster
CHART VERSION: 11.0.6
APP VERSION: 7.4.1

** Please be patient while the chart is being deployed **
...
WARNING: There are "resources" sections in the chart not set. Using "resourcesPreset" is not recommended for production. For production installations, please set the following values according to your workload needs: 
- redis.resources
- updateJob.resources
+info https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/   

Looks good and normal, but when I check the pod logs (below) I see the same issue. Note there are no Routes or Ingresses in this new project namespace. The only Services are redis-cluster and redis-cluster-headless, and there is 1 NetworkPolicy (redis-cluster). It all looks pretty standard since we are using the default setup.

However, all pods hit similar errors about 5 minutes after deployment. Some pods generate a few lines and then stop, while other pods keep generating the error every second (see the timestamps).

redis-cluster-4

1:M 03 Oct 2024 01:58:46.032 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 03 Oct 2024 01:58:46.033 * No cluster configuration found, I'm 00921bbe8bde60198exxx361eda4dfe482be
1:M 03 Oct 2024 01:58:46.052 * Server initialized
1:M 03 Oct 2024 01:58:46.054 * Creating AOF base file appendonly.aof.1.base.rdb on server start
1:M 03 Oct 2024 01:58:46.056 * Creating AOF incr file appendonly.aof.1.incr.aof on server start
1:M 03 Oct 2024 01:58:46.056 * Ready to accept connections tls
1:M 03 Oct 2024 01:58:52.322 * configEpoch set to 5 via CLUSTER SET-CONFIG-EPOCH
1:S 03 Oct 2024 01:58:56.394 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1:S 03 Oct 2024 01:58:56.394 * Connecting to MASTER 172.17.60.160:6379
1:S 03 Oct 2024 01:58:56.394 * MASTER <-> REPLICA sync started
1:S 03 Oct 2024 01:58:56.394 * Cluster state changed: ok
...
...
1:S 03 Oct 2024 01:58:56.553 * Background AOF rewrite terminated with success
1:S 03 Oct 2024 01:58:56.553 * Successfully renamed the temporary AOF base file temp-rewriteaof-bg-217.aof into appendonly.aof.2.base.rdb
1:S 03 Oct 2024 01:58:56.553 * Successfully renamed the temporary AOF incr file temp-appendonly.aof.incr into appendonly.aof.2.incr.aof
1:S 03 Oct 2024 01:58:56.570 * Removing the history file appendonly.aof.1.incr.aof in the background
1:S 03 Oct 2024 01:58:56.570 * Removing the history file appendonly.aof.1.base.rdb in the background
1:S 03 Oct 2024 01:58:56.573 * Background AOF rewrite finished successfully
1:S 03 Oct 2024 01:58:56.707 * Node e8e538xxxfde87af7bbf03ab3b1f7908dc982c () is no longer master of shard a664a89727715899cf6cexxx634aaa344d000; removed all 0 slot(s) it used to own
1:S 03 Oct 2024 01:58:56.707 * Node e8e538b9394fxxxbbf03ab3b1f7908dc982c () is now part of shard c7624586e206c86xxx8ddd3d14d38a58daeec5b8
1:S 03 Oct 2024 01:59:04.537 * Node ccad07033d2b59xxx13111be3928a09e35 () is no longer master of shard 85da675039c36a710f4cfxxx0a923056074e; removed all 0 slot(s) it used to own
1:S 03 Oct 2024 01:59:04.537 * Node ccad0703xxx371a24a9b13111be3928a09e35 () is now part of shard ea58c65a03f650axxx9d031629df6d2e27e917d
1:S 03 Oct 2024 02:03:36.039 # Error accepting a client connection: error:0A00010B:SSL routines::wrong version number (addr=127.0.0.1:43738 laddr=127.0.0.1:6379)
1:S 03 Oct 2024 02:03:37.044 # Error accepting a client connection: error:0A00010B:SSL routines::wrong version number (addr=127.0.0.1:43760 laddr=127.0.0.1:6379)
1:S 03 Oct 2024 02:03:38.051 # Error accepting a client connection: error:0A00010B:SSL routines::wrong version number (addr=127.0.0.1:43784 laddr=127.0.0.1:6379)
1:S 03 Oct 2024 02:03:39.039 # Error accepting a client connection: error:0A00010B:SSL routines::wrong version number (addr=127.0.0.1:36050 laddr=127.0.0.1:6379)
1:S 03 Oct 2024 02:03:40.042 # Error accepting a client connection: error:0A00010B:SSL routines::wrong version number (addr=127.0.0.1:36088 laddr=127.0.0.1:6379)
1:S 03 Oct 2024 02:03:41.058 # Error accepting a client connection: error:0A00010B:SSL routines::wrong version number (addr=127.0.0.1:36108 laddr=127.0.0.1:6379)
1:S 03 Oct 2024 02:03:42.043 # Error accepting a client connection: error:0A00010B:SSL routines::wrong version number (addr=127.0.0.1:36138 laddr=127.0.0.1:6379)
1:S 03 Oct 2024 02:03:43.045 # Error accepting a client connection: error:0A00010B:SSL routines::wrong version number (addr=127.0.0.1:36162 laddr=127.0.0.1:6379)
1:S 03 Oct 2024 02:03:44.059 # Error accepting a client connection: error:0A00010B:SSL routines::wrong version number (addr=127.0.0.1:36186 laddr=127.0.0.1:6379)
1:S 03 Oct 2024 02:03:45.063 # Error accepting a client connection: error:0A00010B:SSL routines::wrong version number (addr=127.0.0.1:36220 laddr=127.0.0.1:6379)

$ oc get sts redis-cluster -o yaml - output is word for word the same as in your previous comment.
$ oc get secret redis-cluster-crt -o json | jq .data - yes, ca.crt, tls.crt and tls.key all display their base64-encoded certs correctly.

juan131 commented 1 week ago

Hi @bustersg

I'm still unable to reproduce it. This time I also used OpenShift as the target cluster (OCP 4.13.x, Kubernetes v1.26.x, to be more precise), and still could not reproduce it.

Do you have any client attempting to connect to your Redis Cluster pods, or does the error appear without any interaction, with the logs simply being the result of the readiness/liveness probes?
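One quick way to rule out an injected sidecar (anything sharing the pod's network namespace would show up as 127.0.0.1 on the server side) is to list the containers actually running in one of the pods, for example:

$ kubectl get pod redis-cluster-0 -o jsonpath='{.spec.containers[*].name}{"\n"}{.spec.initContainers[*].name}{"\n"}'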

bustersg commented 1 week ago

I'm on OpenShift 4.12.x.

Any clients? No, it is a new namespace with only redis-cluster in it: no Ingresses, no Routes, only the Redis Services with PVC volumes.

The error logs appear without any interaction.

Result of readiness/liveness probes? I tried disabling them:

$ helm install redis-cluster oci://registry-1.docker.io/bitnamicharts/redis-cluster --set tls.enabled=true,tls.autoGenerated=true,tls.authClients=true,persistence.enabled=true,persistence.storageClass=ocs-storagecluster-ceph-rbd,metrics.enabled=false,diagnosticMode.enabled=false,redis.livenessProbe.enabled=false,redis.readinessProbe.enabled=false,redis.startupProbe.enabled=false

Well, I tried the above and after 5 hours the logs are still rolling: 10-20 thousand lines of the WRONG VERSION error on some pods.

bustersg commented 1 week ago

Let's just say some unknown client is trying to talk to laddr=127.0.0.1:6379 (port 6379 being the well-known Redis port). Is it possible to change 6379 to something else? Hmm, will try and get back.
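(To find the right values key for the port I'll grep the chart defaults rather than guess, something like the following, and then override it via --set or values.yaml:)

$ helm show values oci://registry-1.docker.io/bitnamicharts/redis-cluster | grep -in "port"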

juan131 commented 6 days ago

It's so weird... Maybe it's some compilation issue related to the libcurl or libssl version, see:

Could you please give it a try with the latest chart version? We did a release bumping the redis-cluster version to 7.4.1-debian-12-r0 four days ago, see:

bustersg commented 6 days ago

Same error with latest chart version.

$ helm repo update
$ helm pull bitnami/redis-cluster

$ helm install redis-cluster oci://registry-1.docker.io/bitnamicharts/redis-cluster --set tls.enabled=true,tls.autoGenerated=true,tls.authClients=true,persistence.enabled=true,metrics.enabled=false,diagnosticMode.enabled=false                                                                                                                               
Pulled: registry-1.docker.io/bitnamicharts/redis-cluster:11.0.6
Digest: sha256:2f8a748f8bcbb01886de4c1683ba2f8cd961752b7de73d7de4dae5a7775c531f
NAME: redis-cluster
LAST DEPLOYED: Mon Oct  7 14:48:21 2024
NAMESPACE: redis-dev
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
CHART NAME: redis-cluster
CHART VERSION: 11.0.6
APP VERSION: 7.4.1

** Please be patient while the chart is being deployed **

Log from one of the pods (redis-cluster-0). I tried deleting the pod, and these errors still pop up after it restarts.

1:M 07 Oct 2024 06:50:14.235 * Opening AOF incr file appendonly.aof.1.incr.aof on server start
1:M 07 Oct 2024 06:50:14.235 * Ready to accept connections tls
1:M 07 Oct 2024 06:50:14.432 * Replica 172.17.10.170:6379 asks for synchronization
1:M 07 Oct 2024 06:50:14.432 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '6ed32c0666a8f3a95c6211cba9dd54e2710ca183', my replication IDs are 'da3b9d60550de423af7e4dd36a01347f5f159b05' and '0000000000000000000000000000000000000000')
1:M 07 Oct 2024 06:50:14.432 * Replication backlog created, my new replication IDs are '9d2980f39f83b864646d0644485d3eda85ccf630' and '0000000000000000000000000000000000000000'
1:M 07 Oct 2024 06:50:14.432 * Starting BGSAVE for SYNC with target: disk
1:M 07 Oct 2024 06:50:14.432 * Background saving started by pid 148
148:C 07 Oct 2024 06:50:14.474 * DB saved on disk
148:C 07 Oct 2024 06:50:14.475 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
1:M 07 Oct 2024 06:50:14.477 * Background saving terminated with success
1:M 07 Oct 2024 06:50:14.478 * Synchronization with replica 172.17.10.170:6379 succeeded
1:M 07 Oct 2024 06:50:16.288 * Cluster state changed: ok
1:M 07 Oct 2024 06:50:21.123 # Error accepting a client connection: error:0A00010B:SSL routines::wrong version number (addr=127.0.0.1:43328 laddr=127.0.0.1:6379)
1:M 07 Oct 2024 06:50:22.043 # Error accepting a client connection: error:0A00010B:SSL routines::wrong version number (addr=127.0.0.1:43334 laddr=127.0.0.1:6379)
1:M 07 Oct 2024 06:50:23.071 # Error accepting a client connection: error:0A00010B:SSL routines::wrong version number (addr=127.0.0.1:43342 laddr=127.0.0.1:6379)
1:M 07 Oct 2024 06:50:24.828 # Error accepting a client connection: error:0A00010B:SSL routines::wrong version number (addr=127.0.0.1:43364 laddr=127.0.0.1:6379)
1:M 07 Oct 2024 06:50:25.333 # Error accepting a client connection: error:0A00010B:SSL routines::wrong version number (addr=127.0.0.1:43366 laddr=127.0.0.1:6379)
1:M 07 Oct 2024 06:50:26.049 # Error accepting a client connection: error:0A00010B:SSL routines::wrong version number (addr=127.0.0.1:43376 laddr=127.0.0.1:6379)
...
...
juan131 commented 5 days ago

Hi @bustersg

I'm pretty sure there must be a Redis client attempting to connect to your Redis server without using the corresponding TLS flags. Could you please try reproducing the issue on a different cluster? You may have some service mesh or similar solution in this particular cluster that automatically performs some kind of health check based on the pods' spec.
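For reference, that exact log line can be reproduced by pointing a plain (non-TLS) client at the TLS port, since the server receives cleartext where it expects a TLS ClientHello:

$ kubectl exec -it redis-cluster-0 -- redis-cli -h 127.0.0.1 -p 6379 ping

The client just gets an I/O error or empty reply, and the server immediately logs the "wrong version number" error, so whatever is hitting 127.0.0.1:6379 every second inside those pods is speaking plain TCP.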

bustersg commented 4 days ago

I have a little hunch on that too but there is literally no routes, ingress, stateful, deployment, other pods or services in tha namespace. If there is such, it means it could be coming outside the namespace to probe the 6379 port. I'm now trying the redis v7.0.15 from https://quay.io/repository/opstree/redis?tab=tags through the operatorHub. got it up and running without TLS and gona try TLS=enabled next.