
Redis - How to debug: Master Replica Sync - Error condition on socket for SYNC: Connection refused #29437

Open MadsGosvig opened 1 week ago

MadsGosvig commented 1 week ago

Name and Version

bitnami/redis 20.1.0

What architecture are you using?

amd64

What steps will reproduce the bug?

Environment

We are running the Redis containers inside an Azure Kubernetes Cluster. The nodes are running Linux with the following image: AKSUbuntu-2204gen2containerd-202408.27.0

We have observed the issue on the following chart versions:

We have been running these charts for a few years and had never seen the issue described below.

Config

We are running the default Helm Chart configurations, with the following overrides:

master:
  podAnnotations:
    ad.datadoghq.com/redis.logs: '[{"source":"redis","service":"redis"}]'
    ad.datadoghq.com/redis.check_names: '["redisdb"]'
    ad.datadoghq.com/redis.init_configs: '[{}]'
    ad.datadoghq.com/redis.instances: '[{"host": "%%host%%","port":"6379","password":"%%env_REDIS_PASSWORD%%"}]'
replica:
  replicaCount: 3
  podAnnotations:
    ad.datadoghq.com/redis.logs: '[{"source":"redis","service":"redis"}]'
    ad.datadoghq.com/redis.check_names: '["redisdb"]'
    ad.datadoghq.com/redis.init_configs: '[{}]'
    ad.datadoghq.com/redis.instances: '[{"host": "%%host%%","port":"6379","password":"%%env_REDIS_PASSWORD%%"}]'
sentinel:
  enabled: true
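As a first sanity check when behavior changes without an obvious cause, it can help to confirm which values Helm actually applied to the running release. A minimal sketch, assuming the release is named redis (the redis-node-* pod names in the logs below suggest that, but the name is an assumption):

helm -n <namespace> list
helm -n <namespace> get values redis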

Error

We observed that the Master - Replica sync didn't work during multiple time periods on the 13th of September; the first was 07:00 - 08:00, as mentioned below.

The logs that started it all:

'Connection with master lost.'
...followed by:
Connecting to MASTER redis-node-2.redis-headless.<namespace>.svc.cluster.local:6379
MASTER <-> REPLICA sync started
Error condition on socket for SYNC: Connection refused

We also saw some indication of partial syncs:

MASTER <-> REPLICA sync started
Non blocking connect for SYNC fired the event.
Master replied to PING, replication can continue...
Partial resynchronization not possible (no cached master)
Master is currently unable to PSYNC but should be in the future: -LOADING Redis is loading the dataset in memory
Connecting to MASTER redis-node-1.redis-headless.<namespace>.svc.cluster.local:6379
MASTER <-> REPLICA sync started
Error condition on socket for SYNC: Connection refused
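For what it's worth, these two excerpts point at two distinct conditions: -LOADING means the master process is up but still reading its dataset (RDB/AOF) into memory, so it temporarily refuses PSYNC, while "Connection refused" means nothing was accepting connections on port 6379 at all, for example because the pod was restarting or redis-server was not yet listening. A minimal sketch of commands to inspect both conditions while they are happening, assuming the chart's default container name redis and the redis-node-N pod names from the logs above:

kubectl -n <namespace> get pods -l app.kubernetes.io/name=redis -o wide
# Logs from the previous container instance, in case the pod restarted
kubectl -n <namespace> logs redis-node-2 -c redis --previous
# loading:1 in this output means the dataset is still being read into memory
kubectl -n <namespace> exec redis-node-2 -c redis -- redis-cli -a "$REDIS_PASSWORD" info persistence
# role, master_link_status and master_sync_in_progress show the sync state
kubectl -n <namespace> exec redis-node-2 -c redis -- redis-cli -a "$REDIS_PASSWORD" info replication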

This was followed by the Pods restarting, and on startup they tried to sync again, which kept failing. We didn't touch anything related to the infrastructure or the Kubernetes deployment itself during the first time period (07:00 - 08:00), but it magically just began working again.

What I would like to know is how to debug this kind of error in more detail. Since it started working on its own, I want to rule out infrastructure changes, but we would like to know why it happened, and how we can debug and prevent it in the future.
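For reference, one way to approach this is to correlate the failed syncs with cluster-level events and with which node Sentinel believed was the master at the time, since a failover or node eviction around 07:00 would explain both the lost connection and the refused SYNC. A sketch, assuming this chart's defaults (a sentinel container listening on 26379 and a master set named mymaster); the -a flag may be unnecessary depending on your auth settings:

# Events around the incident window (node pressure, evictions, probe failures)
kubectl -n <namespace> get events --sort-by=.lastTimestamp
kubectl -n <namespace> describe pod redis-node-1
# Ask Sentinel which node it currently reports as master and inspect its state
kubectl -n <namespace> exec redis-node-0 -c sentinel -- redis-cli -p 26379 -a "$REDIS_PASSWORD" sentinel master mymaster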

What is the expected behavior?

The pods should start up and sync Master and Replica nodes.

What do you see instead?

The pods start up and synchronization between Master and Replica nodes fails.

carrodher commented 1 week ago

The issue may not be directly related to the Bitnami container image/Helm chart, but rather to how the application is being utilized, configured in your specific environment, or tied to a particular scenario that is not easy to reproduce on our side.

If you think that's not the case and want to contribute a solution, we welcome you to create a pull request. The Bitnami team is excited to review your submission and offer feedback. You can find the contributing guidelines here.

Your contribution will greatly benefit the community. Feel free to reach out if you have any questions or need assistance.

If you have questions about the application, customizing its content, or technology and infrastructure usage, we highly recommend that you refer to the forums and user guides provided by the project responsible for the application or technology.

With that said, we'll keep this ticket open until the stale bot automatically closes it, in case someone from the community contributes valuable insights.