DandyDeveloper / charts

Various helm charts migrated from [helm/stable] due to deprecation
https://dandydeveloper.github.io/charts
Apache License 2.0

redis-ha-4.27.0 - split brain #283

Open Pride1st1 opened 5 months ago

Pride1st1 commented 5 months ago

Describe the bug: I deployed the chart with default values. During operation we hit a state where redis-0 and redis-2 are replicas of redis-1, and redis-1 is a replica of redis-0. The split-brain-fix container wasn't able to fix the problem.

172.20.75.109 - redis-0
172.20.181.236 - redis-1
172.20.198.17 - redis-2

redis-0:

  |   | 2024-06-18 18:23:36.849 | 1:S 18 Jun 2024 15:23:36.849 * Connecting to MASTER 172.20.181.236:6379 |  
  |   | 2024-06-18 18:23:36.849 | 1:S 18 Jun 2024 15:23:36.849 * MASTER <-> REPLICA sync started |  
  |   | 2024-06-18 18:23:36.850 | 1:S 18 Jun 2024 15:23:36.850 # Error condition on socket for SYNC: Connection refused |  
  |   | 2024-06-18 18:23:37.852 | 1:S 18 Jun 2024 15:23:37.852 * Connecting to MASTER 172.20.181.236:6379 |  
  |   | 2024-06-18 18:23:37.852 | 1:S 18 Jun 2024 15:23:37.852 * MASTER <-> REPLICA sync started

redis-1 (sentinel tries to restart it):

  |   | 2024-06-18 18:26:55.109 | 1:S 18 Jun 2024 15:26:55.109 * Ready to accept connections tcp |  
  |   | 2024-06-18 18:26:55.109 | 1:S 18 Jun 2024 15:26:55.109 * Connecting to MASTER 172.20.75.109:6379 |  
  |   | 2024-06-18 18:26:55.110 | 1:S 18 Jun 2024 15:26:55.109 * MASTER <-> REPLICA sync started |  
  |   | 2024-06-18 18:26:55.110 | 1:S 18 Jun 2024 15:26:55.110 * Non blocking connect for SYNC fired the event. |  
  |   | 2024-06-18 18:26:55.111 | 1:S 18 Jun 2024 15:26:55.111 * Master replied to PING, replication can continue... |  
  |   | 2024-06-18 18:26:55.113 | 1:S 18 Jun 2024 15:26:55.112 * Trying a partial resynchronization (request 8605e4e1a74e2a74a8ad3742efb5784ad4b0ce41:1). |  
  |   | 2024-06-18 18:26:55.113 | 1:S 18 Jun 2024 15:26:55.113 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master |  
  |   | 2024-06-18 18:26:56.114 | 1:S 18 Jun 2024 15:26:56.113 * Connecting to MASTER 172.20.75.109:6379 |  
  |   | 2024-06-18 18:26:56.114 | 1:S 18 Jun 2024 15:26:56.114 * MASTER <-> REPLICA sync started

sentinel-1 (leader)

  |   | 2024-06-18 18:26:55.883 | 1:X 18 Jun 2024 15:26:55.883 * +reboot master mymaster 172.20.181.236 6379 |  
  |   | 2024-06-18 18:28:09.960 | 1:X 18 Jun 2024 15:28:09.960 # +new-epoch 21 |  
  |   | 2024-06-18 18:28:09.960 | 1:X 18 Jun 2024 15:28:09.960 # +try-failover master mymaster 172.20.181.236 6379 |  
  |   | 2024-06-18 18:28:09.963 | 1:X 18 Jun 2024 15:28:09.963 * Sentinel new configuration saved on disk |  
  |   | 2024-06-18 18:28:09.963 | 1:X 18 Jun 2024 15:28:09.963 # +vote-for-leader aa33680947f52ae19df761ea8f26a4285d4910c1 21 |  
  |   | 2024-06-18 18:28:09.969 | 1:X 18 Jun 2024 15:28:09.969 * d4ca60ac0fa2353d3c6a5684df1401f8faccf6ef voted for aa33680947f52ae19df761ea8f26a4285d4910c1 21 |  
  |   | 2024-06-18 18:28:09.969 | 1:X 18 Jun 2024 15:28:09.969 * d21ee95d5d45a94a9deb59bd2b2797a4bddedf53 voted for aa33680947f52ae19df761ea8f26a4285d4910c1 21 |  
  |   | 2024-06-18 18:28:10.039 | 1:X 18 Jun 2024 15:28:10.039 # +elected-leader master mymaster 172.20.181.236 6379 |  
  |   | 2024-06-18 18:28:10.039 | 1:X 18 Jun 2024 15:28:10.039 # +failover-state-select-slave master mymaster 172.20.181.236 6379 |  
  |   | 2024-06-18 18:28:10.116 | 1:X 18 Jun 2024 15:28:10.115 # -failover-abort-no-good-slave master mymaster 172.20.181.236 6379 |  
  |   | 2024-06-18 18:28:10.187 | 1:X 18 Jun 2024 15:28:10.187 * Next failover delay: I will not start a failover before Tue Jun 18 15:34:10 2024 |  
  |   | 2024-06-18 18:32:53.938 | 1:X 18 Jun 2024 15:32:53.936 * +reboot master mymaster 172.20.181.236 6379

split-brain-fix-1

  |   | 2024-06-18 18:20:30.025 | Could not connect to Redis at 127.0.0.1:6379: Connection refused |  
  |   | 2024-06-18 18:20:30.025 | Could not connect to Redis at 127.0.0.1:6379: Connection refused |  
  |   | 2024-06-18 18:21:30.027 | Identifying redis master (get-master-addr-by-name).. |  
  |   | 2024-06-18 18:21:30.027 | using sentinel (hewi-redis-ha), sentinel group name (mymaster) |  
  |   | 2024-06-18 18:21:30.043 | Tue Jun 18 15:21:30 UTC 2024 Found redis master (172.20.181.236) |  
  |   | 2024-06-18 18:21:30.046 | Could not connect to Redis at 127.0.0.1:6379: Connection refused |  
  |   | 2024-06-18 18:21:30.049 | Tue Jun 18 15:21:30 UTC 2024 Start... |  
  |   | 2024-06-18 18:21:30.057 | Initializing config.. |  
  |   | 2024-06-18 18:21:30.057 | Copying default redis config.. |  
  |   | 2024-06-18 18:21:30.057 | to '/data/conf/redis.conf' |  
  |   | 2024-06-18 18:21:30.061 | Copying default sentinel config.. |  
  |   | 2024-06-18 18:21:30.061 | to '/data/conf/sentinel.conf' |  
  |   | 2024-06-18 18:21:30.063 | Identifying redis master (get-master-addr-by-name).. |  
  |   | 2024-06-18 18:21:30.063 | using sentinel (hewi-redis-ha), sentinel group name (mymaster) |  
  |   | 2024-06-18 18:21:30.083 | Tue Jun 18 15:21:30 UTC 2024 Found redis master (172.20.181.236) |  
  |   | 2024-06-18 18:21:30.083 | Identify announce ip for this pod.. |  
  |   | 2024-06-18 18:21:30.083 | using (hewi-redis-ha-announce-1) or (hewi-redis-ha-server-1) |  
  |   | 2024-06-18 18:21:30.088 | identified announce (172.20.181.236) |  
  |   | 2024-06-18 18:21:30.088 | Verifying redis master.. |  
  |   | 2024-06-18 18:21:30.088 | ping (172.20.181.236:6379) |  
  |   | 2024-06-18 18:21:30.091 | Could not connect to Redis at 172.20.181.236:6379: Connection refused |  
  |   | 2024-06-18 18:21:34.102 | Could not connect to Redis at 172.20.181.236:6379: Connection refused |  
  |   | 2024-06-18 18:21:39.125 | Could not connect to Redis at 172.20.181.236:6379: Connection refused |  
  |   | 2024-06-18 18:21:45.137 | Tue Jun 18 15:21:45 UTC 2024 Can't ping redis master (172.20.181.236) |  
  |   | 2024-06-18 18:21:45.137 | Attempting to force failover (sentinel failover).. |  
  |   | 2024-06-18 18:21:45.137 | on sentinel (hewi-redis-ha:26379), sentinel grp (mymaster) |  
  |   | 2024-06-18 18:21:45.144 | Tue Jun 18 15:21:45 UTC 2024 Failover returned with 'NOGOODSLAVE' |  
  |   | 2024-06-18 18:21:45.144 | Setting defaults for this pod.. |  
  |   | 2024-06-18 18:21:45.144 | Setting up defaults.. |  
  |   | 2024-06-18 18:21:45.144 | using statefulset index (1) |  
  |   | 2024-06-18 18:21:45.144 | Getting redis master ip.. |  
  |   | 2024-06-18 18:21:45.144 | blindly assuming (hewi-redis-ha-announce-0) or (hewi-redis-ha-server-0) are master |  
  |   | 2024-06-18 18:21:45.161 | identified redis (may be redis master) ip (172.20.75.109) |  
  |   | 2024-06-18 18:21:45.161 | Setting default slave config for redis and sentinel.. |  
  |   | 2024-06-18 18:21:45.161 | using master ip (172.20.75.109) |  
  |   | 2024-06-18 18:21:45.161 | Updating redis config.. |  
  |   | 2024-06-18 18:21:45.162 | we are slave of redis master (172.20.75.109:6379) |  
  |   | 2024-06-18 18:21:45.162 | Updating sentinel config.. |  
  |   | 2024-06-18 18:21:45.162 | evaluating sentinel id (${SENTINEL_ID_1}) |  
  |   | 2024-06-18 18:21:45.162 | sentinel id (aa33680947f52ae19df761ea8f26a4285d4910c1), sentinel grp (mymaster), quorum (2) |  
  |   | 2024-06-18 18:21:45.163 | redis master (172.20.75.109:6379) |  
  |   | 2024-06-18 18:21:45.164 | announce (172.20.181.236:26379) |  
  |   | 2024-06-18 18:21:45.165 | Tue Jun 18 15:21:45 UTC 2024 Ready...

split-brain-fix-0

  |   | 2024-06-18 18:21:56.044 | using sentinel (hewi-redis-ha), sentinel group name (mymaster) |  
  |   | 2024-06-18 18:21:56.052 | Tue Jun 18 15:21:56 UTC 2024 Found redis master (172.20.181.236) |  
  |   | 2024-06-18 18:22:56.056 | Identifying redis master (get-master-addr-by-name).. |  
  |   | 2024-06-18 18:22:56.056 | using sentinel (hewi-redis-ha), sentinel group name (mymaster) |  
  |   | 2024-06-18 18:22:56.063 | Tue Jun 18 15:22:56 UTC 2024 Found redis master (172.20.181.236) |  
  |   | 2024-06-18 18:23:56.067 | Identifying redis master (get-master-addr-by-name)..

To Reproduce: I tried node/pod deletion and redis-cli replicaof, but could not reproduce the bug.

Expected behavior: the split-brain-fix container should fix even this rare case.

Additional context: The script's logic was broken by sentinel's inability to fail over. Maybe the script should have an additional condition that checks the role of the potential default master. I would very much appreciate any help with this. Please let me know if you need additional logs or checks.
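
A minimal sketch of the kind of role check suggested here, not the chart's actual script. The function names `role_from_info` and `candidate_role` are hypothetical, and the wrapper assumes an unauthenticated Redis reachable with `redis-cli`. The idea is to inspect `INFO replication` on the assumed default master and only treat it as master if it reports `role:master`.

```shell
# Classify a node from the text of `INFO replication`.
# Prints "master", "replica", or "unknown".
role_from_info() {
  case "$1" in
    *role:master*) echo master ;;
    *role:slave*)  echo replica ;;
    *)             echo unknown ;;
  esac
}

# Hypothetical wrapper: ask a host for its role.
# Assumes redis-cli is installed and no password is required.
candidate_role() {
  role_from_info "$(redis-cli -h "$1" -p "${2:-6379}" info replication 2>/dev/null)"
}

# Canned INFO output shaped like redis-0 in this report (a replica of redis-1):
role_from_info "role:slave
master_host:172.20.181.236
master_link_status:down"
# prints "replica"
```

With a check like this, the "blindly assuming ... are master" step could refuse to point a pod at a candidate whose role is not `master`; what the script should do instead in that case is left open here.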

mhkarimi1383 commented 3 months ago

+1

tschirmer commented 1 month ago

I've had this too.

I've found that when we've added a descheduler to the stack (https://github.com/kubernetes-sigs/descheduler) to balance nodes automatically, this kind of issue will disable the redis service frequently.

Can the master allocation be done with kubernetes lease locks? https://kubernetes.io/docs/concepts/architecture/leases/

DandyDeveloper commented 1 month ago

@tschirmer I'm trying to work out why this would happen unless the podManagementPolicy of the STS is set to Parallel?

Is this happening in either of your cases? @tschirmer ??

Because in theory, on first rollout, the first pod should start up and become master, way before -1/-2 start.

mhkarimi1383 commented 1 month ago

@DandyDeveloper Hi, I'm having this problem when my network becomes a bit unstable (for example, pods are not able to reach each other for a second) and my redis pods can't see each other.

tschirmer commented 1 month ago

Haven't set it to Parallel. I suspect it's something like the pod, when evicted, not completing trigger-failover-if-master.sh. We are running it with sentinel, which might add some complexity here. I haven't debugged it yet.

So far we're getting a load of issues with the liveness probe not containing the SENTINELAUTH env from the secret, even though it's clearly defined in the spec; a restart of the pod fixes it. It's happening very frequently though, so I'm wondering if there needs to be a grace period defined on startup and shutdown to prevent both of these things from happening.

mhkarimi1383 commented 4 weeks ago

I think being able to have separate StatefulSets for the Redis instances and the sentinels would make this chart more stable and manageable: create two StatefulSets and give the sentinels a sentinel monitor config that monitors an external host.
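
As a rough illustration of what "monitor an external host" could look like, here is a sketch that renders the relevant sentinel.conf lines. The function name `render_sentinel_monitor` and the hostname are hypothetical; `sentinel resolve-hostnames yes` is only needed when a hostname is used instead of an IP, and requires Redis 6.2 or later.

```shell
# Print the sentinel.conf lines that point a sentinel at a Redis master
# living outside its own pod. Arguments: group name, host, port, quorum.
render_sentinel_monitor() {
  group=$1
  host=$2
  port=$3
  quorum=$4
  # Needed only when "host" is a DNS name rather than an IP address.
  printf 'sentinel resolve-hostnames yes\n'
  printf 'sentinel monitor %s %s %s %s\n' "$group" "$host" "$port" "$quorum"
}

# Hypothetical hostname; group name and quorum match the logs in this issue.
render_sentinel_monitor mymaster redis.example.internal 6379 2
```

The group name `mymaster` and quorum `2` are the values visible in the logs above; everything else is an assumption for illustration.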

tschirmer commented 3 weeks ago

I like the idea of separate StatefulSets. I've been thinking of doing that and making a PR.

I suspect this is from preStop hooks not firing and completing successfully. trigger-failover-if-master.sh occasionally doesn't run as expected. When we had the descheduler running it was ~2min between turning each pod off and on, and we found that every now and again that would fail. The rate of failure is low, so it's unlikely to occur unless you're hammering it (we haven't had an issue with the HA cluster since we turned off the descheduler).

mhkarimi1383 commented 3 weeks ago

I wanted to make a PR too, but this change would have to propagate to a lot of configs.

tschirmer commented 1 week ago

I found that there were a couple things wrong with my setup:

The permissions were the killer, because nothing was failing over on shutdown.

I'm half way through writing a leader elector in golang for this based on k8s leases. Got it claiming the lease already. I'm not sure it's totally necessary after we've solved these other issues though.

tschirmer commented 1 week ago

Specifically, in the StatefulSet, the volume definitions changed from:

      volumes:
        - configMap:
            defaultMode: 420  #### THIS ONE meant that the preStop hooks didn't have permission to run. Changed it to 430
            name: redis-session-configmap
          name: config
        - hostPath:
            path: /sys
            type: ''
          name: host-sys
        - configMap:
            defaultMode: 493
            name: redis-session-health-configmap
          name: health

to:

      volumes:
        - configMap:
            defaultMode: 430  #### THIS ONE was 420, which meant that the preStop hooks didn't have permission to run
            name: redis-session-configmap
          name: config
        - hostPath:
            path: /sys
            type: ''
          name: host-sys
        - configMap:
            defaultMode: 493
            name: redis-session-health-configmap
          name: health
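
The `defaultMode` values above are decimal, which hides the permission bits they encode. A small sketch (the helper name `mode_to_octal` is hypothetical) that prints them in the more familiar octal form:

```shell
# Print a decimal Kubernetes defaultMode as the octal permission bits it encodes.
mode_to_octal() {
  printf '%04o\n' "$1"
}

for mode in 420 430 493; do
  printf '%s -> %s\n' "$mode" "$(mode_to_octal "$mode")"
done
# 420 -> 0644
# 430 -> 0656
# 493 -> 0755
```

So 420 is 0644 (no execute bit for anyone), 430 is 0656 (execute for group only), and 493, as already used for the health ConfigMap, is 0755 (execute for everyone).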

tschirmer commented 1 week ago

Also found that the preStop hook, /readonly-config/..data/trigger-failover-if-master.sh, requires SENTINELAUTH, but it's not defined in the env for the redis container:

echo "[K8S PreStop Hook] Start Failover."
get_redis_role() {
  is_master=$(
    redis-cli \
      -a "${AUTH}" --no-auth-warning \
      -h localhost \
      -p 6379 \
      info | grep -c 'role:master' || true
  )
}
get_redis_role

echo "[K8S PreStop Hook] Got redis role."
if [[ "$is_master" -eq 1 ]]; then
  echo "[K8S PreStop Hook] This node is currently master, we trigger a failover."
  response=$(
    redis-cli \
      -a "${SENTINELAUTH}" --no-auth-warning \
      -h 127.0.0.1 \
      -p 26379 \
      SENTINEL failover mymaster
  )
  if [[ "$response" != "OK" ]] ; then
    echo "[K8S PreStop Hook] Failover failed"
    echo "$response"
    exit 1
  fi
  timeout=30
  while [[ "$is_master" -eq 1 && $timeout -gt 0 ]]; do
    sleep 1
    get_redis_role
    timeout=$((timeout - 1))
  done
  echo "[K8S PreStop Hook] Failover successful"
else
  echo "[K8S PreStop Hook] This node is currently replica, no failover needed."
fi
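
Since the script above passes `${SENTINELAUTH}` to `redis-cli` without checking it, a missing variable only shows up as a failed failover. A sketch of a guard that could run first, assuming sentinel auth is enabled (with auth disabled this check would be wrong); the function name `check_sentinel_auth` and the message text are hypothetical:

```shell
# Return non-zero, with a message on stderr, when SENTINELAUTH is empty or unset.
check_sentinel_auth() {
  if [ -z "${SENTINELAUTH:-}" ]; then
    echo "[K8S PreStop Hook] SENTINELAUTH is not set in this container" >&2
    return 1
  fi
}
```

Called as `check_sentinel_auth || exit 1` before the `SENTINEL failover` command, this would make the missing env var visible in the hook output instead of an empty failover response.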

tschirmer commented 1 week ago

^ I'd modified the above so I could get some debug data, along with this in the StatefulSet:

          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - '-c'
                  - >-
                    echo "running preStop" >> /proc/1/fd/1 &&
                    /readonly-config/trigger-failover-if-master.sh | tee >>
                    /proc/1/fd/1 &&  echo "finished preStop" >> /proc/1/fd/1

The >> /proc/1/fd/1 redirect forces this output into the container log in k8s.

tschirmer commented 1 week ago

Found that running preStops would consistently fail.

running preStop
[K8S PreStop Hook] Start Failover.
[K8S PreStop Hook] Got redis role.
[K8S PreStop Hook] This node is currently master, we trigger a failover.
[K8S PreStop Hook] Failover failed

finished preStop

Found that the Sentinel container had shut down before the command could be executed on localhost, so it kept getting "Failover failed". Changed the sentinel preStop to add a 10-second delay to keep it alive while this happened, and it seems to work every time now.

          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - '-c'
                  - >-
                    sleep 10
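
A fixed sleep only helps if the failover finishes within the delay. One possible alternative, sketched here purely as an assumption and not taken from the chart, is a bounded poll that keeps the sentinel container alive until the local Redis stops reporting `role:master` or a time budget runs out. The names `wait_until` and `redis_not_master` are hypothetical, and the check assumes `redis-cli` is available in the sentinel container and that `AUTH` holds the Redis password as in the script above.

```shell
# Poll a command once per second until it succeeds or the budget (seconds) runs out.
# Returns 0 on success, 1 on timeout.
wait_until() {
  budget=$1
  shift
  while [ "$budget" -gt 0 ]; do
    if "$@"; then
      return 0
    fi
    sleep 1
    budget=$((budget - 1))
  done
  return 1
}

# Assumed check: succeeds once the local Redis no longer reports role:master.
# An unreachable Redis also counts as "not master".
redis_not_master() {
  ! redis-cli -a "${AUTH}" --no-auth-warning -h 127.0.0.1 -p 6379 \
      info replication 2>/dev/null | grep -q 'role:master'
}
```

A sentinel preStop command could then be something like `wait_until 30 redis_not_master`, which returns as soon as the role changes and gives up after 30 seconds.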

cm3brian commented 1 week ago

While this "might" work, it "may not" be consistent. I suggest taking a look at my solution instead: https://github.com/DandyDeveloper/charts/issues/207#issuecomment-1827134022