Hi @jsalatiel , Is it possible that you are disconnecting all the nodes from each other? In that case, if all three nodes are separated from each other, the cluster is probably unable to recover since there will be no quorum.
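If you can still reach the sentinels, they can be asked directly whether a failover could currently be authorized. A minimal sketch, assuming the default master set name mymaster and the standard sentinel port 26379:
redis-cli -p 26379 SENTINEL ckquorum mymaster
# Replies OK with the number of usable sentinels when quorum and failover authorization can be reached,
# and returns an error when too many sentinels are unreachable.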
Just found out that it only happens to one of my clusters. So apparently something else is the culprit. Still debugging....
Hi @jsalatiel , Let us know what you find, thank you very much
I noticed the same issue. When the master pod is killed (and I mean killed, not gracefully deleted), the prestop scripts do not run and sentinel leader election repeatedly fails, electing the killed pod's IP over and over again.
+new-epoch 5
+try-failover master mymaster 10.42.2.105 6379
+vote-for-leader 9e94388e0e7ed173bbb6ae0abc62f82f7234d8bd 5
ab5263c517368f093160a2d138ad0fc18d8bb76b voted for 9e94388e0e7ed173bbb6ae0abc62f82f7234d8bd 5
+elected-leader master mymaster 10.42.2.105 6379
+failover-state-select-slave master mymaster 10.42.2.105 6379
-failover-abort-no-good-slave master mymaster 10.42.2.105 6379
(Note that 10.42.2.105 is the address of the killed pod; it no longer exists.)
The new incarnation of the killed pod has a different IP address and its sentinel logs do not shed much light:
23:07:33.09 DEBUG ==> redis-headless.redis.svc.cluster.local has my IP: 10.42.2.106
23:07:33.10 INFO ==> Cleaning sentinels in sentinel node: 10.42.0.76
1
23:07:38.11 INFO ==> Cleaning sentinels in sentinel node: 10.42.1.172
1
23:07:43.14 INFO ==> Sentinels clean up done
Could not connect to Redis at 10.42.2.105:26379: No route to host
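To see what the remaining sentinels still believe about the master and its replicas, something like the following can be used (10.42.0.76 is one of the sentinel IPs from the log above; mymaster is the default master set name):
redis-cli -h 10.42.0.76 -p 26379 SENTINEL get-master-addr-by-name mymaster
# address the sentinels still consider the master
redis-cli -h 10.42.0.76 -p 26379 SENTINEL replicas mymaster
# registered replicas and their flags (s_down / disconnected replicas are not promotable)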
Hi @mouchar , Thank you for your investigation. I think what you described is the basic recovery functionality and it should work by default. Could you tell us about the environment you are using?
My environment:
auth:
  enabled: false
  sentinel: false
sentinel:
  enabled: true
replica:
  replicaCount: 3
metrics:
  enabled: true
Steps to reproduce:
helm -n redis upgrade --install --wait --create-namespace redis bitnami/redis -f /tmp/bitnami-redis.yaml --set sentinel.image.debug=true
kubectl -n redis get pod -o wide
NAME READY STATUS RESTARTS AGE IP ...
redis-node-0 3/3 Running 0 17m 192.168.103.56 ...
redis-node-1 3/3 Running 0 17m 192.168.165.186 ...
redis-node-2 3/3 Running 0 16m 192.168.143.71 ...
kubectl -n redis delete pod redis-node-0 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "redis-node-0" force deleted
kubectl -n redis get pod -o wide
NAME READY STATUS RESTARTS AGE IP
redis-node-0 1/3 CrashLoopBackOff 18 24m 192.168.125.123
redis-node-1 3/3 Running 0 44m 192.168.165.186
redis-node-2 3/3 Running 0 43m 192.168.143.71
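To see what the sentinels think after the force delete, one of the surviving pods can be queried directly (a sketch, assuming the sentinel container is named sentinel and the default master set name mymaster):
kubectl -n redis exec redis-node-1 -c sentinel -- redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster
# address the sentinels still consider the master
kubectl -n redis logs redis-node-1 -c sentinel
# sentinel log of a surviving node
The sentinel log keeps showing failover attempts against the old master IP that abort with no-good-slave: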
1:X 14 May 2021 10:13:04.534 # +try-failover master mymaster 192.168.103.56 6379
1:X 14 May 2021 10:13:04.537 # +vote-for-leader 1ab5810d71792954bafa0a3bb084a689c991017e 32
1:X 14 May 2021 10:13:04.543 # 0b81bb3f406fbbaf221bf2b85c183121900f1d83 voted for 1ab5810d71792954bafa0a3bb084a689c991017e 32
1:X 14 May 2021 10:13:04.638 # +elected-leader master mymaster 192.168.103.56 6379
1:X 14 May 2021 10:13:04.638 # +failover-state-select-slave master mymaster 192.168.103.56 6379
1:X 14 May 2021 10:13:04.739 # -failover-abort-no-good-slave master mymaster 192.168.103.56 6379
1:X 14 May 2021 10:13:04.805 # Next failover delay: I will not start a failover before Fri May 14 10:13:40 2021
10:10:09.70 DEBUG ==> redis-headless.redis.svc.cluster.local has my IP: 192.168.125.123
10:10:09.71 INFO ==> Cleaning sentinels in sentinel node: 192.168.165.186
1
10:10:14.71 INFO ==> Cleaning sentinels in sentinel node: 192.168.143.71
1
10:10:19.72 INFO ==> Sentinels clean up done
Be careful: --force can make the old and the new pod run simultaneously, which can lead to data corruption. You should only force-delete if you are sure the old pod is really dead (or its node is dead).
https://kubernetes.io/docs/tasks/run-application/force-delete-stateful-set-pod/
"Force deletions do not wait for confirmation from the kubelet that the Pod has been terminated. Irrespective of whether a force deletion is successful in killing a Pod, it will immediately free up the name from the apiserver. This would let the StatefulSet controller create a replacement Pod with that same identity; this can lead to the duplication of a still-running Pod, and if said Pod can still communicate with the other members of the StatefulSet, will violate the at most one semantics that StatefulSet is designed to guarantee.
When you force delete a StatefulSet pod, you are asserting that the Pod in question will never again make contact with other Pods in the StatefulSet and its name can be safely freed up for a replacement to be created."
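If the intent is only to simulate a node failure without risking that duplicate-pod situation, a graceful delete or draining the node is usually enough (a sketch; <node-name> is a placeholder):
kubectl -n redis delete pod redis-node-0
# graceful delete, so prestop hooks get a chance to run
kubectl drain <node-name> --ignore-daemonsets
# evicts the pods from the node before it goes away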
Hi @mouchar , As @jsalatiel said, it could be possible that the rest of the pods still think that pod exists due to the force deletion. To recover from that state you could try to manually execute a failover like:
SENTINEL failover <master name>
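For example, against one of the remaining sentinel containers (a sketch, assuming the sentinel container is named sentinel and the default master set name mymaster):
kubectl -n redis exec redis-node-1 -c sentinel -- redis-cli -p 26379 SENTINEL failover mymaster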
Hi @miguelaeh After forcefully killing the master node, I tried running this command from the 2 other pods but got the error below:
10.33.144.235:26379> SENTINEL FAILOVER mymaster
(error) NOGOODSLAVE No suitable replica to promote
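For reference, sentinel only promotes replicas that are reachable and not flagged s_down/o_down or disconnected, so this reply usually means the only registered replica entries are stale. The candidates and their flags can be listed with (default master name assumed):
10.33.144.235:26379> SENTINEL replicas mymaster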
Hi guys, It seems to be related to https://github.com/bitnami/charts/issues/6165. We plan to work on it in the coming weeks, so hopefully we will have a solution soon. Sorry for the inconvenience.
Bump. I am seeing the same issue in my cluster. It's made worse by the fact that I am attempting to run the statefulset on spot instances (trying to get node draining and failover to work quickly enough not to cause a major service disruption).
A colleague is already working on it and he will update this thread once it is solved
This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.
Which chart: bitnami/redis
Describe the bug I have a redis cluster (sentinel) with 3 replicas, spread across 3 different nodes. If I kill the container with the role:master by running kubectl delete on the current master, another node will be promoted to master as expected, although sometimes it takes almost a minute while other times it takes just a few seconds (related to the leader lease?). The problem is when one of the worker nodes where the master is running dies (power off the VM, for example): a new master will never be elected, and there is absolutely nothing in the logs of the remaining sentinels.
To Reproduce You can also easily reproduce this on a single node by simply creating a netpolicy that blocks all traffic to/from the current master. This is my values.yaml
This is the netpolicy you can use; just change the label selector to match the current master.
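For reference, a deny-all policy of roughly this shape can be used (the name and the isolate: "true" label are just example placeholders; select the current master pod however it is labelled):
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: isolate-redis-master
  namespace: redis
spec:
  podSelector:
    matchLabels:
      isolate: "true"
  policyTypes:
    - Ingress
    - Egress
# With both policy types listed and no rules defined, all ingress and egress traffic for the
# selected pod is blocked (provided the cluster's CNI enforces NetworkPolicy).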
Expected behavior Sentinel should detect the master is down and promote a new one
Version of Helm and Kubernetes: helm 3.3.4 k8s 1.19.9