Leo791 opened this issue 1 month ago
@Leo791, as the code shows:
// updateStatefulSet is a method to update statefulset in Kubernetes
func updateStatefulSet(cl kubernetes.Interface, logger logr.Logger, namespace string, stateful *appsv1.StatefulSet, recreateStateFulSet bool) error {
	_, err := cl.AppsV1().StatefulSets(namespace).Update(context.TODO(), stateful, metav1.UpdateOptions{})
	if recreateStateFulSet {
		sErr, ok := err.(*apierrors.StatusError)
		if ok && sErr.ErrStatus.Code == 422 && sErr.ErrStatus.Reason == metav1.StatusReasonInvalid {
			failMsg := make([]string, len(sErr.ErrStatus.Details.Causes))
			for messageCount, cause := range sErr.ErrStatus.Details.Causes {
				failMsg[messageCount] = cause.Message
			}
			logger.V(1).Info("recreating StatefulSet because the update operation wasn't possible", "reason", strings.Join(failMsg, ", "))
			propagationPolicy := metav1.DeletePropagationForeground
			if err := cl.AppsV1().StatefulSets(namespace).Delete(context.TODO(), stateful.GetName(), metav1.DeleteOptions{PropagationPolicy: &propagationPolicy}); err != nil { //nolint
				return errors.Wrap(err, "failed to delete StatefulSet to avoid forbidden action")
			}
		}
	}
	if err != nil {
		logger.Error(err, "Redis statefulset update failed")
		return err
	}
	logger.V(1).Info("Redis statefulset successfully updated")
	return nil
}
StatefulSets are only deleted when you attempt to update forbidden fields, such as the volumeClaimTemplates field. Therefore, in my opinion, when a StatefulSet gets stuck in a pending state due to insufficient resources, we need to manually delete the StatefulSet (and its pods) under the current code design.
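For illustration only, a minimal client-go sketch of such a manual deletion could look like this; the namespace, StatefulSet name, and kubeconfig path are placeholder assumptions, not values from this issue:

// Sketch: manually deleting a stuck StatefulSet together with its pods.
// Foreground propagation makes the API server remove the dependent pods
// before the StatefulSet object itself disappears.
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumed kubeconfig location; in-cluster code would use rest.InClusterConfig instead.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cl, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	propagation := metav1.DeletePropagationForeground
	// "redis" and "redis-cluster-leader" are placeholder names for illustration.
	err = cl.AppsV1().StatefulSets("redis").Delete(context.TODO(), "redis-cluster-leader",
		metav1.DeleteOptions{PropagationPolicy: &propagation})
	if err != nil {
		panic(err)
	}
}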
But isn't that the purpose of the annotation: redis.opstreelabs.in/recreate-statefulset: "true"?
No, we only recreate the StatefulSet when there is an update to forbidden fields. We cannot recreate the StatefulSet when a pod is pending because we cannot determine whether the pending state is temporary or permanent.
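To illustrate why, here is a rough client-go sketch (not operator code) that inspects why pods are pending; the app=redis-cluster label selector is only an assumed example:

// Sketch: listing pending Redis pods and logging why they are unscheduled.
package inspect

import (
	"context"

	"github.com/go-logr/logr"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func logPendingRedisPods(ctx context.Context, cl kubernetes.Interface, logger logr.Logger, namespace string) error {
	pods, err := cl.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{
		LabelSelector: "app=redis-cluster", // assumed label, adjust to your deployment
	})
	if err != nil {
		return err
	}
	for _, pod := range pods.Items {
		if pod.Status.Phase != corev1.PodPending {
			continue
		}
		for _, cond := range pod.Status.Conditions {
			// A failed PodScheduled condition only says scheduling failed right now;
			// it cannot say whether capacity will free up later, which is why the
			// operator does not recreate the StatefulSet for Pending pods.
			if cond.Type == corev1.PodScheduled && cond.Status == corev1.ConditionFalse {
				logger.Info("pod is pending", "pod", pod.Name, "reason", cond.Reason, "message", cond.Message)
			}
		}
	}
	return nil
}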
Got it, thank you!
And regarding the failover issue, where the operator is looking for a pod that doesn't exist: are you aware of it?
We believe the operator is promoting the follower to leader, but expects it to be named leader-3 instead of follower-x.
Actually, the role string in the pod name does not represent the actual role of the Redis node. We should not rely on the pod name to identify its role.
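For illustration, a small go-redis sketch that reads the actual role from the node itself instead of from the pod name; the address is an assumed example, not a value from this issue:

// Sketch: asking a Redis node for its real role via INFO replication.
package main

import (
	"context"
	"fmt"
	"strings"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	// Placeholder address; in-cluster you would use the pod IP or headless-service DNS name.
	rdb := redis.NewClient(&redis.Options{Addr: "redis-cluster-follower-0:6379"})
	defer rdb.Close()

	// INFO replication reports the node's actual role (master or slave),
	// regardless of whether its pod is named "leader" or "follower".
	info, err := rdb.Info(ctx, "replication").Result()
	if err != nil {
		panic(err)
	}
	for _, line := range strings.Split(info, "\r\n") {
		if strings.HasPrefix(line, "role:") {
			fmt.Println("actual role:", strings.TrimPrefix(line, "role:"))
		}
	}
}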
Is there a way to prevent the failover, and therefore the promotion, from occurring?
Failover is handled by the Redis cluster itself, not by the operator. The operator simply creates resources and integrates them into a Redis cluster. Failover is automatically managed by the cluster.
Actually, the role string in the pod name does not represent the actual role of the Redis node. We should not rely on the pod name to identify its role.
But right now the operator is indeed using the pod name to identify the role, no? And we think that's what's causing the problem.
How is the operator getting the pod roles and the number of masters and slaves? Is it through CLUSTER NODES? If so, we believe that CLUSTER NODES still listing a deleted master in the master,fail state is confusing the operator into expecting n+1 masters, and this is keeping it in an error state.
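For reference, a rough go-redis sketch of the kind of stale entry we mean: it lists CLUSTER NODES and forgets entries still flagged master,fail. The address is an assumed example, and this is only a manual cleanup idea, not something the operator is known to do:

// Sketch: finding stale failed masters in CLUSTER NODES and forgetting them.
package main

import (
	"context"
	"fmt"
	"strings"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	// Placeholder address; CLUSTER FORGET would also need to be sent to every remaining node.
	rdb := redis.NewClient(&redis.Options{Addr: "redis-cluster-leader-0:6379"})
	defer rdb.Close()

	nodes, err := rdb.ClusterNodes(ctx).Result()
	if err != nil {
		panic(err)
	}
	for _, line := range strings.Split(strings.TrimSpace(nodes), "\n") {
		fields := strings.Fields(line)
		if len(fields) < 3 {
			continue
		}
		id, flags := fields[0], fields[2]
		// A deleted pod's old master keeps showing up as "master,fail" until it
		// is explicitly forgotten, which inflates the apparent master count.
		if strings.Contains(flags, "master") && strings.Contains(flags, "fail") {
			fmt.Println("forgetting stale master", id)
			if err := rdb.ClusterForget(ctx, id).Err(); err != nil {
				fmt.Println("forget failed:", err)
			}
		}
	}
}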
What version of redis operator are you using?
redis-operator version: v0.18.0
Does this issue reproduce with the latest release? Yes
What operating system and processor architecture are you using (kubectl version)?
Client Version: v1.30.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.0
What did you do?
Added the annotation redis.opstreelabs.in/recreate-statefulset: "true" to the CRD.
What did you expect to see?
What did you see instead?
The operator throws the following errors:
And if we exec into a cluster pod and ask for the cluster node info we get:
We believe the operator is promoting the follower to leader, but expects it to be named leader-3 instead of follower-x. This causes the update of the StatefulSet to be blocked, and we cannot roll back the cluster to a healthy state.
Is there a way to prevent the failover, and therefore the promotion, from occurring?