Open · sarahhenkens opened this issue 3 years ago
Hmm, I think this is the same issue as https://github.com/argoproj/argo-cd/issues/7182. This operator sets the restart policy to "Never", and Argo CD keeps those pods in a Progressing state.

Inside getCorev1PodHealth:
case corev1.PodRunning:
	switch pod.Spec.RestartPolicy {
	case corev1.RestartPolicyAlways:
		// if pod is ready, it is automatically healthy
		if podutils.IsPodReady(pod) {
			return &HealthStatus{
				Status:  HealthStatusHealthy,
				Message: pod.Status.Message,
			}, nil
		}
		// if it's not ready, check to see if any container terminated; if so, it's degraded
		for _, ctrStatus := range pod.Status.ContainerStatuses {
			if ctrStatus.LastTerminationState.Terminated != nil {
				return &HealthStatus{
					Status:  HealthStatusDegraded,
					Message: pod.Status.Message,
				}, nil
			}
		}
		// otherwise we are progressing towards a ready state
		return &HealthStatus{
			Status:  HealthStatusProgressing,
			Message: pod.Status.Message,
		}, nil
	case corev1.RestartPolicyOnFailure, corev1.RestartPolicyNever:
		// pods set with a restart policy of OnFailure or Never have a finite life.
		// These pods are typically resource hooks. Thus, we consider these as Progressing
		// instead of Healthy.
		return &HealthStatus{
			Status:  HealthStatusProgressing,
			Message: pod.Status.Message,
		}, nil
	}
}
Root cause inside the ArangoDB operator: https://github.com/arangodb/kube-arangodb/blob/13f3e2a09b4c6c08f050efffc364d498b1293dcf/pkg/util/k8sutil/pods.go#L433
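For illustration, the resulting pod manifest looks roughly like this (a sketch; the name and image are hypothetical, not the actual operator output):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-arangodb-server  # hypothetical name
spec:
  # Set by the operator: the kubelet never restarts containers;
  # the operator recreates the whole pod itself instead.
  restartPolicy: Never
  containers:
    - name: server
      image: arangodb/arangodb  # illustrative image
```

Because restartPolicy is Never, the health check above always falls into the second case and reports Progressing, even when the pod is Running and Ready.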
Is there a better way to let Argo CD still consider these pods healthy, via a custom setting?
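For what it's worth, Argo CD supports custom health checks per resource kind through resource.customizations in the argocd-cm ConfigMap. A minimal sketch of a Pod override follows; I haven't verified that the built-in Pod check can be overridden this way for the core group, so treat the key name and the approach as an assumption:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Assumed key form for a core-group kind (no API group prefix).
  resource.customizations.health.Pod: |
    hs = {}
    hs.status = "Progressing"
    hs.message = ""
    if obj.status ~= nil then
      hs.message = obj.status.message or ""
      -- Treat a Running pod whose Ready condition is True as Healthy,
      -- regardless of spec.restartPolicy (operator-managed pods).
      if obj.status.phase == "Running" and obj.status.conditions ~= nil then
        for _, condition in ipairs(obj.status.conditions) do
          if condition.type == "Ready" and condition.status == "True" then
            hs.status = "Healthy"
          end
        end
      end
    end
    return hs
```

Note this applies cluster-wide, not per Application, and it would also change how hook pods with restartPolicy Never are assessed.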
What is the rationale for ArangoDB using Never? I believe that question has to be explored first.
From the linked ticket:
We do not want to allow Pod restarts; the full lifecycle is managed by the Operator (the Operator recreates pods and takes care of shards).
We need to have a quick discussion about this. I will bring it up in tomorrow's maintainer meeting.
The same issue: https://github.com/argoproj/argo-cd/issues/7182
@wanghong230, any updates from the maintainer meeting?
I have this same issue when using the Spark Operator. The driver and executor pods have a restart policy of Never and continue to show as Progressing even while the pod state is Running.
There are many operators that behave this way, including Koperator and NiFiKop. This behavior should at least be configurable through an Application/ApplicationSet.
I have the same issue with the Spark operator, too. Many operators set their pods' restart policy to Never.
... and also with the TaskManager container that is managed by the Flink Operator.
Related/duplicate issue: https://github.com/argoproj/argo-cd/issues/7182.
Checklist:
[x] argocd version

Describe the bug
When using the ArangoDB operator with ArgoCD, any pod created (and attached to the custom resource) gets stuck in a "Progressing" state forever, while kubectl describe pod <pod-id> shows it as ready.
Discussion in Slack: https://cloud-native.slack.com/archives/C01TSERG0KZ/p1631432657161800
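For illustration, the relevant part of the pod status that kubectl reports (and that Argo CD evaluates) looks roughly like this (hypothetical values):

```yaml
status:
  phase: Running
  conditions:
    - type: Ready
      status: "True"
```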
To Reproduce
It will load all the examples from https://github.com/arangodb/kube-arangodb into the default namespace.

Expected behavior
The pod is expected to show as Healthy in the ArgoCD UI once it is running with the Ready state set to true.

Version