Open Barteus opened 1 year ago
@Barteus you're talking about eviction of the charm operator pods, right? Not the underlying kubeflow workload pods (like the actual kfp-api workload, etc)?
I wonder whether we're missing some logic in our charms, or if Juju is mishandling something
Needs investigation.
This issue requires us to go through the charms and see which ones are affected.
When Charmed Kubeflow (CKF) is left running for some time, Pod eviction happens. The Evicted Pods stay in the system until the garbage collection (GC) of terminated Pods is invoked. With the default value in vanilla Kubernetes (the kube-controller-manager `--terminated-pod-gc-threshold` setting), this only happens once 12,500 such Pods are in the system.
On a running CKF, when eviction happens the Evicted Pods are left in Juju, which means that leadership is not transferred from the Evicted Pod to the newly created one.
Reproduce:
Workaround: Manually remove all Evicted Pods
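A minimal sketch of how the manual cleanup could be scripted, assuming `kubectl` is on the PATH and already points at the affected cluster. The function names and the `kubeflow` namespace in the usage note are illustrative assumptions, not something taken from the charms or from Juju. It relies on the fact that Kubernetes marks an Evicted Pod with `status.phase: Failed` and `status.reason: Evicted`.

```python
import json
import subprocess


def find_evicted_pods(pod_list):
    """Return (namespace, name) pairs for Pods that Kubernetes marked as Evicted.

    `pod_list` is the parsed output of `kubectl get pods -o json`.
    """
    evicted = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        # Evicted Pods are reported as phase=Failed with reason=Evicted.
        if status.get("phase") == "Failed" and status.get("reason") == "Evicted":
            meta = pod["metadata"]
            evicted.append((meta["namespace"], meta["name"]))
    return evicted


def delete_evicted_pods(namespace, dry_run=True):
    """List Pods in `namespace` and delete the Evicted ones.

    With dry_run=True (the default) the delete commands are only printed.
    Hypothetical helper for illustration; review the output before deleting.
    """
    raw = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    targets = find_evicted_pods(json.loads(raw))
    for ns, name in targets:
        cmd = ["kubectl", "delete", "pod", name, "-n", ns]
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return targets
```

Calling `delete_evicted_pods("kubeflow")` would print the delete commands for review, and `delete_evicted_pods("kubeflow", dry_run=False)` would run them. The namespace is an assumption and should be replaced with the one that matches the Juju model name. This only automates the workaround; it does not address why leadership is not transferred.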