Closed Sponge-Bas closed 11 months ago
I am also confused. Why argo-controller/0 went into error is one question (don't see anything in the debug-logs that tell me), but that argo-controller/1 was prevented from taking over sounds even more problematic. I wonder if that's because argo-controller/0 was stuck in error state?
We'll need to dig in further
Also seen on this run: https://solutions.qa.canonical.com/v2/testruns/b53a3ef6-fe06-434b-8a50-d2acab67eb48/
However, this time with the kfp-viz
units. Here is the Kubernetes crashdump and here is the Kubeflow crashdump
Seen again on this run: https://solutions.qa.canonical.com/v2/testruns/dfd21ddf-545e-4427-9e44-0a224a91a36a/
This time around with kfp-profile-controller
units. Here's the Kubernetes crashdump and here is the Kubeflow crashdump.
ty for these logs! We will check these out
this should be resolved by sidecar rewrite. Not taking action until rewrite
We think these should be resolved in the most recent charms. Please reopen or post again if this persists.
In testrun https://solutions.qa.canonical.com/testruns/testRun/4766dc32-4035-4840-8d5d-1bdc13853881, the argo-controller unit is in error state:
The kubeflow crashdump shows that the pod is indeed in error state, but the kubernetes crashdump shows no logs for this container under
9/baremetal/pod-logs/
.I'm not too sure what happened here, but maybe one of you can make it out from the available logs.