canonical / kfp-operators

Kubeflow Pipelines Operators
Apache License 2.0

kfp-profile-controller silently fails if PodDefault does not exist #171

Open ca-scribner opened 1 year ago

ca-scribner commented 1 year ago

The kfp-profile-controller charm will go to Active state regardless of whether the PodDefault CRD exists, but the CompositeController deployed in sync.py will fail to apply anything if the PodDefault is missing. This can be seen in the logs for metacontroller-operator-charm-0:

kubectl logs metacontroller-operator-charm-0 -f 

{"level":"info","ts":1678994413.3888426,"logger":"composite","msg":"Sync CompositeController","name":"kubeflow-pipelines-profile-controller"}
{"level":"error","ts":1678994413.389125,"logger":"controller-runtime.manager.controller.composite-metacontroller","msg":"Reconciler error","name":"kubeflow-pipelines-profile-controller","namespace":"","error":"can't find child resource \"poddefaults\" in kubeflow.org/v1alpha1","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.5/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.5/pkg/internal/controller/controller.go:214"}

Is there a way we can notice this and surface it to the charm? Without an error, some of the resources required for pipelines, notebooks, etc. are simply missing, and nothing tells the user why.
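One possible way to notice this, as a minimal sketch only: scan the metacontroller's JSON log lines for the "can't find child resource" error shown above. How the charm would get hold of those lines is left open here, and the function name `find_missing_child_resources` is hypothetical, not something that exists in the charm.

```python
import json
import re
from typing import Iterable, List

# Pattern copied from the error text in the metacontroller log above.
_MISSING_CHILD = re.compile(
    r"can't find child resource \"(?P<resource>[^\"]+)\" in (?P<api>\S+)"
)


def find_missing_child_resources(log_lines: Iterable[str]) -> List[str]:
    """Return 'resource (apiVersion)' for each distinct missing-child error.

    Hypothetical helper: assumes one JSON object per log line, with the
    "level" and "error" keys seen in the log excerpt above.
    """
    missing: List[str] = []
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON log line
        if not isinstance(entry, dict) or entry.get("level") != "error":
            continue
        match = _MISSING_CHILD.search(str(entry.get("error", "")))
        if match:
            item = f"{match['resource']} ({match['api']})"
            if item not in missing:
                missing.append(item)
    return missing
```

A non-empty result could then be turned into a non-Active charm status, so the missing CRD is visible instead of silent.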

ca-scribner commented 1 year ago

The CompositeController does see the warning:

kubectl describe compositecontrollers.metacontroller.k8s.io
...

  Warning  CreateError  5m1s (x9 over 43m)  metacontroller  Cannot create new controller: can't find child resource "poddefaults" in kubeflow.org/v1alpha1

so that might be detectable at the charm level?
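A rough sketch of what that detection could look like, assuming the charm has already fetched the events for the CompositeController through whatever Kubernetes client it uses and flattened them into dicts with `type`, `reason` and `message` keys (the fields visible in the describe output above). The function name is hypothetical.

```python
from typing import Iterable, Mapping, Optional


def blocked_message_from_events(
    events: Iterable[Mapping[str, str]],
) -> Optional[str]:
    """Return a status message for the first Warning/CreateError event.

    Hypothetical helper: returns None when no such event is present,
    meaning nothing here prevents the charm from going Active.
    """
    for event in events:
        if event.get("type") == "Warning" and event.get("reason") == "CreateError":
            detail = event.get("message", "unknown error")
            return f"CompositeController failed: {detail}"
    return None
```

The returned string could feed a blocked status in the charm, so the CreateError is surfaced rather than only being visible via `kubectl describe`.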

ca-scribner commented 1 year ago

This might be more complicated... While trying to resolve the above manually, I found that even after I deployed the PodDefault, the CompositeControllers were still failing. Does metacontroller only support CRDs that were already available at metacontroller deploy time? If yes, that means we need to ensure metacontroller precedes kfp-profile-controller's deployment
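Whatever the answer on metacontroller's side, one hedged option for the ordering part is to have the charm wait for the CRD before applying the CompositeController. The sketch below assumes a caller-supplied `crd_exists` callable (hypothetical; it would wrap the charm's Kubernetes client), and the default CRD name `poddefaults.kubeflow.org` is inferred from the error message above rather than taken from the charm. It does not address whether metacontroller picks up CRDs created after it started.

```python
import time
from typing import Callable


def wait_for_crd(
    crd_exists: Callable[[str], bool],
    crd_name: str = "poddefaults.kubeflow.org",  # inferred name, an assumption
    timeout: float = 60.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll crd_exists(crd_name) until it returns True or timeout elapses.

    Returns True if the CRD was seen, False if the deadline passed first.
    The check always runs at least once, even with timeout=0.
    """
    deadline = time.monotonic() + timeout
    while True:
        if crd_exists(crd_name):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

If this returns False, the charm could stay in a waiting or blocked state instead of going Active with a CompositeController that cannot reconcile.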