clastix / cluster-api-control-plane-provider-kamaji

The Kamaji Control Plane provider implementation of the Cluster Management API
Apache License 2.0

Updating KCP ready status after creating required resources #66

Closed: jds9090 closed this issue 10 months ago

jds9090 commented 10 months ago

CAPI reads the control plane's ready status and uses it to update cluster.status.controlPlaneReady.

I want to understand why KCP (Kamaji Control Plane)'s status changes to the ready state before the required resources, such as the workload cluster's kubeconfig and CA secret, are created. I think the main role of KCP (Kamaji Control Plane) is to create the TCP (Kamaji Tenant Control Plane) and those required resources. Shouldn't KCP (Kamaji Control Plane) be considered ready only after the required resources are created?

For example, CAPI's KCP (KubeadmControlPlane) status changes to the ready state only when there is at least one control plane node that is in a ready state and can communicate with CAPI. That makes sense to me because it reflects its main role.

Also, I wonder whether there is any logic, or a CRD field, to verify that communication between the workload cluster and the management cluster is possible via the kubeconfig, or whether there is a need for that on the TCP (Kamaji Tenant Control Plane) or KCP (Kamaji Control Plane) side.
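
Something like the sketch below could answer that today with plain client-go; probeTenantAPIServer is a hypothetical helper name, not an existing function in Kamaji or this provider:

package main

import (
    "context"

    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

// probeTenantAPIServer (hypothetical helper) builds a client from the admin
// kubeconfig bytes stored in the generated Secret and calls the tenant API
// server's /readyz endpoint from the management cluster.
func probeTenantAPIServer(ctx context.Context, kubeconfig []byte) error {
    restCfg, err := clientcmd.RESTConfigFromKubeConfig(kubeconfig)
    if err != nil {
        return err
    }
    cs, err := kubernetes.NewForConfig(restCfg)
    if err != nil {
        return err
    }
    // A successful GET on /readyz proves the endpoint is reachable and the
    // stored credentials are accepted by the tenant API server.
    return cs.Discovery().RESTClient().Get().AbsPath("/readyz").Do(ctx).Error()
}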

For example, KCP (Kamaji Control Plane) or TCP (Tenant Control Plane) could include a field such as API SERVER READY that the operator checks and updates regularly.
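
As a rough illustration of that idea, the status could grow fields like the ones below; these are hypothetical additions, not part of the current KamajiControlPlane or TenantControlPlane CRDs (the existing Ready field is shown only for context):

package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// Hypothetical status additions illustrating the "API SERVER READY" idea.
type KamajiControlPlaneStatus struct {
    // Ready mirrors the readiness Kamaji reports for the TenantControlPlane.
    Ready bool `json:"ready"`
    // APIServerReady would be set to true only after the controller has
    // successfully reached the tenant API server with the generated admin
    // kubeconfig, and would be re-checked on every resync.
    APIServerReady bool `json:"apiServerReady,omitempty"`
    // LastAPIServerProbeTime records when that check last ran.
    LastAPIServerProbeTime metav1.Time `json:"lastAPIServerProbeTime,omitempty"`
}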

prometherion commented 10 months ago

I'm a bit confused about this, so I tried it, and this is my resulting output.

$: kubectl get kamajicontrolplanes.controlplane.cluster.x-k8s.io -w
NAME                       VERSION   INITIALIZED   READY   AGE
capi-quickstart-kubevirt   1.23.10                         0s
capi-quickstart-kubevirt   1.23.10   false         false   0s
capi-quickstart-kubevirt   1.23.10   true          false   0s
capi-quickstart-kubevirt   1.23.10   true          false   0s
capi-quickstart-kubevirt   1.23.10   true          false   2s
capi-quickstart-kubevirt   1.23.10   true          false   2s
capi-quickstart-kubevirt   1.23.10   true          true    13s

These are the events we're managing:

I want to make sure why KCP(Kamaji Control Plane)'s status changes to the ready state before creating required resources such as workloads' kubeconfig and CA secret

We could improve this, maybe by shifting the Ready check to after the retrieval of the CA and the admin kubeconfig; that seems legit.
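
A minimal sketch of that ordering, assuming hypothetical helper names (getTenantCASecret, getTenantAdminKubeconfig, createOrUpdateClusterSecrets) rather than the provider's actual functions, and a reconciler that embeds a controller-runtime client:

// Hypothetical reconcile fragment: Ready is reported only after the CA and
// the admin kubeconfig have been retrieved and stored for Cluster API.
ca, err := getTenantCASecret(ctx, tenantControlPlane) // assumed helper
if err != nil {
    return ctrl.Result{RequeueAfter: 10 * time.Second}, nil // keep Ready=false, retry
}
kubeconfig, err := getTenantAdminKubeconfig(ctx, tenantControlPlane) // assumed helper
if err != nil {
    return ctrl.Result{RequeueAfter: 10 * time.Second}, nil
}
if err := createOrUpdateClusterSecrets(ctx, cluster, ca, kubeconfig); err != nil { // assumed helper
    return ctrl.Result{}, err
}
// Only now flip the flag that CAPI copies into cluster.status.controlPlaneReady.
kamajiControlPlane.Status.Ready = true
return ctrl.Result{}, r.Status().Update(ctx, kamajiControlPlane)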

jds9090 commented 10 months ago

Cluster (a CAPI CRD) has a field called controlPlaneReady. This field is set from KCP (Kamaji Control Plane)'s Ready state. From the Cluster's point of view, if KCP (Kamaji Control Plane) is ready, the control plane should be ready. But it may not be ready, because the fact that KCP (Kamaji Control Plane) is ready doesn't mean the control plane itself is ready.

The fact that KCP (Kamaji Control Plane) is ready only means that linking CAPI with the Tenant Control Plane is ready. The Tenant Control Plane might not be ready even when KCP (Kamaji Control Plane) is ready.
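
For reference, this is roughly what the CAPI Cluster controller does with that flag (a simplified illustration of the control plane contract, not the actual CAPI source):

// controlPlane is the referenced control plane object handled as an
// *unstructured.Unstructured (k8s.io/apimachinery/pkg/apis/meta/v1/unstructured).
// The Cluster controller reads status.ready and copies it into the Cluster status.
ready, found, err := unstructured.NestedBool(controlPlane.Object, "status", "ready")
if err == nil && found {
    cluster.Status.ControlPlaneReady = ready
}

So whatever the provider writes into status.ready is exactly what ends up in cluster.status.controlPlaneReady.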

So we cannot trust cluster.status.controlPlaneReady. @prometherion What do you think about this?

Additionally, there is a gap between Cluster (the CAPI CRD) and KCP (Kamaji Control Plane). The Cluster controller can't catch changes to KCP (Kamaji Control Plane); it assumes the control plane is built on Machine (the CAPI CRD), as the SetupWithManager excerpt below shows.

func (r *Reconciler) SetupWithManager(ctx context.Context, mgr ctrl.Manager, options controller.Options) error {
    c, err := ctrl.NewControllerManagedBy(mgr).
        For(&clusterv1.Cluster{}).
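        // NOTE: besides the Cluster itself, only Machine events are watched by
        // this builder, so a status change on a KamajiControlPlane object does
        // not directly enqueue the owning Cluster here.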
        Watches(
            &clusterv1.Machine{},
            handler.EnqueueRequestsFromMapFunc(r.controlPlaneMachineToCluster),
        ).
        WithOptions(options).
        WithEventFilter(predicates.ResourceNotPausedAndHasFilterLabel(ctrl.LoggerFrom(ctx), r.WatchFilterValue)).
        Build(r)

    if err != nil {
        return errors.Wrap(err, "failed setting up with a controller manager")
    }

    r.recorder = mgr.GetEventRecorderFor("cluster-controller")
    r.externalTracker = external.ObjectTracker{
        Controller: c,
        Cache:      mgr.GetCache(),
    }
    return nil
}

prometherion commented 10 months ago

But it may be not ready because the fact that KCP(kamaji control plane) is ready doesn't mean control plane is ready

That's wrong: we inherit that readiness from Kamaji itself and report it back.

Additionally, There is a gap between Cluster(CAPI CRD) and KCP(Kamaji Control Plane). Cluster(CAPI CRD) can't catch changes about KCP(Kamaji Control Plane). It assumes control plane is based on Machine(CAPI CRD).

I would like to discuss this in another issue instead. Do you mind opening one, along with some examples and the desired outcome?