Closed andreykont closed 6 months ago
I'm a bit confused @andreykont, isn't this a problem in Kamaji upstream rather than on the CAPI provider?
I'm a bit confused @andreykont, isn't this a problem in Kamaji upstream rather than on the CAPI provider?
Good day, @prometherion . Yes. I completely agree. Kamaji doesn't restore owner reference of it's CA secret. It's a mistake. But anyway it might take some period of time. I think capi provider has to control such situation (kamaji and capiCA secret names are equal) to avoid racing. In addition, I think Kamaji project knows nothing about CAPI project and don't have to control it's CA secret name (to be not equals with CAPI project secret name).
Kamaji doesn't restore owner reference of it's CA secret
That's correct: https://github.com/clastix/kamaji/blob/eff68db336f3c161447e08a6733344260c10e12a/internal/resources/ca_certificate.go#L92-L110
This snippet shows we're just avoiding rewriting the secret content if the certificates are matching since Velero is not able to restore the Owner Reference.
But anyway it might take some period of time
It's Kubernetes, eventually consistent :) Jokes apart, we can wait for a few iterations before having everything reconciled properly, and I'd say Kamaji resources are our source of truth since the CAPI CP provider is almost dummy.
I think capi provider has to control such situation (kamaji and capiCA secret names are equal) to avoid racing. In addition, I think Kamaji project knows nothing about CAPI project and don't have to control it's CA secret name (to be not equals with CAPI project secret name)
This is a bit problematic since we started Kamaji completely decoupled from Cluster API, and the CP provider is the one responsible for this, however, we stumbled upon a naming collision afterwards and had to patch in this way.
tl;dr; WDYT in fixing this in Kamaji, rather than in CAPI?
Hello, @prometherion.
Let me clarify the problem once more. After restoring kamaji tenant control plane’s secret with cert and key is restored, KamajiControlPlane Provider start reconcilation and adopts the secret changing keys in its map: https://github.com/clastix/cluster-api-control-plane-provider-kamaji/blob/346f3ca1de43661a2b44bee84c66f60f1d29e24c/controllers/kamajicontrolplane_controller_resources.go#L102-L105
Then it updates secret and returns to reconcile process and comes back to check secret once more. As ownerReference is set by KamajiControlPlaneProvider line https://github.com/clastix/cluster-api-control-plane-provider-kamaji/blob/346f3ca1de43661a2b44bee84c66f60f1d29e24c/controllers/kamajicontrolplane_controller_resources.go#L108
the secret is not valid for that check: https://github.com/clastix/cluster-api-control-plane-provider-kamaji/blob/346f3ca1de43661a2b44bee84c66f60f1d29e24c/controllers/kamajicontrolplane_controller_resources.go#L77
and KamajiControlPlaneProvider tries to find ca.crt and ca.key in secret, which it patched during the previous reconcilation loop. As the result it logs errors as it is not able to find in TenantControlPlane secret ca.crt,
another solution is to add more checks there https://github.com/clastix/cluster-api-control-plane-provider-kamaji/blob/346f3ca1de43661a2b44bee84c66f60f1d29e24c/controllers/kamajicontrolplane_controller_resources.go#L77
either by name equals or that secret is already patched for CAPI.
I think it should not be patched from the kamaji upstream as it is not aware of kamajiControlPlane provider. Moreover, i think it can be fixed in kamajiControlPlane provider.
If we come to CAPI Project to fix that it is going to break their agreement on how to prepare secrets i think. https://cluster-api.sigs.k8s.io/developer/architecture/controllers/cluster.html#secrets
I think it should not be patched from the kamaji upstream as it is not aware of kamajiControlPlane provider
True the fact Kamaji is not aware of the CAPI CP provider, but Owner/Controller References should be fixed upon first reconciliation, that's the reason why I'm suggesting fixing in Kamaji.
Once we have the OwnerReference in place, there's not bug in in the CP provider, if I understand correctly.
Hello, @prometherion. Ok, I will open PR in Kamaji project to restore owner reference during reconciliation process.
My goal is restore managed cluster from backup. I use Velero for restore. First of all I recover etcd database, TenantControlPlane and other generic capi resources. On last stage I have to recover KamajiControlPlane resource.
KamajiControlPlane gets "Ready" status, but I see error in kamaji control plane controller's log:
2024-02-21T13:54:48Z ERROR Reconciler error {"controller": "kamajicontrolplane", "controllerGroup": "controlplane.cluster.x-k8s.io", "controllerKind": "KamajiControlPlane", "KamajiControlPlane": {"name":"ocean-ocean-avkontya3-jvpinos","namespace":"ocean-ocean-avkontya3-jvpinos"}, "namespace": "ocean-ocean-avkontya3-jvpinos", "name": "ocean-ocean-avkontya3-jvpinos", "reconcileID": "2cece214-b52e-41fc-884a-cac3ebd28bd6", "error": "cannot create or update CA secret: missing Certificate value from *kamajiv1alpha1.TenantControlPlane CA", "errorVerbose": "missing Certificate value from *kamajiv1alpha1.TenantControlPlane CA\ngithub.com/clastix/cluster-api-control-plane-provider-kamaji/controllers.(*KamajiControlPlaneReconciler).createOrUpdateCertificateAuthor
Cause of problem: In my case kamajiCA and capiCA secret names are equals. Also when I restore kcp with Velero, it doesn't restore secret OwnerReferences. Kamaji control plane detects it like it's own object and delete "ca.crt" and "ca.key" data in kamajiCA secret. On the second iteration it makes error, because ca.crt deleted.
Wanted behavior: KamajiControlPlane controller doesn't control kamajiCA secrets.
How I solved the error: Our provider doesn't have to control KamajiCA secret. I replaced condition that check who is owner of CA secret. Now we control it by secret names.