clastix / cluster-api-control-plane-provider-kamaji

The Kamaji Control Plane provider implementation of the Cluster Management API
Apache License 2.0
83 stars 30 forks source link

Openstack patch cluster error #151

Open nilpntr opened 21 hours ago

nilpntr commented 21 hours ago

When deploying a kamaji cluster on Openstack I would get this error(in the capi-kamaji-controller-manager) after the TenantControlPlane had been created, initialised and being ready:

2024-11-13T10:34:25Z    ERROR   Reconciler error    {"controller": "kamajicontrolplane", "controllerGroup": "controlplane.cluster.x-k8s.io", "controllerKind": "KamajiControlPlane", "KamajiControlPlane": {"name":"test-kamaji-cluster-control-plane","namespace":"test-kamaji-cluster"}, "namespace": "test-kamaji-cluster", "name": "test-kamaji-cluster-control-plane", "reconcileID": "42e03cd9-c0b9-445e-ac8f-0d451ddf8e63", "error": "cannot perform PATCH update for the OpenStackCluster resource: failed to patch OpenStackCluster test-kamaji-cluster/test-kamaji-cluster: OpenStackCluster.infrastructure.cluster.x-k8s.io \"test-kamaji-cluster\" is invalid: [spec.identityRef: Required value, <nil>: Invalid value: \"null\": some validation rules were not checked because the object was invalid; correct the existing errors to complete validation]", "errorVerbose": "OpenStackCluster.infrastructure.cluster.x-k8s.io \"test-kamaji-cluster\" is invalid: [spec.identityRef: Required value, <nil>: Invalid value: \"null\": some validation rules were not checked because the object was invalid; correct the existing errors to complete validation]\nfailed to patch OpenStackCluster test-kamaji-cluster/test-kamaji-cluster\nsigs.k8s.io/cluster-api/util/patch.(*Helper).Patch\n\t/go/pkg/mod/sigs.k8s.io/cluster-api@v1.8.4/util/patch/patch.go:157\ngithub.com/clastix/cluster-api-control-plane-provider-kamaji/controllers.(*KamajiControlPlaneReconciler).patchOpenStackCluster\n\t/workspace/controllers/kamajicontrolplane_controller_cluster_patch.go:234\ngithub.com/clastix/cluster-api-control-plane-provider-kamaji/controllers.(*KamajiControlPlaneReconciler).patchCluster\n\t/workspace/controllers/kamajicontrolplane_controller_cluster_patch.go:101\ngithub.com/clastix/cluster-api-control-plane-provider-kamaji/controllers.(*KamajiControlPlaneReconciler).Reconcile.func6\n\t/workspace/controllers/kamajicontrolplane_controller.go:185\ngithub.com/clastix/cluster-api-control-plane-provider-kamaji/controllers.TrackConditionType\n\t/workspace/controllers/conditions.go:26\ngithub.com/clastix/cluster-api-control-plane-provider-kamaji/controllers.(*KamajiControlPlaneReconciler).Reconcile\n\t/workspace/controllers/kamajicontrolplane_controller.go:184\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:224\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695\ncannot perform PATCH update for the OpenStackCluster resource\ngithub.com/clastix/cluster-api-control-plane-provider-kamaji/controllers.(*KamajiControlPlaneReconciler).patchOpenStackCluster\n\t/workspace/controllers/kamajicontrolplane_controller_cluster_patch.go:235\ngithub.com/clastix/cluster-api-control-plane-provider-kamaji/controllers.(*KamajiControlPlaneReconciler).patchCluster\n\t/workspace/controllers/kamajicontrolplane_controller_cluster_patch.go:101\ngithub.com/clastix/cluster-api-control-plane-provider-kamaji/controllers.(*KamajiControlPlaneReconciler).Reconcile.func6\n\t/workspace/controllers/kamajicontrolplane_controller.go:185\ngithub.com/clastix/cluster-api-control-plane-provider-kamaji/controllers.TrackConditionType\n\t/workspace/controllers/conditions.go:26\ngithub.com/clastix/cluster-api-control-plane-provider-kamaji/controllers.(*KamajiControlPlaneReconciler).Reconcile\n\t/workspace/controllers/kamajicontrolplane_controller.go:184\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:224\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695"}

After digging in the code I discovered that the patchOpenStackCluster function would try to overwrite the spec of an OpenstackCluster. To test my solution I've created a little test script which loads a json kind: Cluster and tries the old implementation and the fix:

package main

import (
    "context"
    "encoding/json"
    "fmt"
    "github.com/pkg/errors"
    "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    "k8s.io/apimachinery/pkg/types"
    "k8s.io/client-go/tools/clientcmd"
    "k8s.io/client-go/util/homedir"
    "log"
    "os"
    "path/filepath"
    capiv1beta1 "sigs.k8s.io/cluster-api/api/v1beta1"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

func main() {
    kubeConfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
    cfg, err := clientcmd.BuildConfigFromFlags("", kubeConfig)
    if err != nil {
        fmt.Printf("Error loading kubeconfig: %v\n", err)
        return
    }

    k8sClient, err := client.New(cfg, client.Options{})
    if err != nil {
        fmt.Printf("Error creating client: %v\n", err)
        return
    }

    raw, err := os.ReadFile("cluster.json")
    if err != nil {
        log.Fatal(err)
    }

    var cluster capiv1beta1.Cluster
    if err = json.Unmarshal(raw, &cluster); err != nil {
        log.Fatal(err)
    }

    ctx := context.Background()

    if err = patchOpenStackCluster(ctx, k8sClient, cluster, "123.456.78.91", 6443); err != nil {
        log.Fatal(err)
    }

    if err = patchOpenStackClusterFix(ctx, k8sClient, cluster, "123.456.78.91", 6443); err != nil {
        log.Fatal(err)
    }
}

func patchOpenStackCluster(ctx context.Context, k8sClient client.Client, cluster capiv1beta1.Cluster, endpoint string, port int64) error {
    osc := unstructured.Unstructured{}

    osc.SetGroupVersionKind(cluster.Spec.InfrastructureRef.GroupVersionKind())
    osc.SetName(cluster.Spec.InfrastructureRef.Name)
    osc.SetNamespace(cluster.Spec.InfrastructureRef.Namespace)

    if err := k8sClient.Get(ctx, types.NamespacedName{Name: osc.GetName(), Namespace: osc.GetNamespace()}, &osc); err != nil {
        return errors.Wrap(err, fmt.Sprintf("cannot retrieve the %s resource", osc.GetKind()))
    }

    if err := unstructured.SetNestedMap(osc.Object, map[string]interface{}{
        "apiServerFixedIP": endpoint,
        "apiServerPort":    port,
    }, "spec"); err != nil {
        return errors.Wrap(err, fmt.Sprintf("unable to set unstructured %s spec patch", osc.GetKind()))
    }

    out, err := osc.MarshalJSON()
    if err != nil {
        return errors.Wrap(err, fmt.Sprintf("unable to marshal unstructured %s spec patch", osc.GetKind()))
    }

    if err = os.WriteFile("cluster-patch.json", out, 0600); err != nil {
        return errors.Wrap(err, fmt.Sprintf("unable to write cluster-patch.json"))
    }

    return nil
}

func patchOpenStackClusterFix(ctx context.Context, k8sClient client.Client, cluster capiv1beta1.Cluster, endpoint string, port int64) error {
    osc := unstructured.Unstructured{}

    osc.SetGroupVersionKind(cluster.Spec.InfrastructureRef.GroupVersionKind())
    osc.SetName(cluster.Spec.InfrastructureRef.Name)
    osc.SetNamespace(cluster.Spec.InfrastructureRef.Namespace)

    if err := k8sClient.Get(ctx, types.NamespacedName{Name: osc.GetName(), Namespace: osc.GetNamespace()}, &osc); err != nil {
        return errors.Wrap(err, fmt.Sprintf("cannot retrieve the %s resource", osc.GetKind()))
    }

    if err := unstructured.SetNestedField(osc.Object, endpoint, "spec", "apiServerFixedIP"); err != nil {
        return errors.Wrap(err, fmt.Sprintf("unable to set unstructured %s spec apiServerFixedIP", osc.GetKind()))
    }

    if err := unstructured.SetNestedField(osc.Object, port, "spec", "apiServerPort"); err != nil {
        return errors.Wrap(err, fmt.Sprintf("unable to set unstructured %s spec apiServerPort", osc.GetKind()))
    }

    out, err := osc.MarshalJSON()
    if err != nil {
        return errors.Wrap(err, fmt.Sprintf("unable to marshal unstructured %s spec patch", osc.GetKind()))
    }

    if err = os.WriteFile("cluster-patch-fix.json", out, 0600); err != nil {
        return errors.Wrap(err, fmt.Sprintf("unable to write cluster-patch.json"))
    }

    return nil
}

Comparing the 2 outputs I saw that the current implementation overwrites the spec with an object containing only the 2 fields. When looking at the fix I saw that it just modifies the fields as we would expect.

nilpntr commented 21 hours ago

Created a fix here: https://github.com/clastix/cluster-api-control-plane-provider-kamaji/pull/150

prometherion commented 21 hours ago

Feeling sorry for this, the bug has been introduced with #48.