aiven / aiven-operator

Provision and manage Aiven Services from your Kubernetes cluster.
https://aiven.github.io/aiven-operator
Apache License 2.0
33 stars 20 forks source link

Why the service user custom resource secret is not update when we update user from aiven console #759

Closed flassagn closed 4 months ago

flassagn commented 5 months ago

Hi,

I would like to understand why the ServiceUser custom resource's secret is not update when we update the user from Aiven console.

First, I deploy the aiven-operator:v0.20.0 chart into our cluster and link with a test Kafka project. Then, I create a test user with the following custom resource

apiVersion: aiven.io/v1alpha1
kind: ServiceUser
metadata:
  name: operator-user
  namespace: default
spec:
  authSecretRef:
    key: token
    name: aiven-token
  connInfoSecretTarget:
    name: operator-user-secret
    prefix: USER_
  project: operator-test-project
  serviceName: operator-test-kafka

The corresponding secret is correctly generated and I saw the right values. If I reset the user's credential, the secret won't be updated even if the custom resource is correctly reconciled without error in operator log

2024-06-18T12:37:54Z    INFO    controllers.ServiceUser    setting up aiven client with instance secret    {"kind": "serviceuser", "name": "default/operator-user", "annotations": {"controllers.aiven.io/generation-was-processed":"1","controllers.aiven.io/instance-is-running":"true"}}
2024-06-18T12:37:54Z    INFO    controllers.ServiceUser    reconciling instance    {"kind": "serviceuser", "name": "default/operator-user", "annotations": {"controllers.aiven.io/generation-was-processed":"1","controllers.aiven.io/instance-is-running":"true"}}
2024-06-18T12:37:54Z    INFO    controllers.ServiceUser    handling service update/creation    {"kind": "serviceuser", "name": "default/operator-user", "annotations": {"controllers.aiven.io/generation-was-processed":"1","controllers.aiven.io/instance-is-running":"true"}}
2024-06-18T12:37:54Z    DEBUG    events    starting reconciliation    {"type": "Normal", "object": {"kind":"ServiceUser","namespace":"default","name":"operator-user","uid":"25c3409d-0fdd-4b07-b394-9d2f1deba7a6","apiVersion":"aiven.io/v1alpha1","resourceVersion":"2873056447"}, "reason": "ReconcilationStarted"}
2024-06-18T12:37:54Z    DEBUG    events    waiting for preconditions of the instance    {"type": "Normal", "object": {"kind":"ServiceUser","namespace":"default","name":"operator-user","uid":"25c3409d-0fdd-4b07-b394-9d2f1deba7a6","apiVersion":"aiven.io/v1alpha1","resourceVersion":"2873056447"}, "reason": "WaitingForPreconditions"}
2024-06-18T12:37:54Z    INFO    controllers.ServiceUser    checking if instance is ready    {"kind": "serviceuser", "name": "default/operator-user", "annotations": {"controllers.aiven.io/generation-was-processed":"1","controllers.aiven.io/instance-is-running":"true"}}
2024-06-18T12:37:54Z    DEBUG    events    preconditions are met, proceeding to create or update    {"type": "Normal", "object": {"kind":"ServiceUser","namespace":"default","name":"operator-user","uid":"25c3409d-0fdd-4b07-b394-9d2f1deba7a6","apiVersion":"aiven.io/v1alpha1","resourceVersion":"2873056447"}, "reason": "PreconditionsAreMet"}
2024-06-18T12:37:54Z    DEBUG    events    waiting for the instance to be running    {"type": "Normal", "object": {"kind":"ServiceUser","namespace":"default","name":"operator-user","uid":"25c3409d-0fdd-4b07-b394-9d2f1deba7a6","apiVersion":"aiven.io/v1alpha1","resourceVersion":"2873056447"}, "reason": "WaitingForInstanceToBeRunning"}
2024-06-18T12:37:55Z    INFO    controllers.ServiceUser    instance was successfully reconciled    {"kind": "serviceuser", "name": "default/operator-user", "annotations": {"controllers.aiven.io/generation-was-processed":"1","controllers.aiven.io/instance-is-running":"true"}}
2024-06-18T12:37:55Z    DEBUG    events    instance is in a RUNNING state 

But If I removed the generated secret, then the operator will create a new one with right values.

I took a look into serviceuser_controller.go#get function

func (h ServiceUserHandler) get(ctx context.Context, avn *aiven.Client, avnGen avngen.Client, obj client.Object) (*corev1.Secret, error) {
    user, err := h.convert(obj)
    if err != nil {
        return nil, err
    }

    u, err := avn.ServiceUsers.Get(ctx, user.Spec.Project, user.Spec.ServiceName, user.Name)
    if err != nil {
        return nil, err
    }

    s, err := avn.Services.Get(ctx, user.Spec.Project, user.Spec.ServiceName)
    if err != nil {
        return nil, err
    }

    var component *aiven.ServiceComponents
    for _, c := range s.Components {
        if c.Component == s.Type {
            component = c
            break
        }
    }

    if component == nil {
        return nil, fmt.Errorf("service component %q not found", s.Type)
    }

    caCert, err := avn.CA.Get(ctx, user.Spec.Project)
    if err != nil {
        return nil, fmt.Errorf("aiven client error %w", err)
    }

    meta.SetStatusCondition(&user.Status.Conditions,
        getRunningCondition(metav1.ConditionTrue, "CheckRunning",
            "Instance is running on Aiven side"))

    metav1.SetMetaDataAnnotation(&user.ObjectMeta, instanceIsRunningAnnotation, "true")

    prefix := getSecretPrefix(user)
    stringData := map[string]string{
        prefix + "HOST":        component.Host,
        prefix + "PORT":        fmt.Sprintf("%d", component.Port),
        prefix + "USERNAME":    u.Username,
        prefix + "PASSWORD":    u.Password,
        prefix + "ACCESS_CERT": u.AccessCert,
        prefix + "ACCESS_KEY":  u.AccessKey,
        prefix + "CA_CERT":     caCert,
        // todo: remove in future releases
        "HOST":        component.Host,
        "PORT":        fmt.Sprintf("%d", component.Port),
        "USERNAME":    u.Username,
        "PASSWORD":    u.Password,
        "ACCESS_CERT": u.AccessCert,
        "ACCESS_KEY":  u.AccessKey,
        "CA_CERT":     caCert,
    }

    return newSecret(user, stringData, false), nil
}

which is call into basic_controller.go#updateInstanceStateAndSecretUntilRunning

func (i *instanceReconcilerHelper) updateInstanceStateAndSecretUntilRunning(ctx context.Context, o v1alpha1.AivenManagedObject) error {
    i.log.Info("checking if instance is ready")

    // Needs to be before o.NoSecret() check because `get` mutates the object's metadata annotations.
    // It set the instanceIsRunningAnnotation annotation when the instance is running on Aiven's side.
    secret, err := i.h.get(ctx, i.avn, i.avnGen, o)
    if secret == nil || err != nil {
        return err
    }

    if o.NoSecret() {
        i.rec.Event(o, corev1.EventTypeNormal, eventConnInfoSecretCreationDisabled, "connInfoSecretTargetDisabled is true, secret will not be created")
        return nil
    }

    _, err = controllerutil.CreateOrUpdate(ctx, i.k8s, secret, func() error {
        return controllerutil.SetControllerReference(o, secret, i.k8s.Scheme())
    })

    return err
}

And I don't understand what it's not the case ? Maybe, we need to update the mutateFN to update the secret stringData also.

The Golang documentation could confirm my assumption https://pkg.go.dev/sigs.k8s.io/controller-runtime@v0.18.4/pkg/controller/controllerutil#CreateOrUpdate

CreateOrUpdate creates or updates the given object in the Kubernetes cluster. The object's desired state must be reconciled with the existing state inside the passed in callback MutateFn. The MutateFn is called regardless of creating or updating an object. it returns the executed operation and an error.

byashimov commented 4 months ago

Hey. Unfortunately, that's how the operator is build initially. When it creates/updates an object it marks it as "processed". Then it ignores the object until new changes are made on the kube side. The refactoring the whole operator might be as almost challenging as building a new operator from scratch. As another option we are looking at developing a crossplane provider instead. Which is expected to reconcile objects infinitively.

flassagn commented 4 months ago

Hi @byashimov,

Thank you for your answer, I'm not able to evaluate how the refactoring could huge and challenging, but I trust you.

I'm interesting about the crossplane provider. Do you already start to build something, or it's more in evaluation phase ?

byashimov commented 4 months ago

Thank you for your answer, I'm not able to evaluate how the refactoring could huge and challenging, but I trust you.

The continuous reconciliation means infinite comparison of local and remote states and applying precise changes to the remote state. The current approach is quite straightforward: create and update everything when the user triggers a change. Considering the number of resources we have, this means we should revise the entire codebase of the operator.

Do you already start to build something, or it's more in evaluation phase ?

It is in the "evaluation phase," but we are gathering interest. Please vote here or in this thread. It will help us prioritize.

flassagn commented 4 months ago

Alright, I voted. You could close the current issue