crossplane-contrib / provider-ansible

Crossplane provider to execute Ansible contents remotely inside a Kubernetes cluster.
Apache License 2.0

provider does not reflect deletion to resources deleted in the control plane #231

Open TCMPK opened 1 year ago

TCMPK commented 1 year ago

The provider-ansible provider does not invoke a finalizer or similar mechanism to clean up created resources when a delete operation is triggered in the control plane (e.g. `kubectl delete ansiblerun <run>`). This behavior is not reflected in the documentation.

What problem are you facing?

When deleting a resource (here an AnsibleRun), the provider does not invoke an ansible-run with the `state=absent` variable. The documentation and examples indicate that there should be an ansible-run to reflect the change to the provisioned resources (article and referenced module). Because of this behavior, only creation and update tasks are performed. This may be related to the following code fragment, which orphans (Link to source) the resource.

How could Crossplane help solve your problem?

Implement the functionality, or adjust the documentation to state that this is a planned or won't-fix behavior of the provider.

janorga commented 9 months ago

This bug is still present, I've checked it with release v0.5.0 and with current commit 5e0d99269289a6c47b232d829eff424cfa374087.

Running deletion tasks on Delete() seems to be a key piece of functionality according to the documentation. So either the documentation or the functionality needs to be fixed.

janorga commented 9 months ago

I have limited experience with Kubernetes controllers and the Crossplane framework, so please check whether this makes sense.

This works on my dev setup. Delete() marks the "Deleting" reason, and since there is no evaluation yet of whether the playbook returned failed tasks, we just rely on the one-shot Run() execution. So inside Observe() we can use meta.WasDeleted() and the Deleting reason as markers to know that the controller has already performed the Delete() run, and then report the resource as non-existent.

Another option is to move the "Deleting" mark inside Delete(), after the Run() call and its error evaluation, so that the controller stays in Delete() until c.runner.Run() has completed without errors.

Please check the proposal:

func (c *external) Observe(ctx context.Context, mg resource.Managed) (managed.ExternalObservation, error) {
    cr, ok := mg.(*v1alpha1.AnsibleRun)
    if !ok {
        return managed.ExternalObservation{}, errors.New(errNotAnsibleRun)
    }
    // Previously the deletion policy was forced to Orphan because the
    // external resource cannot be observed, i.e. we would not wait for
    // external resource deletion before deleting the managed resource.
    // That call is disabled here so the resource is no longer orphaned:
    // cr.SetDeletionPolicy(xpv1.DeletionOrphan)

    switch c.runner.GetAnsibleRunPolicy().Name {
    case "ObserveAndDelete", "":
        if c.runner.GetAnsibleRunPolicy().Name == "" {
            ansible.SetPolicyRun(cr, "ObserveAndDelete")
        }
        if meta.WasDeleted(cr) {
            // If it is marked for deletion, check whether the "Deleting"
            // reason is already present. If so, Delete() has been run,
            // so report the external resource as non-existent.
            condDel := cr.Status.ResourceStatus.GetCondition(xpv1.Deleting().Type)
            if condDel.Reason == "Deleting" {
                return managed.ExternalObservation{ResourceExists: false}, nil
            }
            return managed.ExternalObservation{ResourceExists: true}, nil
        }
...
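The gate in the snippet above can be modeled in isolation: report the external resource as gone only once the Deleting reason has been recorded. A stdlib-only sketch of that state machine, with `Condition` and `Observation` standing in for the crossplane-runtime types (xpv1.Deleting() sets the Ready-type condition with reason "Deleting"):

```go
package main

import "fmt"

// Condition mimics the relevant fields of xpv1.Condition.
type Condition struct {
	Type   string
	Reason string
}

// Observation mirrors managed.ExternalObservation's ResourceExists flag.
type Observation struct {
	ResourceExists bool
}

// observeDeletion implements the proposed gate: once the resource is
// marked for deletion AND the Deleting reason is present (meaning
// Delete() has already run), report non-existence so the reconciler
// can remove the finalizer; otherwise keep reporting existence.
func observeDeletion(wasDeleted bool, conds []Condition) Observation {
	if !wasDeleted {
		return Observation{ResourceExists: true}
	}
	for _, c := range conds {
		if c.Type == "Ready" && c.Reason == "Deleting" {
			return Observation{ResourceExists: false}
		}
	}
	return Observation{ResourceExists: true}
}

func main() {
	got := observeDeletion(true, []Condition{{Type: "Ready", Reason: "Deleting"}})
	fmt.Println(got.ResourceExists)
}
```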
janorga commented 9 months ago

The solution doesn't behave well when the Ansible Run() inside Delete() takes a long time. Some delete requests are enqueued with the original conditions of the managed resource, so those enqueued delete requests cannot read the "Deleting" reason.

So maybe a better solution is to handle the Delete() call inside Observe() itself, after first checking whether the resource still exists in Kubernetes: if it is marked for deletion and still found in Kubernetes, delete it from Observe().

func (c *external) Observe(ctx context.Context, mg resource.Managed) (managed.ExternalObservation, error) {
    cr, ok := mg.(*v1alpha1.AnsibleRun)
    if !ok {
        return managed.ExternalObservation{}, errors.New(errNotAnsibleRun)
    }
    cr.SetDeletionPolicy(xpv1.DeletionDelete)
...
    switch c.runner.GetAnsibleRunPolicy().Name {
    case "ObserveAndDelete", "":
        if c.runner.GetAnsibleRunPolicy().Name == "" {
            ansible.SetPolicyRun(cr, "ObserveAndDelete")
        }
        if meta.WasDeleted(cr) {
            // Marked for deletion: first check whether the resource
            // still exists in Kubernetes.
            observed := cr.DeepCopy()
            if err := c.kube.Get(ctx, types.NamespacedName{
                Namespace: observed.GetNamespace(),
                Name:      observed.GetName(),
            }, observed); err != nil {
                if kerrors.IsNotFound(err) {
                    return managed.ExternalObservation{ResourceExists: false}, nil
                }
                return managed.ExternalObservation{}, fmt.Errorf("%s: %w", errGetAnsibleRun, err)
            }
            // If it is still present, run Delete() from here.
            if err := c.Delete(ctx, mg); err != nil {
                return managed.ExternalObservation{}, err
            }

            return managed.ExternalObservation{}, nil
        }
...