GalleyBytes / terraform-operator

A Kubernetes CRD to handle terraform operations
http://tf.galleybytes.com
Apache License 2.0

Difficulty destroying #35

Open schue opened 3 years ago

schue commented 3 years ago

I'm using your tool with helmfile (https://github.com/roboll/helmfile) and can successfully start up my terraform from a Git repo. Updating and destroying are giving me problems, though. An apply doesn't seem to check the Git source for updates, and the runner pod stays around after a destroy. I'm running the service in its own namespace, and deleting that namespace hangs in a "terminating" state. As a result, I basically have to recreate the cluster to get it to see updates. Any thoughts?

schue commented 3 years ago

I did try editing the config map. Changing action to "abort" makes the runner shut down, but switching back to "apply" doesn't seem to do anything.

isaaguilar commented 3 years ago

Helmfile is a great tool, and I'm not sure it's related to the problem. I'm going to assume this would also happen when running helm delete on the terraform resource.

To get a clear picture, I'd like to confirm the sequence you described:

  1. you run helmfile apply
  2. a terraform-apply job/pod gets triggered
  3. the "apply" pod is still running when you run a destroy

Questions for you:

  1. Does a destroy pod start?
  2. Does the terraform resource in k8s get deleted?
  3. I should have started with this question: what are the applyOnCreate, applyOnUpdate, applyOnDelete, and ignoreDelete options set to?

isaaguilar commented 3 years ago

Also, what version of tfo are you running?

schue commented 3 years ago

It does run a destroy pod, which runs to completion and stays there as "complete". Destroying leaves the runner around, and doing a "kubectl delete pod" respawns it without a new config. The apply settings are:

applyOnCreate: true
applyOnUpdate: true
applyOnDelete: true
ignoreDelete: false

schue commented 3 years ago

This is on K3D and K3S on Linux, by the way.

isaaguilar commented 3 years ago

Does kubectl get terraform still have the resource there?

Just in case you missed my previous comment, what version of terraform-operator are you running?

schue commented 3 years ago

After a destroy? It does not immediately go away but does after a bit. The pod remains running.

I'm running the latest, 0.3.8.

My terraform file originally used an SSH agent, but I'm now migrating it to use private key secrets, and a lot of my sessions end with a stuck terraform waiting for an SSH agent that isn't going to show up. Does your tool rely on terraform runs always finishing?

schue commented 3 years ago

I finished updating my terraform to not use local subdirectories. I can confirm that things are much better behaved if apply runs to completion. Trying to destroy a stuck terraform seems problematic.

isaaguilar commented 3 years ago

Thank you for keeping this ticket updated with your findings. I appreciate the feedback.

> Does your tool rely on terraform runs always finishing?

There is a "soft" order to running apply and delete. In general, it is important to wait until the terraform apply completes before running destroy. This is especially important when the user is not using a “locking” backend.

If the user is using a “locking” backend, when the user deletes the terraform resource while apply is still running, the terraform destroy command will be blocked from running until it can obtain the backend lock. In essence, the terraform apply must release the lock. It is not up to this project, terraform-operator, to handle locks.
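
The ordering described above can be illustrated with a small shell sketch. This is not how terraform or terraform-operator implements backend locking. It is a stand-in that uses a directory as the "state lock", and the helper names acquire_lock, release_lock, and run_with_lock are hypothetical. It only shows why a second command cannot proceed until the first one releases the lock.

```shell
#!/bin/sh
# Stand-in for a locking backend (assumption): a directory acts as the state lock.
# mkdir either creates the directory or fails, so only one caller can hold it.
acquire_lock() { mkdir "$1" 2>/dev/null; }
release_lock() { rmdir "$1"; }

# run_with_lock LOCK_DIR CMD...: run CMD only if the lock can be taken,
# then release the lock and return CMD's exit status.
run_with_lock() {
  lock_dir=$1
  shift
  if ! acquire_lock "$lock_dir"; then
    echo "lock held, cannot run: $*" >&2
    return 1
  fi
  "$@"
  status=$?
  release_lock "$lock_dir"
  return "$status"
}
```

If an "apply" already holds the lock, run_with_lock fails immediately for a "destroy" on the same lock directory. Once the holder calls release_lock, the same call goes through.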

One caveat: both terraform apply and terraform destroy only retry 10 times to get started. The number 10 is an arbitrary choice in run.sh. I’m wondering if the destroy pod gets stuck in this loop in your case. In any case, I don’t think that the “destroy” pod should remain running, so I will look into that.
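
For readers unfamiliar with the pattern, a bounded retry loop of the kind mentioned above might look like the following. This is a minimal sketch and not the actual contents of run.sh. The retry function, its argument order, and the delay parameter are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not the operator's real run.sh.
# retry MAX DELAY CMD...: run CMD until it succeeds or MAX attempts are used up.
retry() {
  max_attempts=$1
  delay_seconds=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt of $max_attempts failed" >&2
    attempt=$((attempt + 1))
    # Sleep only when another attempt will follow.
    if [ "$attempt" -le "$max_attempts" ]; then
      sleep "$delay_seconds"
    fi
  done
  return 1
}

# Example (assumed usage): retry 10 5 terraform destroy -auto-approve
```

With a fixed attempt count, a command that can never start (for example, one waiting on a lock that is never released) gives up after the last attempt instead of looping forever.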