schue opened this issue 3 years ago
I did try editing the config map: changing the action to "abort" makes the runner shut down, but switching back to "apply" doesn't seem to do anything.
Helmfile is a great tool, but I'm not sure it's related to the problem. I'm going to assume this would also happen when running `helm delete` on the `terraform` resource as well.
As far as the behavior you described, I'd like to clarify to get a clear picture. So, regarding `helmfile apply`, questions for you:

- Does the `terraform` resource in k8s get deleted?
- What are the `applyOnCreate`, `applyOnUpdate`, `applyOnDelete`, and `ignoreDelete` options set as?

Also, what version of tfo are you running?
It does run a destroy pod; it runs to completion and stays there as "complete". Destroying leaves the runner around, and doing a `kubectl delete pod` respawns it without a new config. The apply settings are:
applyOnCreate: true
applyOnUpdate: true
applyOnDelete: true
ignoreDelete: false
This is on K3D and K3S on Linux, by the way.
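For reference, these flags are fields on the Terraform custom resource's spec. A minimal sketch of how they fit together (the `apiVersion` and resource name here are assumptions about the tfo CRD, not copied from this setup):

```yaml
apiVersion: tf.isaaguilar.com/v1alpha1   # assumed group/version for terraform-operator
kind: Terraform
metadata:
  name: example-terraform                # placeholder name
spec:
  applyOnCreate: true    # run the apply workflow when the resource is created
  applyOnUpdate: true    # run the apply workflow when the resource is updated
  applyOnDelete: true    # run the destroy workflow when the resource is deleted
  ignoreDelete: false    # false = do not skip cleanup on delete
```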
Does `kubectl get terraform` still have the resource there?
Just in case you missed my previous comment, what version of terraform-operator are you running?
After a destroy? It does not immediately go away, but it does after a bit. The pod remains running.
I'm running the latest, 0.3.8.
My terraform file originally used an SSH agent, but now I'm migrating it to use private-key secrets, and a lot of my sessions end with a stuck terraform run waiting for an SSH agent that isn't going to show up. Does your tool rely on terraform runs always finishing?
I finished updating my terraform to not use local subdirectories. I can confirm that things are much better behaved if apply runs to completion. Trying to destroy a stuck terraform run seems problematic.
Thank you for keeping this ticket updated with your findings. I appreciate the feedback.
> Does your tool rely on terraform runs always finishing?
There is a "soft" ordering between apply and delete. In general, it is important to wait until the `terraform apply` completes before running destroy. This is especially important when the user is not using a "locking" backend.

If the user is using a "locking" backend and deletes the terraform resource while apply is still running, the `terraform destroy` command will be blocked from running until it can obtain the backend lock. In essence, the `terraform apply` must release the lock first. It is not up to this project, `terraform-operator`, to handle locks.
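For context, a "locking" backend is one where Terraform takes a state lock before mutating state, so concurrent runs queue up instead of corrupting each other. One common example is an S3 backend paired with a DynamoDB lock table (bucket, key, and table names below are placeholders):

```hcl
terraform {
  backend "s3" {
    bucket         = "example-state-bucket"      # placeholder
    key            = "clusters/example.tfstate"  # placeholder
    region         = "us-east-1"
    dynamodb_table = "example-terraform-locks"   # this table provides the state lock
  }
}
```

With a backend like this, a `terraform destroy` started mid-apply simply waits on the lock until the apply releases it.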
Now, there is a caveat: both `terraform apply` and `terraform destroy` only retry 10 times to get started. 10 is an arbitrary number in `run.sh`. I'm wondering if the destroy pod gets stuck in this loop in your case. In any case, I don't think the "destroy" pod should remain running, so I will look into that.
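The bounded retry described above can be sketched roughly as follows. This is only an illustration of the pattern, not the actual contents of `run.sh`; `retry_up_to` and `fake_terraform` are made-up names, and the real script would likely sleep between attempts:

```shell
#!/bin/sh
# Sketch of a bounded retry loop: keep retrying a command until it
# succeeds or a fixed attempt limit (10, matching run.sh) is reached.
retry_up_to() {
  max_attempts=$1; shift
  attempt=1
  while ! "$@" "$attempt"; do
    attempt=$((attempt + 1))
    if [ "$attempt" -gt "$max_attempts" ]; then
      echo "giving up after $max_attempts attempts" >&2
      return 1
    fi
  done
  echo "succeeded on attempt $attempt"
}

# Stand-in for the terraform command: fails until the 3rd attempt.
fake_terraform() { [ "$1" -ge 3 ]; }

retry_up_to 10 fake_terraform
```

If the command never starts succeeding (e.g. a lock is never released), the loop exhausts its attempts and gives up, which matches the "stuck in this loop" scenario above.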
I'm using your tool with https://github.com/roboll/helmfile and can successfully start up my terraform from a Git repo. When I try to update or destroy, however, I seem to be having some problems: it doesn't seem to check the Git sources for updates when I do an apply, and the runner pod seems to stay around when I do a destroy. I'm running the service in its own namespace, and trying to delete that namespace seems to hang in a "Terminating" state. The result is that I basically have to recreate the cluster to get it to see updates. Any thoughts?