GalleyBytes / terraform-operator

A Kubernetes CRD to handle terraform operations
http://tf.galleybytes.com
Apache License 2.0

Unable to delete a deployment #5

Closed: mbaumims closed this issue 3 years ago

mbaumims commented 3 years ago

Hi,

First off, thanks for contributing the Terraform operator. We are currently evaluating it as our main infrastructure provisioner. I have used it successfully to create infrastructure, but I am stuck on how to delete a deployment.

For context, I set these flags in my Terraform resource:

applyOnCreate: true
applyOnDelete: true
applyOnUpdate: true
ignoreDelete: false
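The thread does not spell out what each flag does. Assuming that `applyOnCreate` and `applyOnUpdate` trigger an apply on those events, that `applyOnDelete` triggers a destroy when the resource is deleted, and that `ignoreDelete: true` suppresses that destroy, the intended behaviour of the settings above can be sketched like this. The `TerraformFlags` class, the `action_for_event` helper, and the event names are hypothetical, not part of the operator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TerraformFlags:
    # Mirrors the manifest flags above; the semantics below are assumed.
    apply_on_create: bool = False
    apply_on_update: bool = False
    apply_on_delete: bool = False
    ignore_delete: bool = False


def action_for_event(flags: TerraformFlags, event: str) -> Optional[str]:
    """Return the Terraform action assumed to run for a lifecycle event.

    Hypothetical helper: None means "do nothing".
    """
    if event == "create":
        return "apply" if flags.apply_on_create else None
    if event == "update":
        return "apply" if flags.apply_on_update else None
    if event == "delete":
        # Assumed: ignoreDelete wins, leaving the infrastructure in place.
        if flags.ignore_delete or not flags.apply_on_delete:
            return None
        return "destroy"
    raise ValueError(f"unknown event: {event!r}")


# The settings reported in this issue.
flags = TerraformFlags(
    apply_on_create=True,
    apply_on_update=True,
    apply_on_delete=True,
    ignore_delete=False,
)
for event in ("create", "update", "delete"):
    print(event, "->", action_for_event(flags, event))
```

Under these assumptions, deleting the resource should start a destroy, which is exactly the step that never happens in the report.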

I waited more than 6 hours before deleting the Terraform resource, so the pod that was created to deploy the infrastructure had already been deleted. However, the job had not been deleted. When I delete the Terraform resource, nothing happens, and I see this in the operator log:

2021-02-08T11:56:04.460Z    INFO    controller_terraform    Checking if destroy task is done    {"Request.Namespace": "platform", "Request.Name": "iws-mbaum-test2"}
2021-02-08T11:56:04.460Z    DEBUG    controller-runtime.manager.events    Normal    {"object": {"kind":"Terraform","namespace":"platform","name":"iws-mbaum-test2","uid":"bff70371-e0b6-4971-84e8-f1a55d48edbd"
2021-02-08T11:56:34.460Z    INFO    controller_terraform    Checking if destroy task is done    {"Request.Namespace": "platform", "Request.Name": "iws-mbaum-test2"}
2021-02-08T11:57:04.463Z    INFO    controller_terraform    Checking if destroy task is done    {"Request.Namespace": "platform", "Request.Name": "iws-mbaum-test2"}

Earlier in the log, I see the following:

2021-02-08T11:56:04.444Z    INFO    controller_terraform    Secret iws-mbaum-test2-ssh-config already exists    {"Request.Namespace": "platform", "Request.Name": "iws-mbaum-test2"}
2021-02-08T11:56:04.460Z    INFO    controller_terraform    jobs.batch "iws-mbaum-test2" already exists    {"Request.Namespace": "platform", "Request.Name": "iws-mbaum-test2"}

This seems to confirm my suspicion: the create job still exists when I try to delete the resource, and that is what causes the issue.

Maybe I am missing something. Can you please advise whether I followed the wrong procedure for deleting a resource?

The operator just loops forever waiting for the destroy task to be done, but no destroy task has been started.
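The logs are consistent with one plausible failure mode: the operator tries to create the destroy job under the same name as the leftover apply job (`iws-mbaum-test2`), gets an "already exists" error, and then keeps polling for a destroy task that was never started. The toy model below reproduces that loop with an in-memory job store and shows one way to avoid it, by giving each action its own job name. Everything here (`JobStore`, the `-destroy` suffix, both `start_destroy_*` functions) is hypothetical; it is not the operator's code and not necessarily how the eventual fix works.

```python
class AlreadyExists(Exception):
    """Stand-in for the Kubernetes 'already exists' API error."""


class JobStore:
    """Minimal in-memory stand-in for the batch/v1 Jobs API."""

    def __init__(self):
        self.jobs = {}  # job name -> action ("apply" or "destroy")

    def create(self, name, action):
        if name in self.jobs:
            raise AlreadyExists(f'jobs.batch "{name}" already exists')
        self.jobs[name] = action


def start_destroy_naive(store, resource_name):
    """Problematic pattern: every job reuses the resource name."""
    try:
        store.create(resource_name, "destroy")
    except AlreadyExists as err:
        # Only logged; the caller then waits for a destroy job that does not exist.
        print("INFO", err)
    return store.jobs.get(resource_name) == "destroy"


def start_destroy_distinct(store, resource_name):
    """Alternative: each action gets its own job name."""
    name = f"{resource_name}-destroy"
    if name not in store.jobs:
        store.create(name, "destroy")
    return store.jobs[name] == "destroy"


store = JobStore()
store.create("iws-mbaum-test2", "apply")  # apply job left over from the create run
print(start_destroy_naive(store, "iws-mbaum-test2"))     # False: destroy never starts
print(start_destroy_distinct(store, "iws-mbaum-test2"))  # True
```

With the naive pattern, a leftover apply job silently blocks the destroy, which matches the endless "Checking if destroy task is done" messages.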

isaaguilar commented 3 years ago

Hi @mbaumims, thanks for reporting. I was able to replicate this quite easily. I'll look into it; it seems to be related to the changes between v0.3.0 and v0.3.1. The destroy feature still works in v0.3.0.

As a quick workaround, you can revert the container image to v0.3.0, which should let the destroy process complete.

mbaumims commented 3 years ago

Hey @isaaguilar, thanks for the quick feedback. I'll consider using the version you suggested. Just to understand, can I destroy infra at any time once it has been created, or do the pod and the job from the apply run need to be deleted first?

isaaguilar commented 3 years ago

Just to understand, can I destroy infra at any time once it has been created,

Yes, destroy can run any time after Terraform has completed, that is, once the tfstate has been saved to your backend. The "original" pod can still exist when you run destroy.

I highly recommend choosing a backend that supports state locking, so that if another pod happens to be making changes at the same time, the lock protects the tfstate against corruption.
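To illustrate why locking matters when an apply pod and a destroy pod might touch the same state, the sketch below models a backend lock that only one holder can take at a time, so a second writer is refused instead of overwriting the state. It is a conceptual toy, not Terraform's locking implementation; `LockingBackend` and its methods are hypothetical. (In practice, Terraform's S3 backend only locks when configured to, for example with its `dynamodb_table` argument.)

```python
import threading


class LockingBackend:
    """Toy state backend: writes are only allowed while holding the lock."""

    def __init__(self):
        self._guard = threading.Lock()  # protects _holder and state
        self._holder = None
        self.state = {}

    def acquire(self, holder):
        """Try to take the state lock; return False if someone else holds it."""
        with self._guard:
            if self._holder is not None and self._holder != holder:
                return False
            self._holder = holder
            return True

    def release(self, holder):
        with self._guard:
            if self._holder == holder:
                self._holder = None

    def write(self, holder, new_state):
        with self._guard:
            if self._holder != holder:
                raise RuntimeError(f"{holder} does not hold the state lock")
            self.state = dict(new_state)


backend = LockingBackend()
backend.acquire("apply-pod")
print(backend.acquire("destroy-pod"))  # False: the apply pod still holds the lock
backend.write("apply-pod", {"serial": 1})
backend.release("apply-pod")
print(backend.acquire("destroy-pod"))  # True
```

The refused `acquire` is the point: the second pod waits or fails instead of racing the first one on the tfstate.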

mbaumims commented 3 years ago

Okay, great. Yeah, we're using S3 as our backend.

isaaguilar commented 3 years ago

Fixed the bug in https://github.com/isaaguilar/terraform-operator/commit/e6aa74daf9e1e13272a55305eef19615d5874957, which is included in release v0.3.3.

Closing this ticket.

mbaumims commented 3 years ago

Thanks for the quick turnaround, Isa. I tested your fix and it works well. One thing I noticed, however, though it's not related to this bug per se, is that a plan always runs after an apply. That plan overwrites the previous one, so the original plan is no longer available. Not a big deal, but it would be nice to be able to see the original plan.
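If, as described, the post-apply plan replaces the earlier plan output in a single slot, one option is to keep each run's output under its own key so the original plan survives. The sketch below assumes nothing about where the operator actually stores plans; `PlanStore`, its methods, and the `-plan-N` key format are hypothetical, and the plan strings are placeholders.

```python
class PlanStore:
    """Keeps every plan output per resource instead of overwriting one slot."""

    def __init__(self):
        self._plans = {}  # resource name -> list of plan outputs, oldest first

    def save(self, resource_name, plan_text):
        runs = self._plans.setdefault(resource_name, [])
        runs.append(plan_text)
        return f"{resource_name}-plan-{len(runs)}"  # hypothetical per-run key

    def original(self, resource_name):
        return self._plans[resource_name][0]

    def latest(self, resource_name):
        return self._plans[resource_name][-1]


store = PlanStore()
store.save("iws-mbaum-test2", "plan output before apply")
key = store.save("iws-mbaum-test2", "plan output after apply")
print(key)                                # iws-mbaum-test2-plan-2
print(store.original("iws-mbaum-test2"))  # plan output before apply
```

Appending rather than overwriting keeps the pre-apply plan available for review at the cost of storing every run; a real implementation would likely also need a retention limit.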