GalleyBytes / terraform-operator

A Kubernetes CRD to handle terraform operations
http://tf.galleybytes.com
Apache License 2.0
366 stars, 47 forks

Stuck deleting #104

Closed OmpahDev closed 2 years ago

OmpahDev commented 2 years ago

I tried creating a terraform object, but it errored out. When I tried deleting it, the delete hung until my CLI timed out, and when I try to create it again I'm warned that I can't because a resource with the same name is being deleted.

How do I get it out of this eternal deletion limbo and just force-delete it?

Specifically, what seems to have happened is I tried specifying a custom backend and it failed because it wasn't able to write to that backend. So there's no state file locking or anything like that at play because it wasn't able to create state files at all.

It seems whenever a TF is deleted, it tries spinning up an "init-delete" container - however that container errors out. Is this why it's stuck?

isaaguilar commented 2 years ago

Hi @tdevopsottawa. The resource that is being deleted probably has a "finalizer" set on it. Here's what that means and how to remove it.

What it means

The finalizer is added automatically unless the user sets ignoreDelete=true in the resource spec. It means that the user wants TFO to clean up anything created with TFO when the resource is removed. Kubernetes will not remove the object until the finalizer has been cleared.
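
To see whether a finalizer is what holds the object, its metadata can be inspected directly. A minimal sketch, assuming the resource name hello-tfo and the namespace terraform-operator that appear later in this thread, and the tf short name used in the commands here:

```shell
# Names taken from the commands in this thread; adjust for your own resource.
NAME=hello-tfo
NAMESPACE=terraform-operator

# Print the finalizers still attached to the object.
kubectl get tf "$NAME" --namespace "$NAMESPACE" -o jsonpath='{.metadata.finalizers}'

# A non-empty deletionTimestamp means deletion was requested and is waiting on those finalizers.
kubectl get tf "$NAME" --namespace "$NAMESPACE" -o jsonpath='{.metadata.deletionTimestamp}'
```

If the first command prints a non-empty list and the second prints a timestamp, the object is waiting for its finalizers to be removed.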

How to fix it

Since TFO is failing to finish the "cleanup" for whatever reason, you can manually edit the TFO object by running the following command:

kubectl patch tf <tfo-resource-name> -p '{"metadata: {"finalizers": null}}'

The TFO resource should be gone after that.

OmpahDev commented 2 years ago

Thanks @isaaguilar, but the command doesn't work:

$ kubectl patch --namespace="terraform-operator" tf hello-tfo -p '{"metadata: {"finalizers": null}}'
Error from server (UnsupportedMediaType): the body of the request was in an unknown format - accepted media types include: application/json-patch+json, application/merge-patch+json, application/apply-patch+yaml

It seems that this undeletable problem happens whenever the Terraform init step fails to run for whatever reason. Because the init container failed, the init-delete container will fail too, and then the TFO resource and the failed pods are stuck on the cluster forever unless you manually delete the finalizers.

Is there a plan to fix this?

isaaguilar commented 2 years ago

It's a known issue(?). When I run into the init failure, it's usually my config that's the problem. I have to tweak the config, or make sure I have good credentials to pull any repos or access any backends.

But to your point, initially failing init will most certainly cause init-delete failures. It's almost inevitable since TFO has to make some sort of assumption that the resource it's supposed to "manage" is supposed to be removed if the k8s resource is removed. The easiest way to make this not happen is to use ignoreDelete.

...
kind: terraform
spec:
  ...
  ignoreDelete: true

With this set, deleting the resource will not kick off the delete workflow.

isaaguilar commented 2 years ago

I wonder if the patch command is outdated. I pulled it from https://www.howtogeek.com/devops/what-are-finalizers-in-kubernetes-how-to-handle-object-deletions/#:~:text=You%20can%20manually%20remove%20an%20object%E2%80%99s%20Finalizers%20by,lead%20to%20orphaned%20objects%20and%20broken%20dependency%20chains.

A way to do the same thing is to do a kubectl edit and manually remove the finalizer. I hope this helps.
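
If a one-line patch is still preferred, the error above is consistent with kubectl sending a strategic merge patch, which is its default patch type and which custom resources do not accept. A sketch of a corrected command, assuming the same resource name and namespace as in the failing command, balances the quotes around "metadata" and asks for a JSON merge patch explicitly:

```shell
# Merge patch that clears every finalizer on the object.
# Note the closing quote after "metadata", which the earlier command was missing.
PATCH='{"metadata": {"finalizers": null}}'

# --type=merge selects application/merge-patch+json, one of the media types
# listed as accepted in the error message.
kubectl patch tf hello-tfo --namespace terraform-operator --type=merge -p "$PATCH"
```

Clearing finalizers this way skips TFO's cleanup, so anything Terraform did manage to create has to be removed by hand.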

OmpahDev commented 2 years ago

> A way to do the same thing is to do a kubectl edit and manually remove the finalizer. I hope this helps.

I was just about to edit my comment to say that I managed to do it this way.