digitalocean / terraform-provider-digitalocean

Terraform DigitalOcean provider
https://registry.terraform.io/providers/digitalocean/digitalocean/latest/docs
Mozilla Public License 2.0

bug: zombie Kubernetes cluster #778

Open bweston92 opened 2 years ago

bweston92 commented 2 years ago

Describe the bug

If Kubernetes cluster creation fails, further attempts to apply will fail because a cluster with that name already exists, but running a destroy doesn't remove the cluster either, so you're left with a zombie cluster.

Affected Resource(s)

digitalocean_kubernetes_cluster
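For reference, the diagnostics below point at a resource of this shape (the resource address matches the logs; the name, region, version, and node pool values here are hypothetical):

resource "digitalocean_kubernetes_cluster" "cluster" {
  name    = "example-cluster" # hypothetical name
  region  = "lon1"            # hypothetical region slug
  version = "1.21.5-do.0"     # hypothetical version slug

  node_pool {
    name       = "default"
    size       = "s-2vcpu-2gb" # hypothetical node size
    node_count = 2
  }
}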

Actual Behavior

Creation of the Kubernetes cluster fails for an unknown reason (the API returns a 500 while the provider polls cluster state):

{"@level":"info","@message":"digitalocean_kubernetes_cluster.cluster: Still creating... [6m40s elapsed]","@module":"terraform.ui","@timestamp":"2022-01-17T09:49:26.752118Z","hook":{"resource":{"addr":"digitalocean_kubernetes_cluster.cluster","module":"","resource":"digitalocean_kubernetes_cluster.cluster","implied_provider":"digitalocean","resource_type":"digitalocean_kubernetes_cluster","resource_name":"cluster","resource_key":null},"action":"create","elapsed_seconds":400},"type":"apply_progress"}
{"@level":"info","@message":"digitalocean_kubernetes_cluster.cluster: Creation errored after 6m42s","@module":"terraform.ui","@timestamp":"2022-01-17T09:49:28.847514Z","hook":{"resource":{"addr":"digitalocean_kubernetes_cluster.cluster","module":"","resource":"digitalocean_kubernetes_cluster.cluster","implied_provider":"digitalocean","resource_type":"digitalocean_kubernetes_cluster","resource_name":"cluster","resource_key":null},"action":"create","elapsed_seconds":402},"type":"apply_errored"}
{"@level":"error","@message":"Error: Error creating Kubernetes cluster: Error trying to read cluster state: GET https://api.digitalocean.com/v2/kubernetes/clusters/6606e3df-7767-4881-8684-2184ac8ad2ee: 500 (request \"3f228c7a-ddee-44b3-8016-481a396ea31a\") Server Error","@module":"terraform.ui","@timestamp":"2022-01-17T09:49:29.003188Z","diagnostic":{"severity":"error","summary":"Error creating Kubernetes cluster: Error trying to read cluster state: GET https://api.digitalocean.com/v2/kubernetes/clusters/6606e3df-7767-4881-8684-2184ac8ad2ee: 500 (request \"3f228c7a-ddee-44b3-8016-481a396ea31a\") Server Error","detail":"","address":"digitalocean_kubernetes_cluster.cluster","range":{"filename":"cluster.tf","start":{"line":1,"column":54,"byte":53},"end":{"line":1,"column":55,"byte":54}},"snippet":{"context":"resource \"digitalocean_kubernetes_cluster\" \"cluster\"","code":"resource \"digitalocean_kubernetes_cluster\" \"cluster\" {","start_line":1,"highlight_start_offset":53,"highlight_end_offset":54,"values":[]}},"type":"diagnostic"}

A subsequent attempt to apply the cluster gets a 422:

{"@level":"error","@message":"Error: Error creating Kubernetes cluster: POST https://api.digitalocean.com/v2/kubernetes/clusters: 422 (request \"f76a82f0-53a7-4cc7-b048-37ff7a974ca3\") a cluster with this name already exists","@module":"terraform.ui","@timestamp":"2022-01-17T09:54:21.949386Z","diagnostic":{"severity":"error","summary":"Error creating Kubernetes cluster: POST https://api.digitalocean.com/v2/kubernetes/clusters: 422 (request \"f76a82f0-53a7-4cc7-b048-37ff7a974ca3\") a cluster with this name already exists","detail":"","address":"digitalocean_kubernetes_cluster.cluster","range":{"filename":"cluster.tf","start":{"line":1,"column":54,"byte":53},"end":{"line":1,"column":55,"byte":54}},"snippet":{"context":"resource \"digitalocean_kubernetes_cluster\" \"cluster\"","code":"resource \"digitalocean_kubernetes_cluster\" \"cluster\" {","start_line":1,"highlight_start_offset":53,"highlight_end_offset":54,"values":[]}},"type":"diagnostic"}

However, a destroy doesn't remove the cluster.

andrewsomething commented 2 years ago

Hi @bweston92,

It looks like there was an issue with the API while polling for cluster status post-create. You should be able to recover from this by importing the cluster:

terraform import digitalocean_kubernetes_cluster.<name> <cluster ID>
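For example, using the resource address and cluster ID from the diagnostics above:

terraform import digitalocean_kubernetes_cluster.cluster 6606e3df-7767-4881-8684-2184ac8ad2ee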
bweston92 commented 2 years ago

Hi, thanks for taking the time to reply.

Is there a way to have the provider issue a delete when creation fails, and wait for the deletion to complete?

Our apply and destroy calls don't have user intervention (scheduled jobs / test-environment pipeline), and the less bloat like that the better. We would have to add labels to the clusters and build something that could query the clusters with those labels and import them, which is not an ideal user experience.
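For what it's worth, an unattended workaround along those lines might look like the sketch below. This is a minimal sketch, not a provider fix: it assumes doctl is authenticated, that the cluster name is unique in the account, and that the resource address is digitalocean_kubernetes_cluster.cluster as in the logs above.

#!/usr/bin/env bash
# Sketch of unattended recovery: if apply fails, look up the cluster by
# name via doctl and import it into state before retrying.
CLUSTER_NAME="example-cluster" # hypothetical; must match the name in cluster.tf

if ! terraform apply -auto-approve; then
  # Find the ID of any cluster the failed apply may have left behind.
  CLUSTER_ID=$(doctl kubernetes cluster list --format ID,Name --no-header \
    | awk -v name="$CLUSTER_NAME" '$2 == name {print $1}')
  if [ -n "$CLUSTER_ID" ]; then
    # Adopt the zombie cluster into state so the next apply can manage it.
    terraform import digitalocean_kubernetes_cluster.cluster "$CLUSTER_ID"
    terraform apply -auto-approve
  fi
fi

The other option at that point would be deleting the zombie instead of importing it (doctl kubernetes cluster delete), at the cost of losing whatever the partially created cluster provisioned.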