digitalocean / terraform-provider-digitalocean

Terraform DigitalOcean provider
https://registry.terraform.io/providers/digitalocean/digitalocean/latest/docs
Mozilla Public License 2.0

VPC fails to be destroyed as it seems not empty, but it should be #488

Open liarco opened 4 years ago

liarco commented 4 years ago

Terraform Version

Terraform v0.13.2
+ provider registry.terraform.io/digitalocean/digitalocean v1.22.2

Affected Resource

Terraform Configuration Files

https://gist.github.com/liarco/a1fab103af843feff29903feec2b27a9

Terraform Cloud run output

https://gist.github.com/liarco/6b2990a7212968730617c64b392630ec

Expected Behavior

All resources should be deleted successfully.

Actual Behavior

The VPC cannot be deleted since it seems to be "not empty", but running terraform destroy again destroys the VPC with no error.

Steps to Reproduce

  1. Run terraform apply and wait for all resources to be provisioned.
  2. Run terraform destroy and wait until it fails.
  3. Run terraform destroy again to destroy the VPC successfully.

Important Factoids

You must have a default VPC for the region (e.g. ams3-default) before reproducing the error; otherwise, a default VPC will be created, and that one cannot be deleted using Terraform. This is why I don't think my issue is related to #472.

andrewsomething commented 4 years ago

It seems like Kubernetes clusters are not de-registering from the VPC immediately when deleted. I've raised this with our VPC team for investigation.

Currently, the VPC resource will retry the delete call for 2 minutes before timing out, but we've made that configurable. Something like this should work around the issue:

resource "digitalocean_vpc" "vpc" {
  name     = "test-network"
  region   = "ams3"

  timeouts {
    delete = "10m"
  }
}

We might want to raise the default here regardless.

liarco commented 4 years ago

I'm sorry for the delay, but I can confirm that a higher timeout is a good temporary workaround, thank you! However, it forces longer waits for the operations to actually complete, so a proper fix from the DO team would be the best solution.

aybabtme commented 3 years ago

I've been seeing the same thing, VPCs fail to delete despite having removed all resources that were in them. Re-running the deletion a few minutes later ends up working (although in the past I had a case where a VPC would simply refuse to be deleted for weeks).

Would be nice if this was fixed in the API. @adamwg do you know anything about this (😸 🚎)?

adamwg commented 3 years ago

I'm guessing this is due to the fact that DOKS cluster deletion is asynchronous. The cluster is marked as deleted immediately, but we delete the worker node droplets after returning. Since the worker node droplets are what's actually attached to the VPC, the VPC isn't empty until they're deleted. Usually the droplets will be deleted within the 2 minute retry window, but due to various problems they could stick around longer.
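
To illustrate the dependency involved, here's a minimal sketch (the cluster name, region, size slug, and version lookup are placeholders, and 10m is just the timeout value suggested earlier in this thread): the cluster's worker droplets are the actual VPC members, so the VPC's delete call has to out-wait their asynchronous cleanup.

data "digitalocean_kubernetes_versions" "current" {}

resource "digitalocean_vpc" "cluster_vpc" {
  name   = "doks-vpc"
  region = "ams3"

  # The cluster's worker droplets are what actually occupy the VPC, and they
  # are removed asynchronously after the cluster itself is deleted.
  timeouts {
    delete = "10m"
  }
}

resource "digitalocean_kubernetes_cluster" "example" {
  name     = "example-cluster"
  region   = "ams3"
  version  = data.digitalocean_kubernetes_versions.current.latest_version
  vpc_uuid = digitalocean_vpc.cluster_vpc.id

  node_pool {
    name       = "default"
    size       = "s-2vcpu-2gb"
    node_count = 2
  }
}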

aybabtme commented 3 years ago

Interesting. I guess from a user's perspective it would be good to get an error from the VPC API explaining that some resources are still being deleted, so at least we'd know what's happening and wouldn't think it's just broken 😅

h3ct0rjs commented 3 years ago

Hi everybody,

It seems that the same problem is happening to me. I'm doing simple exercises to create a small infrastructure on DigitalOcean. When I create a VPC in a region that previously had no default VPC, the VPC created with Terraform becomes the default, which is normal behavior. The problem is that, without a pre-existing default VPC, terraform destroy is then unable to destroy the VPC resource because it's the default VPC.

Let's take the following code as an example.

resource "digitalocean_vpc" "web_vpc"{
    # The human friendly name of our VPC.
    name = var.vpcname

    # The region to deploy my VPC.
    region = var.region

    # The private ip range within our VPC
    ip_range = "10.148.0.0/22"
}

Adding the timeout fixes the issue for me. I'd like to know whether you're working on a fix, or whether I should look into other workarounds for my pipeline.
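
For reference, a sketch of the resource above with the delete timeout raised (10m is just the value suggested earlier in this thread):

resource "digitalocean_vpc" "web_vpc" {
  name     = var.vpcname
  region   = var.region
  ip_range = "10.148.0.0/22"

  # Give the API extra time to release VPC members (e.g. droplets that are
  # still being deleted asynchronously) before the destroy call gives up.
  timeouts {
    delete = "10m"
  }
}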

Thanks in advance, Sincerely, H

dev-ago commented 2 years ago

We have the same problem. As a workaround, we created a default VPC in the region by hand (or with a separate Terraform code base).

That way, the problem doesn't occur when your infrastructure is deleted.
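
A standalone configuration for that pre-created VPC could be as small as the sketch below (the name and IP range are only illustrative). Assuming it is the first VPC in the region, it becomes the region's default and, since it lives in its own state, it is never part of the main stack's destroy:

# Kept in its own state, separate from the main infrastructure.
resource "digitalocean_vpc" "region_default" {
  name     = "fra1-default"
  region   = "fra1"
  ip_range = "10.200.0.0/24"
}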

For reference, this is the error message (another bug with the DigitalOcean API):

Error: DELETE https://api.digitalocean.com/v2/vpcs/77c8159d-a141-4837-b358-145953f64fb0: 403 (request "7ae089e4-65cb-410b-9cf2-3cc9abdfb8cc") Can not delete VPC with member".

We built a workaround using the time_sleep resource (from the hashicorp/time provider); it looks like this:

vpc.tf

resource "digitalocean_vpc" "sandbox" {
  name = "sandbox"
  region = var.region
  ip_range = var.ip_range
}
resource "time_sleep" "wait_300_seconds_to_destroy" {
  depends_on = [digitalocean_vpc.sandbox]
  destroy_duration = "300s"
}
resource "null_resource" "placeholder" {
  depends_on = [time_sleep.wait_300_seconds_to_destroy]
}

Destroy logs

Plan: 0 to add, 0 to change, 15 to destroy.
local_file.kubeconfig: Destroying... [id=9bf91e94b81d32307923fe968195fdc15ce8c255]
null_resource.placeholdet: Destroying... [id=3443561304797448323]
local_file.kubeconfig: Destruction complete after 0s
null_resource.placeholdet: Destruction complete after 0s
time_sleep.wait_300_seconds_to_destroy: Destroying... [id=2022-03-25T22:24:52Z]
kubernetes_secret_v1.cert_manager: Destroying... [id=cert-manager/cf-token]
helm_release.nginx_ingress_controller: Destroying... [id=nginx-ingress]
helm_release.extermal-dns: Destroying... [id=external-dns]
helm_release.cert-manager: Destroying... [id=cert-manager]
kubernetes_secret_v1.cert_manager: Destruction complete after 0s
helm_release.extermal-dns: Destruction complete after 4s
helm_release.nginx_ingress_controller: Destruction complete after 6s
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 10s elapsed]
helm_release.cert-manager: Still destroying... [id=cert-manager, 10s elapsed]
helm_release.cert-manager: Destruction complete after 10s
kubernetes_namespace_v1.core_namespaces["cert-manager"]: Destroying... [id=cert-manager]
kubernetes_namespace_v1.core_namespaces["infra"]: Destroying... [id=infra]
kubernetes_namespace_v1.core_namespaces["monitoring"]: Destroying... [id=monitoring]
kubernetes_namespace_v1.core_namespaces["external-dns"]: Destroying... [id=external-dns]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Destroying... [id=ingress-nginx]
kubernetes_namespace_v1.core_namespaces["gitlab-runner"]: Destroying... [id=gitlab-runner]
kubernetes_namespace_v1.core_namespaces["gitlab-runner"]: Destruction complete after 7s
kubernetes_namespace_v1.core_namespaces["infra"]: Destruction complete after 7s
kubernetes_namespace_v1.core_namespaces["cert-manager"]: Destruction complete after 7s
kubernetes_namespace_v1.core_namespaces["external-dns"]: Destruction complete after 7s
kubernetes_namespace_v1.core_namespaces["monitoring"]: Destruction complete after 7s
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 20s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 10s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 30s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 20s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 40s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 30s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 50s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 40s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 1m0s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 50s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 1m10s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 1m0s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 1m20s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 1m10s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 1m30s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 1m20s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 1m40s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 1m30s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 1m50s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 1m40s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 2m0s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 1m50s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 2m10s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 2m0s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 2m20s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 2m10s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 2m30s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 2m20s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 2m40s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 2m30s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 2m50s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 2m40s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Destruction complete after 2m46s
digitalocean_kubernetes_cluster.sandbox: Destroying... [id=6f85adb6-2dad-4836-87f8-d6c92e9287d6]
digitalocean_kubernetes_cluster.sandbox: Destruction complete after 0s
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 3m0s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 3m10s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 3m20s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 3m30s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 3m40s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 3m50s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 4m0s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 4m10s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 4m20s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 4m30s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 4m40s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 4m50s elapsed]
time_sleep.wait_300_seconds_to_destroy: Destruction complete after 5m0s
digitalocean_vpc.sandbox: Destroying... [id=40d36e5b-518b-4488-9fa7-d4124929081e]
digitalocean_vpc.sandbox: Still destroying... [id=40d36e5b-518b-4488-9fa7-d4124929081e, 10s elapsed]
digitalocean_vpc.sandbox: Destruction complete after 17s
Destroy complete! Resources: 15 destroyed.
moreinhardt commented 4 days ago

> It seems like Kubernetes clusters are not de-registering from the VPC immediately when deleted. I've raised this with our VPC team for investigation.
>
> Currently, the VPC resource will retry the delete call for 2 minutes before timing out, but we've made that configurable. Something like this should work around the issue:
>
> resource "digitalocean_vpc" "vpc" {
>   name     = "test-network"
>   region   = "ams3"
>
>   timeouts {
>     delete = "10m"
>   }
> }
>
> We might want to raise the default here regardless.

In my case it also happens when the only resource in the VPC is a single droplet (I have added it to a project too, and there are also some DO tags created).

The delete timeout workaround seems to be ignored; it doesn't change anything for me.
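
For context, the shape of my configuration is roughly the following (a sketch, not my exact code; names, image, and size slug are placeholders):

resource "digitalocean_vpc" "single_droplet_vpc" {
  name   = "droplet-vpc"
  region = "fra1"

  # This delete timeout appears to be ignored in my case.
  timeouts {
    delete = "10m"
  }
}

resource "digitalocean_droplet" "web" {
  name     = "web-1"
  region   = "fra1"
  size     = "s-1vcpu-1gb"
  image    = "ubuntu-24-04-x64"
  vpc_uuid = digitalocean_vpc.single_droplet_vpc.id
  tags     = ["web"]
}

resource "digitalocean_project" "example" {
  name      = "example-project"
  resources = [digitalocean_droplet.web.urn]
}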

Terraform v1.9.4
on linux_amd64
+ provider registry.terraform.io/digitalocean/digitalocean v2.39.2