Open liarco opened 4 years ago
It seems like Kubernetes clusters are not de-registering from the VPC immediately when deleted. I've raised this with our VPC team for investigation.
Currently the VPC resource will retry the delete call for 2 minutes before timing out, but we made that configurable. Something like this should work around the issue:
resource "digitalocean_vpc" "vpc" {
name = "test-network"
region = "ams3"
timeouts {
delete = "10m"
}
}
We might want to raise the default here regardless.
I'm sorry for the delay, but I can confirm that a higher timeout is a good temporary workaround, thank you!
That said, it forces longer waits before the operations are really finished, so a proper fix from the DO team would still be the best solution.
I've been seeing the same thing: VPCs fail to delete even though all the resources that were in them have been removed. Re-running the deletion a few minutes later works (although in the past I had a case where a VPC simply refused to be deleted for weeks).
Would be nice if this was fixed in the API. @adamwg do you know anything about this (😸 🚎)?
I'm guessing this is due to the fact that DOKS cluster deletion is asynchronous. The cluster is marked as deleted immediately, but we delete the worker node droplets after returning. Since the worker node droplets are what's actually attached to the VPC, the VPC isn't empty until they're deleted. Usually the droplets will be deleted within the 2 minute retry window, but due to various problems they could stick around longer.
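Based on that explanation, one way to cover the asynchronous worker-droplet cleanup window is simply to give the VPC a delete timeout comfortably longer than the default 2-minute retry while the cluster lives inside that VPC. A minimal sketch (names, region and node sizes are illustrative, not taken from this issue):
data "digitalocean_kubernetes_versions" "current" {}

resource "digitalocean_vpc" "example" {
  name   = "example-network"
  region = "ams3"

  # Worker droplets are only removed some time after the cluster delete
  # call returns, so give the VPC delete more headroom than the default
  # 2-minute retry window.
  timeouts {
    delete = "10m"
  }
}

resource "digitalocean_kubernetes_cluster" "example" {
  name    = "example-cluster"
  region  = "ams3"
  version = data.digitalocean_kubernetes_versions.current.latest_version

  # The worker droplets are attached to this VPC, so the VPC is not
  # empty until they are gone.
  vpc_uuid = digitalocean_vpc.example.id

  node_pool {
    name       = "default"
    size       = "s-2vcpu-2gb"
    node_count = 2
  }
}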
Interesting. I guess from a user's perspective it would be good to get an error from the VPC API explaining that some resources are still being deleted, so at least we'd know what's happening rather than thinking it's just broken 😅
Hi everybody,
It seems the same problem is happening to me. I'm working through some simple exercises to build a small infrastructure on DigitalOcean. When I create a VPC in a region that doesn't already have a default VPC, the VPC created with Terraform becomes the region's default, which is normal behavior. The problem is that, without a pre-existing default VPC, running
terraform destroy
fails to destroy the VPC resource because it is now the default VPC.
Take for example the following code.
resource "digitalocean_vpc" "web_vpc"{
# The human friendly name of our VPC.
name = var.vpcname
# The region to deploy my VPC.
region = var.region
# The private ip range within our VPC
ip_range = "10.148.0.0/22"
}
Adding the timeout fixes the issue for me. I'd like to know whether you're working on a fix, or whether I should look for other workarounds for my pipeline.
Thanks in advance. Sincerely, H
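For reference, this is a minimal sketch of the same resource with the delete timeout raised, simply combining the snippet above with the timeouts block suggested earlier in the thread (10m is the earlier example value, not a recommendation):
resource "digitalocean_vpc" "web_vpc" {
  # The human-friendly name of our VPC.
  name = var.vpcname

  # The region to deploy my VPC.
  region = var.region

  # The private IP range within our VPC.
  ip_range = "10.148.0.0/22"

  # Give members (e.g. droplets) more time to detach before the delete
  # call gives up; the provider default retries for only 2 minutes.
  timeouts {
    delete = "10m"
  }
}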
We have the same problem. As a first "workaround" we created a default VPC in the region by hand (or with a different Terraform code base), so the problem doesn't show up when the infrastructure is deleted.
As for the error message (arguably another bug in the DigitalOcean API):
Error: DELETE https://api.digitalocean.com/v2/vpcs/77c8159d-a141-4837-b358-145953f64fb0: 403 (request "7ae089e4-65cb-410b-9cf2-3cc9abdfb8cc") "Can not delete VPC with member".
We also built a workaround with the Terraform time_sleep provider; it looks like this:
resource "digitalocean_vpc" "sandbox" {
name = "sandbox"
region = var.region
ip_range = var.ip_range
}
resource "time_sleep" "wait_300_seconds_to_destroy" {
depends_on = [digitalocean_vpc.sandbox]
destroy_duration = "300s"
}
resource "null_resource" "placeholder" {
depends_on = [time_sleep.wait_300_seconds_to_destroy]
}
Plan: 0 to add, 0 to change, 15 to destroy.
local_file.kubeconfig: Destroying... [id=9bf91e94b81d32307923fe968195fdc15ce8c255]
null_resource.placeholdet: Destroying... [id=3443561304797448323]
local_file.kubeconfig: Destruction complete after 0s
null_resource.placeholdet: Destruction complete after 0s
time_sleep.wait_300_seconds_to_destroy: Destroying... [id=2022-03-25T22:24:52Z]
kubernetes_secret_v1.cert_manager: Destroying... [id=cert-manager/cf-token]
helm_release.nginx_ingress_controller: Destroying... [id=nginx-ingress]
helm_release.extermal-dns: Destroying... [id=external-dns]
helm_release.cert-manager: Destroying... [id=cert-manager]
kubernetes_secret_v1.cert_manager: Destruction complete after 0s
helm_release.extermal-dns: Destruction complete after 4s
helm_release.nginx_ingress_controller: Destruction complete after 6s
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 10s elapsed]
helm_release.cert-manager: Still destroying... [id=cert-manager, 10s elapsed]
helm_release.cert-manager: Destruction complete after 10s
kubernetes_namespace_v1.core_namespaces["cert-manager"]: Destroying... [id=cert-manager]
kubernetes_namespace_v1.core_namespaces["infra"]: Destroying... [id=infra]
kubernetes_namespace_v1.core_namespaces["monitoring"]: Destroying... [id=monitoring]
kubernetes_namespace_v1.core_namespaces["external-dns"]: Destroying... [id=external-dns]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Destroying... [id=ingress-nginx]
kubernetes_namespace_v1.core_namespaces["gitlab-runner"]: Destroying... [id=gitlab-runner]
kubernetes_namespace_v1.core_namespaces["gitlab-runner"]: Destruction complete after 7s
kubernetes_namespace_v1.core_namespaces["infra"]: Destruction complete after 7s
kubernetes_namespace_v1.core_namespaces["cert-manager"]: Destruction complete after 7s
kubernetes_namespace_v1.core_namespaces["external-dns"]: Destruction complete after 7s
kubernetes_namespace_v1.core_namespaces["monitoring"]: Destruction complete after 7s
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 20s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 10s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 30s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 20s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 40s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 30s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 50s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 40s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 1m0s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 50s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 1m10s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 1m0s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 1m20s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 1m10s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 1m30s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 1m20s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 1m40s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 1m30s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 1m50s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 1m40s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 2m0s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 1m50s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 2m10s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 2m0s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 2m20s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 2m10s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 2m30s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 2m20s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 2m40s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 2m30s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 2m50s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Still destroying... [id=ingress-nginx, 2m40s elapsed]
kubernetes_namespace_v1.core_namespaces["ingress-nginx"]: Destruction complete after 2m46s
digitalocean_kubernetes_cluster.sandbox: Destroying... [id=6f85adb6-2dad-4836-87f8-d6c92e9287d6]
digitalocean_kubernetes_cluster.sandbox: Destruction complete after 0s
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 3m0s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 3m10s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 3m20s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 3m30s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 3m40s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 3m50s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 4m0s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 4m10s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 4m20s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 4m30s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 4m40s elapsed]
time_sleep.wait_300_seconds_to_destroy: Still destroying... [id=2022-03-25T22:24:52Z, 4m50s elapsed]
time_sleep.wait_300_seconds_to_destroy: Destruction complete after 5m0s
digitalocean_vpc.sandbox: Destroying... [id=40d36e5b-518b-4488-9fa7-d4124929081e]
digitalocean_vpc.sandbox: Still destroying... [id=40d36e5b-518b-4488-9fa7-d4124929081e, 10s elapsed]
digitalocean_vpc.sandbox: Destruction complete after 17s
Destroy complete! Resources: 15 destroyed.
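If the fixed five-minute window ever proves too short, a possible variation (my own untested suggestion, not something from this thread) is to make the cluster itself depend on the time_sleep resource. Terraform destroys dependents before their dependencies, so the destroy order becomes cluster, then the sleep, then the VPC, and the full delay only starts counting after the cluster delete call has returned:
# Sketch only; the cluster arguments below are placeholders, keep your own values.
data "digitalocean_kubernetes_versions" "current" {}

resource "time_sleep" "wait_300_seconds_to_destroy" {
  depends_on = [digitalocean_vpc.sandbox]

  # On destroy, wait 300s after everything depending on this resource
  # (here: the cluster) has been destroyed, then let the VPC be deleted.
  destroy_duration = "300s"
}

resource "digitalocean_kubernetes_cluster" "sandbox" {
  # Forces the destroy order: cluster -> time_sleep (300s) -> VPC.
  depends_on = [time_sleep.wait_300_seconds_to_destroy]

  name     = "sandbox"
  region   = var.region
  version  = data.digitalocean_kubernetes_versions.current.latest_version
  vpc_uuid = digitalocean_vpc.sandbox.id

  node_pool {
    name       = "default"
    size       = "s-2vcpu-2gb"
    node_count = 2
  }
}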
In my case it also happens when the only resource in the VPC is a single droplet (the droplet is also added to a project, and some DO tags are created as well).
The delete timeout workaround seems to be ignored; it doesn't change anything for me.
Terraform v1.9.4
on linux_amd64
+ provider registry.terraform.io/digitalocean/digitalocean v2.39.2
Terraform Version
Affected Resource
digitalocean_vpc
Terraform Configuration Files
https://gist.github.com/liarco/a1fab103af843feff29903feec2b27a9
Terraform Cloud run output
https://gist.github.com/liarco/6b2990a7212968730617c64b392630ec
Expected Behavior
All resources should be deleted successfully.
Actual Behavior
The VPC cannot be deleted since it seems to be "not empty", but running
terraform destroy
again destroys the VPC with no error.
Steps to Reproduce
1. terraform apply and wait for all resources to be provisioned
2. terraform destroy and wait until it fails
3. terraform destroy again to destroy the VPC successfully
Important Factoids
You must have a default VPC for the region (e.g. ams3-default) before reproducing the error, otherwise a default VPC will be created and that one cannot be deleted using Terraform. This is why I don't think that my issue can be related to #472.