Closed nwmahoney closed 4 years ago
Hey Nick!
QQs:
Occasionally, resources are still reported as dependencies because the IaaS can take a while to consider them deleted. This is more of a problem when you delete something like the VM in the AWS console and then try to delete the network with leftovers. If you delete the VM with leftovers first, leftovers waits until the VM is truly marked as gone before trying to delete the VPC. If you delete it in the console first, the VM's status will be "deleting", so leftovers won't try to delete it itself and will proceed straight to the VPC.
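For illustration, here is a minimal sketch of the "wait until the dependency is really gone, then delete the VPC" idea. This is not leftovers' actual code: `DependencyViolation`, `delete_with_retry`, and `FakeVpc` are hypothetical names, and the fake VPC only simulates an IaaS that keeps reporting a dependency for a few calls.

```python
import time


class DependencyViolation(Exception):
    """Hypothetical stand-in for the IaaS 'resource has dependencies' error."""


def delete_with_retry(delete, attempts=5, delay=0.0, sleep=time.sleep):
    """Call delete() until it succeeds or the attempts run out.

    Returns the number of attempts used. Re-raises the last
    DependencyViolation if the dependency never clears.
    """
    for attempt in range(1, attempts + 1):
        try:
            delete()
            return attempt
        except DependencyViolation:
            if attempt == attempts:
                raise
            sleep(delay)  # give the IaaS time to finish deleting the VM


class FakeVpc:
    """Simulates a VPC whose VM stays in 'deleting' for a few calls."""

    def __init__(self, calls_until_vm_gone):
        self.remaining = calls_until_vm_gone
        self.deleted = False

    def delete(self):
        if self.remaining > 0:
            self.remaining -= 1
            raise DependencyViolation("vpc has dependencies")
        self.deleted = True
```

With something like this, `delete_with_retry(FakeVpc(2).delete)` succeeds once the simulated VM is gone, while a dependency that never clears still surfaces as an error after the last attempt.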
We ran leftovers three times in the pipeline (this build and the previous two), and once manually. Then we deleted the VPC in the console right after that and it worked. We hadn't done anything manual before that, and I think the pipeline just interacts with AWS through bbl and leftovers. I don't think there was any manual deletion.
P.S. Hi Gen!
Hi Nick!!!
Alright, what I'm gathering:
Since this is the first time this is happening, I'm wondering what conditions make this situation unique. Are there perhaps any new resource types that we are creating in the bbl AWS Terraform templates that leftovers doesn't know to delete?
The AWS API is returning a 400 saying there are dependencies, but I know that in the console you are allowed to delete the VPC regardless, provided those dependencies are of certain types.
Maybe a question we can get an answer to is: can the AWS API give us a better error message about what dependencies exist?
Alternatively, we could narrow down which dependencies the console will delete along with the VPC but the API will not.
Looks like this is still happening. Can you post the output of `aws ec2 describe-subnets --filters "Name=vpc-id,Values=vpc-0aa226bc62915c540"`? @nmahoney-pivotal
It's interesting that it always happens after a bosh failure like this...
That link doesn't work! 😔 But I think I can see the one you're referring to from earlier concourse logs.
This is beginning to remind me of the early bbl on OpenStack failures where incompletely creating a VM would result in a "port" (OpenStack static private IP allocation) being left behind without a VM attached to it. I wonder if there was a recent AWS CPI change that created a similar kind of shadow resource that doesn't get cleaned up by leftovers, and if so what that would be.
EDIT: on looking at the logs again I actually can't tell if BOSH even tried to create a VM. So maybe this whole hypothesis is entirely off-base and there's actually a Terraform problem.
@rowanjacobs I thought I sent this before... I guess not. Here's that output:
$ aws ec2 describe-subnets --filters "Name=vpc-id,Values=vpc-0aa226bc62915c540"
{
    "Subnets": []
}
Closing this issue until (if) we see it again and can access the environment to debug.
It appears that occasionally there are load balancers deployed to a network whose names do not contain the full filter string. For instance, bbl sometimes truncates the environment name when creating resources so that the names fit within the length limit. https://github.com/cloudfoundry/bosh-bootloader/blob/a1f38c83bd02f71bab4dea46ce4cae86336969ff/terraform/aws/templates/concourse_lb.tf#L56
Leftovers doesn't return them in the list because their names do not contain the full filter string, such as `bump-deployments-aws-concourse`, since the load balancer might be created with just `bump-deplo` as a prefix.
Working to see if we can use the `vpcId` or `subnetId` to delete these resources.
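A rough sketch of why the name filter misses these load balancers, and how matching on the VPC could catch them instead. This is only an illustration, not the leftovers implementation: the `LoadBalancer` type, both helper functions, the resource names, and the VPC IDs are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class LoadBalancer:
    name: str
    vpc_id: str


def by_name_filter(lbs, name_filter):
    """Select load balancers whose name contains the full filter string."""
    return [lb for lb in lbs if name_filter in lb.name]


def by_vpc(lbs, vpc_id):
    """Select load balancers by the VPC they live in, ignoring their names."""
    return [lb for lb in lbs if lb.vpc_id == vpc_id]


# Hypothetical inventory: the first name was truncated to fit a length limit.
LBS = [
    LoadBalancer(name="bump-deplo-concourse-lb", vpc_id="vpc-aaa"),
    LoadBalancer(name="bump-deployments-aws-concourse-cf-lb", vpc_id="vpc-aaa"),
    LoadBalancer(name="unrelated-lb", vpc_id="vpc-bbb"),
]

FILTER = "bump-deployments-aws-concourse"

found_by_name = by_name_filter(LBS, FILTER)  # misses the truncated name
found_by_vpc = by_vpc(LBS, "vpc-aaa")        # finds both LBs in the VPC
```

The trade-off is that matching on the VPC selects everything in that VPC, so it only makes sense when the whole VPC is meant to be torn down.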
Deleting this VPC worked just fine in the AWS console using the same account.