genevieve / leftovers

Go cli & library for cleaning up orphaned IAAS resources.
Apache License 2.0

Should we delete resources that belong to an environment even if they don't contain the environment name? #91

Closed: genevieve closed 4 years ago

genevieve commented 5 years ago

Example:

+ leftovers --filter basalt-spear --no-confirm -i gcp 
[Network: basalt-spear-pcf-network] Deleting...
[Network: basalt-spear-pcf-network] Delete: Operation error: The network resource 'projects/environments/global/networks/basalt-spear-pcf-network' is already being used by 'projects/cf-pks-releng-environments/global/firewalls/k8s-fw-a9c'

1 error occurred:
    * [Network: basalt-spear-pcf-network] Delete: Operation error: The network resource 'projects/environments/global/networks/basalt-spear-pcf-network' is already being used by 'projects/cf-pks-releng-environments/global/firewalls/k8s-fw-a9c'

Questions:

rowanjacobs commented 5 years ago

Saw a similar issue with forwarding rules recently:

Teardown error message:
[TeardownGcpEnvironmentJob:81f0d0d4-fb63-4bbc-ac92-7e67b8e41ce7 - glendora - Try #1]
resourceInUseByAnotherResource: The subnetwork resource 'projects/cf-pks-golf/regions/us-central1/subnetworks/glendora-services-subnet' is already being used by 'projects/cf-pks-golf/regions/us-central1/forwardingRules/a9e5d09468d3c11e9929642010a00080'

mjj209 commented 5 years ago

I was thinking about these resources today, as Toolsmiths are working on a new teardown workflow that can destroy cf-deployment environments.

For PCF environments, we've taken a varied approach to these additional resources. For some resources, we've deemed it's safe to delete them if they are at all attached to the environment in question. These objects are:

Having these objects bolted on to an environment outside of Terraform is a common use case for a Custom Toolsmiths environment. Our users expect these resources to get deleted when they destroy their environment, even if the resources do not have the environment name in them. I believe the resources above are safe to delete whenever they are attached to the environment's network, even when they do not contain the environment name.
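A minimal sketch of the selection rule described above: delete a resource if its own name matches the filter, or if it is attached to a network whose name matches. The `Resource` type, its fields, and `ShouldDelete` are hypothetical and are not part of the leftovers API. Substring matching on the filter is also an assumption, and `unrelated-fw` and `other-network` are made-up names for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Resource is a hypothetical, simplified view of a cloud resource.
// It is not a type from leftovers.
type Resource struct {
	Name    string
	Network string // name of the network the resource is attached to, "" if none
}

// ShouldDelete reports whether r belongs to the environment identified by
// filter, either through its own name or through the network it is attached to.
// Substring matching is an assumption made for this sketch.
func ShouldDelete(r Resource, filter string) bool {
	if strings.Contains(r.Name, filter) {
		return true
	}
	return r.Network != "" && strings.Contains(r.Network, filter)
}

func main() {
	resources := []Resource{
		{Name: "basalt-spear-pcf-network"},
		{Name: "k8s-fw-a9c", Network: "basalt-spear-pcf-network"},
		{Name: "unrelated-fw", Network: "other-network"}, // hypothetical
	}
	for _, r := range resources {
		fmt.Printf("%s delete=%v\n", r.Name, ShouldDelete(r, "basalt-spear"))
	}
}
```

With a rule like this, the firewall from the original example would be selected because of its network attachment, even though its name does not contain `basalt-spear`. Whether that is safe for every resource type is exactly the open question in this issue.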

It's a losing battle to add each GCP resource type separately to the list of objects we detect and delete. So we've taken a new approach for "other" GCP resources not listed above. We try to run leftovers, which has been a HUGE benefit. If leftovers still cannot complete successfully, we tell the user which resources we believe are left and ask them to decide what to do with these "other" GCP resources.

My favorite "other" GCP resource right now is targetHttpsProxies. If this is added to an environment, and it doesn't happen to have the environment name in the GCP Object name, then leftovers will fail to destroy the environment. I'm not sure how many "other" GCP resources exist, but it feels like GCP is adding new resource types all the time.

genevieve commented 4 years ago

@mjj209

I would like to deploy a Toolsmiths environment along with the products most likely to cause leftovers to fail when it tries to clean everything up. Do you have a particular environment configuration you can recommend?

mjj209 commented 4 years ago

@genevieve The most recent 2 failures were from users adding an additional router to their environment.

We could not delete the: projects//global/networks/ventura-pcf-network

Because this object existed: projects//regions/us-central1/routers/test-cloud-router

Toolsmiths now have logic that sends an email to the user saying we couldn't delete the network because the other resource exists, and leaves 'what to do' up to the user. We don't really hit a lot of these issues, outside of the firewall issue that I believe is already merged into master.
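As a rough illustration of how the blocked and blocking resources could be pulled out of the GCP error text to build such a notification, here is a sketch. The regular expression is derived only from the two error messages quoted in this thread. The `Blocker` type and `ParseBlocker` function are hypothetical, and this is not the actual Toolsmiths or leftovers implementation.

```go
package main

import (
	"fmt"
	"regexp"
)

// inUsePattern matches the "already being used by" message format seen in the
// errors quoted in this thread. Other GCP errors may be worded differently.
var inUsePattern = regexp.MustCompile(`The (\w+) resource '([^']+)' is already being used by '([^']+)'`)

// Blocker is a hypothetical type describing one dependency that prevented a delete.
type Blocker struct {
	Kind    string // resource kind from the message, e.g. "network" or "subnetwork"
	Blocked string // resource that could not be deleted
	UsedBy  string // resource that still references it
}

// ParseBlocker extracts the blocked and blocking resources from an error
// message. It returns false if the message does not match the pattern.
func ParseBlocker(msg string) (Blocker, bool) {
	m := inUsePattern.FindStringSubmatch(msg)
	if m == nil {
		return Blocker{}, false
	}
	return Blocker{Kind: m[1], Blocked: m[2], UsedBy: m[3]}, true
}

func main() {
	msg := "resourceInUseByAnotherResource: The subnetwork resource " +
		"'projects/cf-pks-golf/regions/us-central1/subnetworks/glendora-services-subnet' " +
		"is already being used by " +
		"'projects/cf-pks-golf/regions/us-central1/forwardingRules/a9e5d09468d3c11e9929642010a00080'"
	if b, ok := ParseBlocker(msg); ok {
		fmt.Printf("could not delete %s %s because %s still exists\n", b.Kind, b.Blocked, b.UsedBy)
	}
}
```

Matching on message text is brittle, so a real implementation would probably prefer a structured error code where one is available and fall back to the raw message when the pattern does not match.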

I don't know why the users were adding routers, but this was the OD PKS team testing something.

genevieve commented 4 years ago

Closing this and merging with #80.