genevieve / leftovers

Go CLI & library for cleaning up orphaned IaaS resources.
Apache License 2.0

GCP - cannot remove service account permissions when using a service account key with only "editor" permissions #90

Closed (rowanjacobs closed this issue 5 years ago)

rowanjacobs commented 5 years ago

The change introduced in issue #37 has caused some issues tearing down CF Toolsmiths environments.

Some background info: our environments app can create what we call "custom GCP environments", which are PAS, SF-PAS, or PKS environments deployed by our app into a(n internal) customer's GCP project. In order to do this, our customers provide us with a GCP service account key. We specify (and validate in our app) that this key have editor permissions, but we do not require owner permissions.

Before April 25th, 2019, we created an Ops Manager service account in our customers' GCP projects, and attached the required Ops Manager permissions to it. We have since disabled the creation of new service accounts, but we still have a large number of environments lingering from before April 25th (approximately half of our custom GCP environments are these long-running environments). These environments may have user-provided service account keys with only editor permissions, which cannot delete the Ops Manager service accounts using versions of leftovers after v0.42.0.

As a result, we're a bit stuck—we want to upgrade to the latest version of leftovers to solve issue #89, but doing so would cause failures when tearing down these older environments.

After brainstorming with the Toolsmiths PM (@mjj209) we thought of a few options, of which these two are the most attractive:

Option 2 is my personal preferred approach, although it requires some changes to leftovers and might be actively counterproductive for other teams' use cases. Before I go ahead and attempt to implement it, I'd like some thoughts about whether that's the right approach and if there are any other major approaches that I've missed.

genevieve commented 5 years ago

I like option 2 but would this problem also be solved by the ignore/skip flag we’ve talked about?

genevieve commented 5 years ago

We could add an MVP of the skip flag for gcp iam resources and extend it to other resources as the demand arises.

rowanjacobs commented 5 years ago

That's even better. I think it would solve our problem while also not introducing a special case for handling permissions failures (which could cause all kinds of questions of the form "why do you allow users to try to delete this resource without permissions but not that one?").

genevieve commented 5 years ago

Cool. So we'll add the --skip or --ignore flag to leftovers, and in the README just indicate that it is currently only available for GCP IAM resources (which at this time means only service accounts). Seems like a pretty simple, straightforward change and allows us to iterate! Love it.
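
For illustration only, here is a minimal sketch of how a skip list could filter resources before deletion. Everything in it is an assumption rather than the actual leftovers code: the `Resource` type, the `"service-account"` type label, and the `parseSkip` and `filterSkipped` helpers are hypothetical names, and the comma-separated flag value is just one possible format.

```go
package main

import (
	"fmt"
	"strings"
)

// Resource is a hypothetical stand-in for anything the tool could delete.
type Resource struct {
	Type string // hypothetical type label, e.g. "service-account"
	Name string
}

// parseSkip turns a comma-separated flag value into a set of resource types.
func parseSkip(value string) map[string]bool {
	skip := make(map[string]bool)
	for _, t := range strings.Split(value, ",") {
		if t = strings.TrimSpace(t); t != "" {
			skip[t] = true
		}
	}
	return skip
}

// filterSkipped returns only the resources whose type is not in the skip set.
func filterSkipped(resources []Resource, skip map[string]bool) []Resource {
	kept := make([]Resource, 0, len(resources))
	for _, r := range resources {
		if !skip[r.Type] {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	found := []Resource{
		{Type: "service-account", Name: "opsman-sa"},
		{Type: "disk", Name: "env-1-disk"},
	}
	// Simulates a skip value of "service-account": only the disk remains.
	for _, r := range filterSkipped(found, parseSkip("service-account")) {
		fmt.Printf("would delete %s %s\n", r.Type, r.Name)
	}
}
```

Filtering by type before any delete call means skipped resources are never touched, so no permission error can occur for them in the first place.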

rowanjacobs commented 5 years ago

We circled back to this issue in IPM today, where the rest of the Toolsmiths team had a few questions/comments/concerns. Specifically, when we delete PKS environments, we definitely need to delete service account permissions. Currently, we use the exact same deletion logic for PAS and PKS. Introducing a --skip or --ignore flag would also introduce a difference between the deletion code for these two kinds of environments. So we were thinking of going with option 1 (from my original post) and tabling --skip/--ignore until we really need it. Alternatively, someone had the idea that we could add a flag to optionally ignore permissions issues.
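
As a rough sketch of that last alternative, a delete loop could treat permission failures as skippable when an opt-in setting is enabled. The `apiError` type, the 403 status check, and the `deleteAll` helper are all hypothetical and written only for this example; they do not reflect how leftovers or the GCP client libraries actually report errors.

```go
package main

import (
	"errors"
	"fmt"
)

// apiError is a hypothetical error type carrying an HTTP-style status code.
type apiError struct {
	Code    int
	Message string
}

func (e *apiError) Error() string { return fmt.Sprintf("%d: %s", e.Code, e.Message) }

// isPermissionDenied reports whether err wraps an apiError with code 403.
func isPermissionDenied(err error) bool {
	var ae *apiError
	return errors.As(err, &ae) && ae.Code == 403
}

// deleteAll calls del for each name. When ignoreDenied is true, permission
// failures are recorded in the returned slice instead of aborting the run.
func deleteAll(names []string, del func(string) error, ignoreDenied bool) ([]string, error) {
	var skipped []string
	for _, n := range names {
		err := del(n)
		switch {
		case err == nil:
			// Deleted successfully; nothing to record.
		case ignoreDenied && isPermissionDenied(err):
			skipped = append(skipped, n)
		default:
			return skipped, fmt.Errorf("deleting %s: %w", n, err)
		}
	}
	return skipped, nil
}

func main() {
	// Fake delete function: the service account is not deletable by this key.
	del := func(name string) error {
		if name == "opsman-sa" {
			return &apiError{Code: 403, Message: "permission denied"}
		}
		return nil
	}
	skipped, err := deleteAll([]string{"opsman-sa", "env-1-disk"}, del, true)
	fmt.Println(skipped, err)
}
```

Unlike a skip list, this approach still attempts every deletion, so the same code path would remove service accounts for PKS environments whenever the key does have sufficient permissions.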

genevieve commented 5 years ago

If you're comfortable with option 1 for now, that's fine by me.

Happy to continue to explore --skip anyway for future uses.

genevieve commented 5 years ago

Do you want to close this issue and keep the --skip flag issue (#68) as the source of truth?

rowanjacobs commented 5 years ago

Sounds good.