aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0
11.46k stars 3.83k forks source link

Garbage-collect orphaned stacks #13676

Open sandwi opened 3 years ago

sandwi commented 3 years ago

:question: General Issue

The Question

cdk deploy leaves stack deployed if a stack is removed from application post deployment and cdk app is deployed again.

Steps to Reproduce

  1. Create a CDK app with two stacks
  2. Run: cdk deploy --all
  3. Delete one stack from the CDK app
  4. Run: cdk deploy --all

Expected Results The deleted stack is gone.

Actual Results The deleted stack exists and has been "leaked".

i.e. the stacks are procedural rather than declarative. This doesn't fit into a managed pipeline model as stacks will accidentally leak and have to be manually cleaned, while the behavior makes sense if you're familiar with the implementation details of CDK, this isn't the right developer experience.

Environment

Other information

peterwoodworth commented 3 years ago

Hey @sandwi, thank you for the submission.

This is presently intended behavior, is this something you would like to see changed in the future?

sandwi commented 3 years ago

@peterwoodworth The use case is that of a managed pipeline-as-a-service where developers own the cdk app and push changes and pipeline executes the changes. The developer may delete a stack from CDK app (source code), making an assumption that CDK will be able to deduce (a) that stack(s) has/have been deleted from CDK app, hence in generated cloud assembly but a prior version of the CDK app had the deleted stack and is still deployed, (b) thus cdk deploy of this new version should delete the deployed stack(s) that are no longer part of the app. That is the intent of the developer. We understand that there are interesting scenarios with resource retention policy and how to reason about them. Perhaps we should provide guidelines and best practices around these scenarios. The stack deletion is something of value in development and test phases when developers are experimenting with new features. It is also of value in Blue/Green deployment. This is like versioned cdk app where stack(s) may change between versions. Git (or any vcs) has the version tags at source level. Its the cloud assembly versioning and deployed resources (by a version of assembly) pieces that need enhancements.

NGL321 commented 3 years ago

Hey @sandwi,

You make a solid case for the feature, but I'm not entirely certain about timeframe. I will ask around about prioritization, but if you really believe in this, best way to ensure it gets a more timely resolution is to increase engagement on this issue!

😸 😷

eladb commented 3 years ago

This is a pretty tricky challenge. When a stack is deleted from a CDK app, the app no longer knows that the stack ever existed and therefore it cannot delete.

The only way I could think this can be solved is by keeping some inventory in the environment of all CDK stacks (e.g. by tagging them) and then running some kind of garbage collection for each environment after deployment which will identify all registered stacks that are not accounted for.

Reassigning to @rix0rrr for effort estimation and prioritization.

ericzbeard commented 3 years ago

This could be another good use case for AppRegistry integration.

eladb commented 3 years ago

This could be another good use case for AppRegistry integration.

I am intrigued. What do you mean by that?

ericzbeard commented 3 years ago

We have discussed using AppRegistry as a way to store a stable "AppId" for CDK apps, but this use case is more expansive. If we register with the Service Catalog App Registry, we could reference the AppRegistry stack as a picture of the entire app as it exists before we do the next deployment.

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-servicecatalogappregistry-resourceassociation.html

If one of the stacks in an app is deleted from the code, it will show up in the diff for the AppRegistry stack, which we can detect during synthesis, and take action on after successful deployment of the modified app.

It also has the side benefit of giving us a "free" user interface for CDK applications in the console.

Something like:

const app = new cdk.App({
    appRegistry: {
        register: true, 
        deleteOrphanedStacks: true,
    },
});
tomas-mazak commented 2 years ago

Hi, I would like to revive this topic. In our company, we use CDK extensively and especially with CDK pipelines (we have 120+ pipelines ATM) the orphan stacks are becoming an issue.

The advantage of CDK pipeline is that once you instantiate it, you can forget about applying CDK changes - you simply push to git and changes are applied by a pipeline. However, if you add a pipeline "stage" and later remove it (or just rename it), the stage stacks will stay up and will be "orphaned".

We tackle this by assigning a set of tags to each stack that uniquely identify the CDK app. We provide a tool to application owners that synthesizes the app and gets the list of "declared" stacks, then retrieves the "running" stacks - stacks with corresponding tags from CFN API - and deleting "running" that are not "declared".

However, this tool needs to be run manually by application owners. Also, I believe many people must be having the same issue. Would it make sense to resolve this within CDK?

The AppRegistry idea looks interesting. Potentially, simpler approach with tagging and UUIDs would work too:

Then, a new CLI can be provided. I can see many options to implement this, for example cdk apply that would deploy new/changed stacks and delete stacks that are no longer "declared". Or simply cdk deploy --clear-orphans. or cdk destroy --orphans-only...

@eladb what do you think? If there's an interest, I would be happy to look into this

zhelyan commented 1 year ago

The only way I could think this can be solved is by keeping some inventory in the environment of all CDK stacks

An App is a container for one or more stacks: it serves as each stack's scope

Can we track the App state in a dedicated "app" stack?

Edit: Sorry, I am not being clear. Make App deploy an "app" stack which then uses e.g. https://github.com/aws/aws-cdk/issues/13676#issuecomment-809674456 to track the state of the stacks included in the app. In future, the dedicated app stack can also wrap any app management features e.g. creating a resource group for the app, running periodic drift detection on the app stacks etc.

aldex32 commented 1 year ago

Unfortunately, after 2 years this issue is still opened. As far as I can understand, the App is a construct that wraps a set of other constructs, for instance the Stack. So I would say, it is the responsibility of the construct to know what resources to provision and what to delete. What about having something like a beforeDeploy hook to list the current deployed stacks and an afterDeploy to delete the orphaned ones?

zhelyan commented 1 year ago

I don't think this would work. The problem is that unlike e.g terraform the cdk does not maintain state. I guess it could use a "metadata" stack for that i.e. a state stack that isn't directly editable by end users.

danielsharvey commented 8 months ago

Just adding my voice. This feature remains of significant value to CDK users, as explained above.

Does AWS have a best practice to recommend as an alternative?

For my 2c, I could see an CDK construct specifically for the purpose of deleting stacks:

// previous definition, now removed by developer
// const networkingStack = new NetworkingStack(app, 'networking-stack', props);

// replace with
const networkingStack = new DeleteThisStack(app, 'networking-stack', props);

This construct would delete the stack if it exists and issue a warning otherwise.

gshpychka commented 8 months ago

Does AWS have a best practice to recommend as an alternative?

Probably nested stacks

Kirizan commented 6 months ago

Is there a reason the pipeline construct can't just have a variable that tells the mutating step to replace Execute a change set to delete a stack and then rerun the pipeline? Just like it would if someone added or removed a stage?

Yeah, it would take more time to destroy the package, but then it would actually do what cdk destroy is supposed to do, destroy the app.

gshpychka commented 6 months ago

Is there a reason the pipeline construct can't just have a variable that tells the mutating step to replace Execute a change set to delete a stack and then rerun the pipeline? Just like it would if someone added or removed a stage?

Yeah, it would take more time to destroy the package, but then it would actually do what cdk destroy is supposed to do, destroy the app.

It doesn't know which stack to delete - CDK does not keep state. What do you mean by "just like it would if someone [..] removed a stage"? Because the pipeline would not destroy the stack(s) in that stage for the same reason.

Kirizan commented 6 months ago

It doesn't need to keep a state, it can use the existing code pipeline API to see the current state. It should also be able to use cdk diff to see if there is going to be a change to the pipeline.

Once the mutate step knows there is now going to be a change in the pipeline, rather then just removing that stage, it could first change that stage to delete a stack and run that stage. When that has all been completed, the mutate step would then remove the stage like it would normally.

github-actions[bot] commented 3 months ago

This issue has received a significant amount of attention so we are automatically upgrading its priority. A member of the community will see the re-prioritization and provide an update on the issue.

ricky-lim commented 2 weeks ago

any update on this?