aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0
11.61k stars 3.91k forks source link

core: add RemovalPolicyOptions parameter to L2 applyRemovalPolicy method #22077

Open saltman424 opened 2 years ago

saltman424 commented 2 years ago

Describe the feature

The applyRemovalPolicy method signature for L1 resources is: CfnResource.applyRemovalPolicy(policy: cdk.RemovalPolicy, options?: cdk.RemovalPolicyOptions): void

For L2 resources it is: Resource.applyRemovalPolicy(policy: cdk.RemovalPolicy): void

It would be nice to be able to pass the options in for L2 resources, and since the L2 method is just a wrapper for the L1 method, it should be trivial to do so.

Use Case

This is mainly a limitation when I need to set the applyToUpdateReplacePolicy option to false, which is actually quite often

Proposed Solution

Just add the options parameter here: https://github.com/aws/aws-cdk/blob/2e797b5a37c69561b42cbe07fe2144af41833e00/packages/%40aws-cdk/core/lib/resource.ts#L231

And pass it in here: https://github.com/aws/aws-cdk/blob/2e797b5a37c69561b42cbe07fe2144af41833e00/packages/%40aws-cdk/core/lib/resource.ts#L236

Other Information

Great first PR

Acknowledgements

CDK version used

2.41.0

Environment details (OS name and version, etc.)

Windows 10 Enterprise Version 10.0.19044

peterwoodworth commented 2 years ago

Being able to apply the update removal policy easily is something we should support - and it seems we have an easy way to do that by exposing options to the wrapper. I don't see any harm in exposing the options, except maybe the default option may need a little tweaking in the wrapper on my first glance.

I am marking this issue as p2, which means that we are unable to work on this immediately.

We use +1s to help prioritize our work, and are happy to revaluate this issue based on community feedback. You can reach out to the cdk.dev community on Slack to solicit support for reprioritization.

Check out our contributing guide if you're interested in contributing yourself - there's a low chance the team will be able to address this soon but we'd be happy to review a PR 🙂

akshatakulkarni25 commented 1 year ago

Hello, can i work on this issue??

peterwoodworth commented 1 year ago

@akshatakulkarni25 go ahead, open for anyone 🙂

rix0rrr commented 1 year ago

Before we merge this I would like to know about the use case.

What is the situation where you are fine with your resource and data being lost in the course of a replacement, but not in the course of a stack delete?

saltman424 commented 1 year ago

@rix0rrr the main use case in my mind is if your concern is just uptime. Particularly, with the new cdk import feature, a workflow that I find quite useful is allowing resources to be replaced when necessary, but when I need to teardown the stack, I leave some resources running and then when I redeploy the stack, I import those resources back in. Also, I think it is worth keeping in mind that this is for all possible resources. Replacing a Lambda function or SSM parameter is very different than replacing a DynamoDB table.

With that being said, I would also think that since it is trivial to have L2 resources support this feature and it would be hidden in an options parameter, there is really no harm in exposing this option, even if it was rarely or never a good idea. If a developer is choosing to specify this option, they have chosen to do so for one reason or another. They could equally choose to do so today by using L1 resources directly or through .node.defaultChild. Exposing it to L2 resources would just make their code less ugly and allow them to stay at the L2 level.

rix0rrr commented 1 year ago

there is really no harm in exposing this option, even if it was rarely or never a good idea.

It's easy to think that features are free, but this is rarely ever the case.

This is not to say we definitely won't do it, but arguments like "it's free" and "it's already possible one way so why not also make it possible a different way" are not necessarily convincing.

In fact, the argument "it's already possible" argues against adding the feature. I don't really see a downside to keeping the simple interface for the 99%, and have a more complicated interface for the 1% who really know what they are doing. "Make simple things simple, and make complex things possible", and all that.

but when I need to teardown the stack, I leave some resources running and then when I redeploy the stack, I import those resources back in.

Thanks, that's helpful. Sounds like when you tear down a stack, you don't actually intend to get rid of it, but do some different operation. Maybe you're refactoring between stacks?

Is there a better, more holistic workflow support trying to get out here?

saltman424 commented 1 year ago

@rix0rrr I hear your points in general. I still think making the L2 decorator interface more consistent with the underlying L1 interface would have some inherent value and have limited testing and maintenance overhead. However, maybe that is not the case. As for the other points, I may have misread how you guys designed things, but in CDK, I view options in these types of methods as effectively power-user features. So if someone decides to explore the options, they are choosing to take on the cognitive overhead and inherent risk of doing something wrong.

"Make simple things simple, and make complex things possible", and all that.

To me, it seems like CloudFormation designed this feature in a way that is simple and straightforward, so I would think for many users of CloudFormation or CloudFormation-backed tools, this is thought of as a simple feature. I figured that is why CDK implemented it as a simple but optional feature for L1 resources (after all, CDK could have left it off of L1 resources, forcing developers to use cfnOptions and DeletionPolicy directly if they want to set UpdateReplacePolicy independently).

I don't really see a downside to keeping the simple interface for the 99%, and have a more complicated interface for the 1% who really know what they are doing.

Either way we would be keeping the simple interface for many users as the second parameter is optional. And passing in options is already a more complicated interface than not passing in options. Having to create your own L2 decorator function is just a lot more complicated than that and would likely be less robust for most users. I would also argue that people knowing enough about CloudFormation to want to use this feature is not the same as really knowing what they are doing with CDK/TypeScript (or whatever language they are using). There are plenty of people who know and understand this feature in CloudFormation (or were led to this feature based on reading documentation or help guides) but are not adept at using L1 resources or escape hatches and/or don't know how to go through the CDK source code to figure out how to implement this feature themselves.

Sounds like when you tear down a stack, you don't actually intend to get rid of it, but do some different operation. Maybe you're refactoring between stacks?

For my use case, that is mostly right. There are lots of reasons for this:

  1. As a workaround to the very common and very annoying issue: Export EXPORT_NAME cannot be updated as it is in use by STACK_NAME
  2. Fixing broken AWS resources. It is not uncommon for AWS resources to get into weird broken states, sometimes from user error and sometimes from service error. A somewhat simple fix is to teardown and redeploy.
  3. Renaming stacks or resources
  4. Reorganizing resources between stacks. We have hundreds of resources regularly getting updates, sometimes it makes sense to move things into different stacks or nested stacks, which can require redeployment.
  5. Cost savings. In some development accounts, we don't need everything running all the time. It can be helpful to tear things down for some time and then redeploy when needed. Most of the time, it is fine if everything gets deleted here. But sometimes, it is nice to keep some data storage.

It is also worth noting, that setting UpdateReplacePolicy to RETAIN can in some cases cause issues in stacks because the replacement may conflict with the original resource or a previous version of the resource. So if a user wants to set the DeletionPolicy to RETAIN but does not want the stack to break in the future, they may want to set the UpdateReplacePolicy to SNAPSHOT or DELETE.

I will also say, another way of looking at this is: sometimes you need to always have at least one instance of a resource but not more than that. For example, a SecretsManager secret that contains RDS credentials. You don't want to lose the credentials, but if a new secret has been created that contains the credentials and everything is now using that, you are free to delete the old secret.

Is there a better, more holistic workflow support trying to get out here?

I have created some holistic workflows to make it easier for my team to do these stack refactors. However, I would argue that is outside the scope of this simple feature. This feature and cdk import can both be helpful building blocks to these workflows. But the workflows themselves are complicated enough and potentially opinionated enough that it is probably not worth bundling altogether, and may not even be worth having in CDK at all. However, I am happy to share more details on these workflows if you think otherwise.

filletofish commented 1 year ago

Hi @rix0rrr and @saltman424,

Have there been any updates since? I personally agree with @saltman424 statements. Just checking because the PR is open for more than 3 weeks - https://github.com/aws/aws-cdk/pull/23530 and it will be soon closed.

saltman424 commented 1 year ago

@rix0rrr any thoughts? I will say, when I originally opened this issue, I assumed that the L2 method and the L1 method were designed with similar intentions and the omission of these options on the L2 method was just accidental. From your feedback, it sounds like the omission may have been an intentional design decision. Personally, I would advocate for making a consistent design decision across both methods, and I would prefer if that decision was to support the UpdateReplacePolicy feature from CloudFormation, but I understand if you decide otherwise on either of those points.

With that being said, @filletofish it seems like that it is only closing soon because it is requesting changes from you due to linting issues. Specifically:

❌ Features must contain a change to an integration test file and the resulting snapshot.

Although, as far as I can tell, you have added the appropriate tests. @rix0rrr do you know why the linting is failing?

filletofish commented 1 year ago

Although, as far as I can tell, you have added the appropriate tests.

I haven't actually added integration tests. I found out that for resource.ts integ tests for existing features integ tests didn't exist and I didn't have time to setup new integ tests. But if it's required, I can work on those.

saltman424 commented 1 year ago

Got it. I was looking at the unit tests. @rix0rrr maybe you can advise on whether to add integration tests / whether this feature is going to be approved or not?

filletofish commented 1 year ago

Hi @saltman424 and @rix0rrr, any updates on this issue / PR?

saltman424 commented 1 year ago

@rix0rrr any feedback?

filletofish commented 1 year ago

Hi @rix0rrr @saltman424 , just wanted to check with you if there is still a demand on this issue?

saltman424 commented 1 year ago

@filletofish I would still like to see the issue resolved. I am just not sure if we've addressed @rix0rrr's concerns. Although, I am guessing he's been busy with other stuff. @rix0rrr if you're ever able to take a look at this issue, I would still love to get it resolved. Let me know if you need any more information in order to do so