aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

Support CloudFormation rollback triggers #5170

Open nakedible opened 4 years ago

nakedible commented 4 years ago

CloudFormation supports specifying 0-5 rollback triggers as CloudWatch Metric Alarms; when any of them is triggered, the stack update is automatically cancelled. A monitoring time of 0-180 minutes can also be specified: this is a pause during which CloudFormation waits for any of the alarms to trigger, or for a rollback to be started manually, before cleaning up any resources.

There should be a way to use these with AWS CDK.

Use Case

Rollback triggers have obvious uses to make stack updates more reliable.

Proposed Solution

Similar to the --notification-arns option currently in the deploy command, add a --rollback-trigger-alarm-arns option that lists 1-5 CloudWatch Alarms which automatically trigger a rollback. Also add a --monitoring-time-minutes option that adds 0-180 minutes of pause time after a stack update before the cleanup phase starts. The two options can be specified independently, as each is useful on its own.
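A minimal sketch of how the two proposed options might map onto CloudFormation's RollbackConfiguration structure. The field names RollbackTriggers, Arn, Type and MonitoringTimeInMinutes come from the CloudFormation API; the function buildRollbackConfiguration and its option names are hypothetical and are not part of the CDK CLI.

```typescript
// Shape of CloudFormation's RollbackConfiguration (field names from the API).
interface RollbackTrigger {
  Arn: string;
  Type: string; // e.g. "AWS::CloudWatch::Alarm"
}

interface RollbackConfiguration {
  RollbackTriggers?: RollbackTrigger[];
  MonitoringTimeInMinutes?: number;
}

// Hypothetical options mirroring the two proposed flags; both are optional.
interface ProposedDeployOptions {
  rollbackTriggerAlarmArns?: string[];
  monitoringTimeMinutes?: number;
}

function buildRollbackConfiguration(opts: ProposedDeployOptions): RollbackConfiguration | undefined {
  const arns = opts.rollbackTriggerAlarmArns ?? [];
  const minutes = opts.monitoringTimeMinutes;

  // Enforce the limits quoted in the issue: at most 5 triggers, 0-180 minutes.
  if (arns.length > 5) {
    throw new Error(`At most 5 rollback triggers are allowed, got ${arns.length}`);
  }
  if (minutes !== undefined && (Math.floor(minutes) !== minutes || minutes < 0 || minutes > 180)) {
    throw new Error(`Monitoring time must be a whole number from 0 to 180, got ${minutes}`);
  }

  // Neither option given: return nothing so the deploy request stays unchanged.
  if (arns.length === 0 && minutes === undefined) {
    return undefined;
  }

  const config: RollbackConfiguration = {};
  if (arns.length > 0) {
    config.RollbackTriggers = arns.map((arn) => ({ Arn: arn, Type: "AWS::CloudWatch::Alarm" }));
  }
  if (minutes !== undefined) {
    config.MonitoringTimeInMinutes = minutes;
  }
  return config;
}
```

Because each field is filled in only when its option is present, either option can be used without the other, which is the independence asked for above.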

Other

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-rollback-triggers.html
https://docs.aws.amazon.com/AWSCloudFormation/latest/APIReference/API_RollbackConfiguration.html


This is a :rocket: Feature Request

rix0rrr commented 4 years ago

Feels to me that it makes sense to encode the rollback triggers in the modeled app, rather than requiring them to be passed on the command line on every invocation, as every deployment would probably require the same triggers.

Thoughts?

eladb commented 4 years ago

@rix0rrr absolutely agree but the challenge is going to be how we propagate these instructions to things like CI/CD

rix0rrr commented 4 years ago

It seems CodePipeline hasn't implemented that particular parameter in their deploy action (yet).

So the question is, do we model as CLI parameters so that no person would expect the behavior would extend to CI/CD deployments (at the risk of a worse user experience), or do we model in the application to improve the user experience and note the rollbacks won't be supported for CI/CD (risking getting bug-ticketed by people not reading the disclaimer)?

Not an easy choice :/

nakedible commented 4 years ago

I think there are places where it makes sense to encode this in the application, and places where it makes sense to have it externally specified. Just like notification ARNs, there are also many places where those should be encoded in the app - but they are currently a command line option. Notification ARNs are also not supported by CodePipeline.

Personally, I'd suggest adding this as a command line option so people can use them in their own tooling (as we would) - and worry about CI/CD later on. Who knows, maybe CodePipeline will end up adding these missing features at some point...

jusiskin commented 4 years ago

These are per-deployment parameters as they are passed as arguments of the CreateStack/UpdateStack/CreateChangeSet CloudFormation APIs.

That being said, it doesn't limit the approach CDK takes for exposing that functionality to CDK users. Does it make sense to use a search order for resolving the parameters:

  1. Use command-line argument from cdk deploy
  2. Use environment variables
  3. Use CDK stack configuration
  4. Use CDK app configuration
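
A rough sketch of the search order listed above, assuming each source has already been parsed into the same shape. The names RollbackSettings, firstDefined and resolveRollbackSettings are hypothetical; nothing like this exists in the CDK today.

```typescript
// Settings as read from one source; `undefined` means "not specified here".
interface RollbackSettings {
  alarmArns?: string[];
  monitoringMinutes?: number;
}

// Returns the first candidate that is not undefined, in the order given.
function firstDefined<T>(...candidates: (T | undefined)[]): T | undefined {
  for (const candidate of candidates) {
    if (candidate !== undefined) {
      return candidate;
    }
  }
  return undefined;
}

// Precedence: command line, then environment, then stack, then app.
function resolveRollbackSettings(
  cli: RollbackSettings,
  env: RollbackSettings,
  stack: RollbackSettings,
  app: RollbackSettings,
): RollbackSettings {
  return {
    alarmArns: firstDefined(cli.alarmArns, env.alarmArns, stack.alarmArns, app.alarmArns),
    monitoringMinutes: firstDefined(
      cli.monitoringMinutes,
      env.monitoringMinutes,
      stack.monitoringMinutes,
      app.monitoringMinutes,
    ),
  };
}
```

Each field is resolved on its own here, so a monitoring time given on the command line can be combined with alarms configured on the stack. Resolving the whole configuration from a single source would be an equally defensible choice.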

mattsains commented 4 years ago

Hello! I am interested in helping move this feature request forward, but I'm not too familiar with the CDK's process on this.

I think what @jusiskin suggested makes sense.

Since this is (in my opinion) a pretty important feature for safe CDK deployments, maybe we could target a "minimum viable product" of adding a command line argument that takes in a list of up to five CloudWatch alarm ARNs, as well as a parameter for the length of bake time after a deployment, something like this:

cdk deploy --rollback-triggers arn:aws:cloudwatch:us-west-2:123456:alarm:MyAlarm1,arn:aws:cloudwatch:us-west-2:123456:alarm:MyAlarm2 --monitoringTime 10

It would also be nice if you could specify an alarm within the stack by its logical ID, but I'm not sure about that.

I am interested in whether others think this is a good enough experience to move forward with a PR.

rix0rrr commented 4 years ago

Here's a question: do these alarms already have to exist before you start the deployment?

If so, how do you use them on your first deployment? Or do you generally only use them for updates?

mattsains commented 4 years ago

For my particular use case, our alarms would be defined in the CDK template itself, so I think we have to settle for updates only. Adding new alarms would probably be quite annoying because you'd have to update the script running cdk deploy after you deployed them.

jusiskin commented 4 years ago

Taken from the CloudFormation docs:

AWS CloudFormation monitors the specified alarms during the stack create or update operation, and for the specified amount of time after all resources have been deployed.

and

If a given Cloudwatch alarm is missing, the entire stack operation fails and is rolled back.

Taken together, these statements seem to indicate that it is not possible to use rollback triggers that target alarms being provisioned in the CloudFormation template when making a CreateStack request. I also wonder what the behavior would be when updating a CloudWatch Alarm that is specified as a rollback trigger.

I had wondered if CDK could be clever and make an UpdateStack request, or CreateChangeSet and ExecuteChangeSet API requests, after the CreateStack request to add the triggers and observe the monitoring period. The problem with this is that the rollback would act on the UpdateStack/ExecuteChangeSet and not on the CreateStack request.

Since these are limitations of CloudWatch → CloudFormation, it might be best to surface this limitation in the CDK documentation. Would it be possible for CDK to work around this limitation during deployment and omit rollback triggers that are being defined in the CreateStack request template? Perhaps it could display a warning or present a confirmation prompt to the user.
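
To make the idea concrete, here is a sketch of omitting such triggers on create and collecting a warning for each one. It assumes the alarms in the template have explicit names, so that a trigger ARN of the form arn:<partition>:cloudwatch:<region>:<account>:alarm:<name> can be matched against them. The functions alarmNameFromArn and filterTriggersForCreate are hypothetical.

```typescript
// Extracts the alarm name from a CloudWatch alarm ARN; returns undefined
// when the string does not look like one.
function alarmNameFromArn(arn: string): string | undefined {
  const parts = arn.split(":");
  if (parts.length < 7 || parts[0] !== "arn" || parts[2] !== "cloudwatch" || parts[5] !== "alarm") {
    return undefined;
  }
  // Rejoin the remainder in case the alarm name itself contains colons.
  return parts.slice(6).join(":");
}

interface FilterResult {
  triggers: string[]; // ARNs that can safely be passed to CloudFormation
  warnings: string[]; // one message per omitted trigger
}

function filterTriggersForCreate(
  isCreate: boolean,
  triggerArns: string[],
  alarmNamesInTemplate: string[],
): FilterResult {
  // On updates the alarms already exist, so nothing needs to be dropped.
  if (!isCreate) {
    return { triggers: triggerArns.slice(), warnings: [] };
  }
  const triggers: string[] = [];
  const warnings: string[] = [];
  for (const arn of triggerArns) {
    const alarmName = alarmNameFromArn(arn);
    if (alarmName !== undefined && alarmNamesInTemplate.indexOf(alarmName) !== -1) {
      warnings.push(`Skipping rollback trigger ${arn}: alarm "${alarmName}" is created by this deployment`);
    } else {
      triggers.push(arn);
    }
  }
  return { triggers, warnings };
}
```

The warnings are returned rather than printed so the caller can decide between logging them and asking for confirmation.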

EDIT: Fixed typo

mattsains commented 4 years ago

That makes sense. Which approach do you recommend?

jusiskin commented 4 years ago

I don't see any harm in starting with command-line arguments and iterating on that later. It might make sense to defer embedding this in the CDK app for the time being anyway, as per @rix0rrr's comment:

It seems CodePipeline hasn't implemented that particular parameter in their deploy action (yet).

So the question is, do we model as CLI parameters so that no person would expect the behavior would extend to CI/CD deployments (at the risk of a worse user experience), or do we model in the application to improve the user experience and note the rollbacks won't be supported for CI/CD (risking getting bug-ticketed by people not reading the disclaimer)?

Not an easy choice :/

rix0rrr commented 4 years ago

I'm thinking something like this:

interface RollbackConfigurationOptions {
    monitoringPeriod?: Duration;

    rollbackTriggers?: RollbackTrigger[];
}

interface StackProps extends RollbackConfigurationOptions {
    // ...
}

class Stack {
    public configureRollback(options: RollbackConfigurationOptions) {
    }
}

// Usage

stack.configureRollback({ ... });
// -or-

new Stack(..., {
    monitoringPeriod: Duration.minutes(60),

    rollbackTriggers: [
        RollbackTrigger.fromAlarmArn('...', {
            monitorDuring: [StackLifecycle.UPDATE]
        })
    ]
});

RollbackTrigger.fromAlarmArn() needs to check that the ARN it gets is fully resolved (we need that to have the CLI pass it into ExecuteChangeSet properly). We will be adding future extensions to this that may support alarms created inside the Stack itself and have the CLI do some limited resolution and/or token substitution.
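
A sketch of what that check could look like. In a real CDK app the natural test would presumably be Token.isUnresolved() from the core library; the version below is a stand-in that runs without CDK and assumes unresolved tokens show up in strings as ${Token[...]} placeholders. The RollbackTrigger class here is a hypothetical stand-in for the proposed one and leaves out the monitorDuring option.

```typescript
class RollbackTrigger {
  private constructor(public readonly alarmArn: string) {}

  // Accepts only a literal alarm ARN, since the CLI would have to pass it
  // to CloudFormation as-is.
  public static fromAlarmArn(arn: string): RollbackTrigger {
    // Stand-in for a token check: reject strings that still contain a placeholder.
    if (arn.indexOf("${") !== -1) {
      throw new Error(`Rollback trigger ARN must be fully resolved, got: ${arn}`);
    }
    // Expected form: arn:<partition>:cloudwatch:<region>:<account>:alarm:<name>
    if (!/^arn:[^:]+:cloudwatch:[^:]*:[^:]*:alarm:.+$/.test(arn)) {
      throw new Error(`Not a CloudWatch alarm ARN: ${arn}`);
    }
    return new RollbackTrigger(arn);
  }
}
```

Failing at construction time means the error surfaces during synthesis rather than in the middle of a deployment.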

Needs some judicious use of defaults to make it agreeable to use, and we need to decide whether to honor the monitoringPeriod if none of the triggers apply during our current phase (notably: CREATE will be a typical one). I vote no, and if we deem it necessary we add a boolean to allow it.

IMPLEMENTATION NOTES

bcelenza commented 3 years ago

Reviving this conversation.

I definitely have need for this capability in my CDK stacks, and would be happy to help with the contribution. @rix0rrr's approach above would work for my use case.

AlexCheema commented 3 years ago

@rix0rrr

+1 would really like this feature in CDK. it makes deployments much safer. currently we're applying the following patch to CDK as a hacky workaround:

note: this involves managing the stack lifecycle outside of CDK which is not ideal

diff --git a/node_modules/aws-cdk/lib/api/deploy-stack.js b/node_modules/aws-cdk/lib/api/deploy-stack.js
index 60384a1..8e245a2 100644
--- a/node_modules/aws-cdk/lib/api/deploy-stack.js
+++ b/node_modules/aws-cdk/lib/api/deploy-stack.js
@@ -25,6 +25,7 @@ if (!regionUtil.getEndpointSuffix) {
 }
 const LARGE_TEMPLATE_SIZE_KB = 50;
 /** @experimental */
+
 async function deployStack(options) {
     var _a, _b;
     const stackArtifact = options.stack;
@@ -78,6 +79,17 @@ async function deployStack(options) {
     }
     const update = cloudFormationStack.exists && cloudFormationStack.stackStatus.name !== 'REVIEW_IN_PROGRESS';
     logging_1.debug(`Attempting to create ChangeSet with name ${changeSetName} to ${update ? 'update' : 'create'} stack ${deployName}`);
+    let rollbackConfiguration = undefined;
+    if(process.env.ROLLBACK_CONFIGURATION !== undefined) {
+        logging_1.print('%s: !!! process.env.ROLLBACK_CONFIGURATION defined: ' + process.env.ROLLBACK_CONFIGURATION, colors.bold(deployName));
+        rollbackConfiguration = JSON.parse(process.env.ROLLBACK_CONFIGURATION);
+    }
+    let includeNestedStacks = false;
+    if (process.env.CHANGESET_INCLUDE_NESTED !== undefined) {
+        logging_1.print('%s: !!! process.env.CHANGESET_INCLUDE_NESTED defined: ' + process.env.CHANGESET_INCLUDE_NESTED, colors.bold(deployName));
+        logging_1.print('%s: !!! setting IncludeNestedStacks to ' + (process.env.CHANGESET_INCLUDE_NESTED === "true"), colors.bold(deployName));
+        includeNestedStacks = process.env.CHANGESET_INCLUDE_NESTED === "true"
+    }
     logging_1.print('%s: creating CloudFormation changeset...', colors.bold(deployName));
     const changeSet = await cfn.createChangeSet({
         StackName: deployName,
@@ -91,6 +103,8 @@ async function deployStack(options) {
         NotificationARNs: options.notificationArns,
         Capabilities: ['CAPABILITY_IAM', 'CAPABILITY_NAMED_IAM', 'CAPABILITY_AUTO_EXPAND'],
         Tags: options.tags,
+        IncludeNestedStacks: includeNestedStacks,
+        RollbackConfiguration: rollbackConfiguration
     }).promise();
     logging_1.debug('Initiated creation of changeset: %s; waiting for it to finish creating...', changeSet.Id);
     const changeSetDescription = await cloudformation_1.waitForChangeSet(cfn, deployName, changeSetName);

danielwhite-aws commented 3 years ago

I like @rix0rrr's idea. It would also be interesting if you could configure it from a CloudFormationCreateUpdateStackAction.

openwebsolns commented 3 years ago

Echoing support for this feature.

stevemer commented 2 years ago

+1 for this feature... it's hard to justify using CDK pipelines if I can't be guaranteed a concrete rollback story.

myashchenko commented 2 years ago

+1

Rollback support is a critical feature for many projects.

byF commented 2 years ago

Any plans to put this on the roadmap?

teroxik commented 2 years ago

+1 please

rix0rrr commented 2 years ago

Unfortunately for CDK Pipelines, CodePipeline does not support passing RollbackConfiguration[1] to CloudFormation using the CloudFormation Action[2][3].

[1] https://docs.aws.amazon.com/AWSCloudFormation/latest/APIReference/API_RollbackConfiguration.html
[2] https://docs.aws.amazon.com/codepipeline/latest/userguide/action-reference-CloudFormation.html#action-reference-CloudFormation-config
[3] https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/continuous-delivery-codepipeline-cfn-artifacts.html

DanielBauman88 commented 2 years ago

+1

Amazon is clear about the importance of metrics monitoring and auto-rollback. It is a problem that the primary and recommended tool (CDK) doesn't let customers integrate with this functionality, which CloudFormation supports.

I don't think it makes sense to block all the customers using CDK and CloudFormation because of lack of support in CodePipeline.

jabalsad commented 2 years ago

+1

Amazon is clear about the importance of metrics monitoring and auto-rollback. It is a problem that the primary and recommended tool (CDK) doesn't let customers integrate with this functionality, which CloudFormation supports.

I don't think it makes sense to block all the customers using CDK and CloudFormation because of lack of support in CodePipeline.

I'd like to echo this sentiment. Two ways in which I think CDK could support rollbacks without the support of CodePipeline are:

  1. By having rollbackAlarmArn and monitoringPeriod fields on a cdk.Stack that get forwarded to the CloudFormation create-change-set and execute-change-set operations during cdk deploy.
  2. Supporting --rollback-alarm-arn and --monitoring-period command-line arguments when running cdk deploy.

At the time when CodePipeline finally supports this integration, these features could be seamlessly integrated with it too.

This unblocks users today who are not using CodePipeline because of this limitation.

luiabrah commented 1 year ago

Hello, this feature is very necessary for our team to implement full CI/CD with AWS CodePipeline. Is there any update on this issue?

bendu commented 1 year ago

This is what I have done to get around limitations with setting termination protection, stack policy and rollback alarms in CDK. One caveat is this requires creating a second stack that depends on the first stack using the addDependency function.

In your second stack, you'll need to copy the code for and instantiate a StackConstruct that I've made which triggers a lambda function using a custom CloudFormation resource.

nakedible-p commented 1 year ago

Hi all! Back with this stuff again – there is still no light at the end of the tunnel.

To recap my current status:

So, the best solution for me would be that I can somehow specify rollback triggers from inside the stack, and they would be in effect for the next stack update.

I have figured out three possible ways of doing this:

None seem great options, but all will probably work for my use case.

All in all, I must say that "consumers" shouldn't have to come up with ideas to build workarounds for an obvious missing feature that's been missing since 2019 and is still missing in 2023.

openwebsolns commented 1 year ago

For what it's worth, my team has gotten around this issue and it hasn't been a problem for over two years now. Like @nakedible-p, we have a CodePipeline that deploys to multiple stacks across multiple accounts and regions, and the rollback triggers are defined in the stacks themselves (and therefore must be added after the fact).

For us, the key is that the rollback configuration doesn't actually change because we define a single composite CloudWatch alarm for rollback. This lets us "set it and forget it". We perform a one-time action of updating the stack with its rollback configuration. The equivalent CLI command would be:

aws cloudformation update-stack --stack-name $stack --use-previous-template --rollback-configuration $config

Because of the --use-previous-template option, this command can be executed at any point in time after that initial stack launch without any acrobatics to get the template itself. We already have a script for bootstrapping all those accounts and regions (as well as setting up the accounts themselves), so it's not much to add this trivial one as well. If we need to change the alarms associated with the rollback, we just modify the composite alarm instead via CDK. No need to re-run the command, ever. If you wanted to, you could probably set up an AWS Config (https://aws.amazon.com/config/) rule to warn you if you haven't performed this one-time step.
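
For reference, a sketch of what $config in the command above might contain when a single composite alarm is used. The structure follows CloudFormation's RollbackConfiguration; the helper rollbackConfigJson is hypothetical, and the alarm ARN and monitoring time are supplied by the caller.

```typescript
// Renders a RollbackConfiguration with one composite alarm as a JSON string,
// suitable as the value of --rollback-configuration.
function rollbackConfigJson(compositeAlarmArn: string, monitoringMinutes: number): string {
  return JSON.stringify({
    RollbackTriggers: [
      { Arn: compositeAlarmArn, Type: "AWS::CloudWatch::CompositeAlarm" },
    ],
    MonitoringTimeInMinutes: monitoringMinutes,
  });
}
```

Since the composite alarm's ARN stays the same while its member alarms change, this value only needs to be generated once per stack.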

It may not be what we set out to do, but it accomplishes a task that, like others, is outside the scope of a stack update, even though the CloudFormation API confusingly hints that it isn't. If this ever gets supported in some way by CodePipeline, not much effort will have been misspent, since we simply stop running the one-liner. There's no infrastructure to set up (such as a separate Lambda function), and no mucking around with the pipeline structure (replacing the CREATE_CHANGE_SET). After all, it's just a one-time change.

YMMV.

nakedible-p commented 1 year ago

Thank you for the suggestion. We are also using composite alarms, and obviously could even name the alarm in a consistent way based on the stack name. This gives me more confidence that simply setting it once is probably good enough. However, I do want to automate/script this somehow so it doesn't require manual actions - but I think I will try to stick with the simplest and most obvious option.