support best-effort stack deployment

rittneje commented 1 year ago

Describe the feature

Reopening #22516. The rationale provided by @indrora is irrelevant. CDK already treats each stack as independent: if one fails, it makes no attempt to roll back any stacks that were already deployed.

Currently, if a stack fails during cdk deploy, that stack is rolled back and the remaining stacks are not deployed. Instead, there should be a flag to do a best-effort deployment, meaning that if a stack fails:

- that stack gets rolled back
- other stacks that depend on it (directly or indirectly) are skipped
- all remaining stacks continue to deploy normally
- the cdk deploy command itself still fails at the end
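
To make the requested behavior concrete, here is a minimal sketch in Python. It is not CDK code: the function `best_effort_deploy`, its parameters, and the stack names are all hypothetical, and the `deploy` callback stands in for whatever actually deploys a stack (returning False after that stack has been rolled back). It assumes `order` already lists every stack after the stacks it depends on.

```python
from typing import Callable, Dict, List, Set


def best_effort_deploy(
    order: List[str],
    depends_on: Dict[str, Set[str]],
    deploy: Callable[[str], bool],
) -> Dict[str, List[str]]:
    """Deploy stacks in dependency order, continuing past failures.

    `order` must list each stack after all of its dependencies.
    `deploy` returns True on success and False on failure (hypothetical callback).
    """
    result: Dict[str, List[str]] = {"deployed": [], "failed": [], "skipped": []}
    blocked: Set[str] = set()  # stacks that failed or were skipped

    for stack in order:
        if depends_on.get(stack, set()) & blocked:
            # A direct dependency failed or was skipped, so skip this stack too.
            # Because of the ordering, indirect dependents are caught the same way.
            blocked.add(stack)
            result["skipped"].append(stack)
        elif deploy(stack):
            result["deployed"].append(stack)
        else:
            blocked.add(stack)
            result["failed"].append(stack)
    return result


# Example: A fails, B depends on A, C is independent.
outcome = best_effort_deploy(
    order=["A", "B", "C"],
    depends_on={"B": {"A"}},
    deploy=lambda stack: stack != "A",
)
# The overall command still reports failure if any stack failed.
exit_code = 1 if outcome["failed"] else 0
```

In this example A is recorded as failed, B is skipped, C is deployed, and the overall exit code is nonzero.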

Use Case

Sometimes cdk deploy fails for arbitrary reasons, such as hitting an account limit or being throttled. In these situations it would be better to let the deployment proceed as far as possible.

Proposed Solution

No response

Other Information

No response

Acknowledgements

[ ] I may be able to implement this feature request
[ ] This feature might incur a breaking change

CDK version used

2.39.1

Environment details (OS name and version, etc.)

Alpine 3.16, Python 3.10.6

indrora commented 1 year ago

I'm not sure I got all of the possible points that this affects.

This request has some wide-ranging plausible effects, since custom resources can't be guaranteed to roll back perfectly, particularly custom resources that report a new ID or whose IDs change.

The primary issue is reconciling the changes that a stack can make across stacks or across accounts. CloudFormation rollbacks that affect the running version of containers can cause EKS, ECS, Fargate, plausibly EC2, etc. to fail.

While stacks are theoretically independent, cross-stack resources are common and having stacks fall out of sync with one another can lead to major issues with application reliability.

rittneje commented 1 year ago

@indrora I don't see how custom resources are relevant to this feature request. This is only about continuing to deploy the remaining stacks if one fails. It has no bearing on how the failed stack rolls back.

With regard to cross-stack references, that implies a dependency from one stack to another. As I mentioned, if stack A fails, and stack B depends on A, then B will not be deployed (which is the current behavior). However, if stack C has no dependency on A or B, then it will be deployed anyway. CDK already knows about cross-stack dependencies explicitly, as it must consider them when determining the order in which to deploy stacks.
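
The A/B/C example can be sketched with Python's standard-library `graphlib` module. This only illustrates the idea and is not how CDK computes its deployment order; the `depends_on` mapping and the helper `dependents_of` are hypothetical names.

```python
from graphlib import TopologicalSorter
from typing import Dict, Set

# Each stack maps to the set of stacks it depends on (hypothetical example data).
depends_on: Dict[str, Set[str]] = {"A": set(), "B": {"A"}, "C": set()}

# A valid deployment order places every stack after its dependencies.
order = list(TopologicalSorter(depends_on).static_order())


def dependents_of(failed: str, graph: Dict[str, Set[str]]) -> Set[str]:
    """Return every stack that depends on `failed`, directly or indirectly."""
    affected: Set[str] = set()
    # Walking in dependency order means a stack's dependencies are
    # classified before the stack itself is examined.
    for stack in TopologicalSorter(graph).static_order():
        if graph.get(stack, set()) & ({failed} | affected):
            affected.add(stack)
    return affected


skipped = dependents_of("A", depends_on)  # only B depends on A
still_deployable = set(order) - skipped - {"A"}  # C is unaffected
```

Since the same dependency information is needed to order the stacks in the first place, working out which stacks to skip after a failure requires no extra input.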

rix0rrr commented 1 year ago

I don't mind this as a feature request, but I'm also not too keen on it. It would be Yet Another Flag, given that I don't think we can assume the majority of users would fall into either the "proceed as much as you can" camp or the "stop as soon as there's trouble" camp. (In fact, given that after 4-5 years you're the first person to ask for this, I would say most people fall into the "stop as soon as possible" camp.)

And yes, if there are stack dependencies we will of course not be able to proceed with those stacks.

P.S: Are you aware of the --no-rollback flag, and does that not solve most of your practical issues already?

rittneje commented 1 year ago

> P.S: Are you aware of the --no-rollback flag, and does that not solve most of your practical issues already?

As I understand it, the --no-rollback flag tells CloudFormation not to roll back the stack that failed. It is also unclear whether cdk would proceed with the remaining stacks, but I would not expect it to. In short, --no-rollback is for a completely different use case and does not meet my needs.

aws / aws-cdk