cloudcomponents / cdk-constructs

A collection of higher-level reusable cdk constructs
MIT License
625 stars 104 forks source link

Updates to DummyTaskDefinition break the stack and prevent future updates #140

Closed cuperman closed 2 years ago

cuperman commented 2 years ago

If you try to update any attribute on the DummyTaskDefinition resource (for example, the optional family prop), then it fails with this error:

5:14:45 AM | UPDATE_FAILED        | Custom::BlueGreenService                  | EcsServiceCustomResource5EDA8C27
CustomResource attribute error: Vendor response doesn't contain taskDefinition.taskDefinitionArn key in object arn:aws:cloudformation:us-east-1:768404655438:stack/EcsTaggingTest/91a55fa0-4b94-11ec-a731-0eacf7b0a8db|DummyTaskDefinitionE3D9D432|fcad085b-b5a2-4a25-bbad-a66a696a7803 in S3 bucke
t cloudformation-custom-resource-storage-useast1

The DummyTaskDefinition isn't really intended to be used / updated, so it looks like updates weren't intended to be support. However, when this happens, it triggers a rollback causing CloudFormation to call update on the custom resource again, leaving it in this broken state. After this happens, subsequent deployments to the stack fail, and you can't really recover without updating the custom resource.

This can happen unintentionally during a refactor where the node address changes, because the node address is the default value for the optional family property.

cuperman commented 2 years ago

I have a potential fix for this issue, but I'm still testing a few scenarios before it's ready: https://github.com/cloudcomponents/cdk-constructs/pull/139

cuperman commented 2 years ago

So, the fix is to also call registerTaskDefinition on the update hook. This creates a new task definition if any of the properties change, returning a new Physical ID. The new ID tells CloudFormation that the resource has been replaced, so it cleans up the old one by calling the onDelete hook with the old properties.

The update failure is avoided because the update returns resource properties, allowing the vendor to resolve taskDefinition.taskDefinitionArn getattr call.

I've tested multiple scenarios, and I cannot find a case where updating the task definition negatively affects the service with the code deployed container/task.