aws / copilot-cli

The AWS Copilot CLI is a tool for developers to build, release and operate production ready containerized applications on AWS App Runner or Amazon ECS on AWS Fargate.
https://aws.github.io/copilot-cli/
Apache License 2.0

[Bug]: 'AddonsStack' missing from environment stack resources. #5791

Open cristobalmackenzie opened 5 months ago

cristobalmackenzie commented 5 months ago

Description:

I'm trying to use copilot env deploy -n test to update an environment. Initially I just wanted to confirm that there were no changes to the deployed environment, and I expected the usual "there are no changes to deploy, use --force, etc." message.

Instead, Copilot tried to recreate all my addons, failed because exports with those names already exist, and then rolled back.

Details:

I'm running the latest version of copilot (1.33.3), had 1.33.2 before. Same behaviour in both. I believe the last deploy was done with 1.32.0.

This is the output I have from running copilot env deploy -n test:

cristobal:bci_wholesale/ (db-proxy) $ copilot env deploy --name test                                                                                                                                                                                                 [11:54:57]
✔ Proposing infrastructure changes for the bci-wholesale-test environment.
- Creating the infrastructure for the bci-wholesale-test environment.                     [update rollback complete]  [47.0s]
  The following resource(s) failed to create: [AddonsStack]. The followi
  ng resource(s) failed to update: [EnvironmentManagerRole].
  - A CloudFormation nested stack for your additional AWS resources                       [rollback complete]         [10.1s]
    Export with name bci-wholesale-test-backendstorageBucketARN is already
     exported by stack bci-wholesale-test-AddonsStack-1KK7OUJPRFOGF. Rollb
    ack requested by user.
    - A bucket policy to deny unencrypted access to the bucket and its contents           [not started]
    - An Amazon S3 bucket, backend-storage, for storing and retrieving objects            [not started]
    - A security group for your RDS database bciWholesaleDatabase                         [not started]
    - A DB parameter group for engine configuration values                                [not started]
    - The bciWholesaleDatabase RDS database instance                                      [not started]
    - A Secrets Manager secret to store your DB credentials                               [not started]
    - A security group for your workload to access the RDS database bciWholesaleDatabase  [not started]
  - An IAM Role to describe resources in your environment                                 [update complete]           [15.9s]
    Resource update cancelled

✘ deploy environment test: stack bci-wholesale-test did not complete successfully and exited with status UPDATE_ROLLBACK_COMPLETE

This is the output I have from running copilot env package --diff -n test:

cristobal:bci_wholesale/ (db-proxy) $ copilot env package --diff -n test                                                                                                                                                                                             [11:56:24]
~ Metadata:
    ~ Version: v1.32.0 -> v1.33.3
~ Resources:
    + AddonsStack:
    +     Metadata:
    +         'aws:copilot:description': 'A CloudFormation nested stack for your additional AWS resources'
    +     Type: AWS::CloudFormation::Stack
    +     Properties:
    +         Parameters:
    +             App: !Ref AppName
    +             Env: !Ref EnvironmentName
    +         TemplateURL: https://stackset-bci-wholesale-i-pipelinebuiltartifactbuc-1hs81cn6bfabo.s3.us-east-1.amazonaws.com/manual/addons/environments/0811929c1e01196f509e20c167268fb2095512e02392d83183fa29576d0a1360.yml
    ~ CertificateValidationFunction/Properties:
        ~ Code:
            ~ S3Key: manual/scripts/custom-resources/certificatevalidationfunction/55a77b5e5853c885c22d67399e0398df98de4754aca2da294cb8a92eb57d8519.zip -> manual/scripts/custom-resources/certificatevalidationfunction/d386997f9939549cab9fcebd1e590fa8e93dab62dbc4cfdab92f299b3a7b9ecd.zip
        ~ Runtime: nodejs16.x -> nodejs20.x
    ~ CustomDomainAction/Properties:
        - EnvHostedZoneId: !Ref EnvironmentHostedZone
    ~ CustomDomainFunction/Properties:
        ~ Code:
            ~ S3Key: manual/scripts/custom-resources/customdomainfunction/8918a141c6f0bc0721421ee00c6ab19eda8dc3419ba29dc7effabd53341daa8e.zip -> manual/scripts/custom-resources/customdomainfunction/10a594c22cd00bbdb7bfe83d78aa929a9bf99ff87ac2618ada9581d5162c1438.zip
        ~ Runtime: nodejs16.x -> nodejs20.x
    ~ DNSDelegationFunction/Properties:
        ~ Code:
            ~ S3Key: manual/scripts/custom-resources/dnsdelegationfunction/3b5699cd0fa653b49382b9901501b89cf316aff779766a22282aa8070fd873ad.zip -> manual/scripts/custom-resources/dnsdelegationfunction/0fce63197f0b8cf5e3bd944358aa1ef736f4d7d383fb9db582ca312466b0e6eb.zip
        ~ Runtime: nodejs16.x -> nodejs20.x
    ~ DelegateDNSAction/Properties:
        - EnvHostedZoneId: !Ref EnvironmentHostedZone
    ~ EnvironmentManagerRole/Properties/Policies:
        ~ - (changed item)
          ~ PolicyDocument/Statement:
              (22 unchanged items)
              + - Sid: ListStacks
              +   Effect: Allow
              +   Action:
              +     - 'cloudformation:ListStacks'
              +   Resource: "*"
              (1 unchanged item)

Observed result:

The --debug flag mentioned in the issue template doesn't seem to work.

cristobal:bci_wholesale/ (db-proxy) $ copilot env deploy --name test --debug                                                                                                                                                                                         [11:49:17]
✘ unknown flag: --debug

Expected result:

I expected the usual "there are no changes to deploy" message. I was running the command as a check before actually adding more addons.

Debugging:

Looking at the resources for my environment stack, I can see that there isn't an AddonsStack resource for my addons. The package diff shows the same thing: AddonsStack appears as a new resource to be added.
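For anyone who wants to script the same check, here is a minimal sketch, assuming boto3 is installed and AWS credentials are configured. The helper names are hypothetical; the stack name bci-wholesale-test and the logical ID AddonsStack come from the output above. It is not how Copilot itself inspects the stack.

```python
from typing import Iterable, List, Mapping


def has_logical_resource(summaries: Iterable[Mapping[str, str]], logical_id: str) -> bool:
    """Return True if any resource summary carries the given logical ID."""
    return any(s.get("LogicalResourceId") == logical_id for s in summaries)


def fetch_resource_summaries(stack_name: str) -> List[dict]:
    """Collect every resource summary of a stack (needs boto3 and credentials)."""
    import boto3  # imported lazily so the pure helper above works without it

    client = boto3.client("cloudformation")
    paginator = client.get_paginator("list_stack_resources")
    summaries: List[dict] = []
    for page in paginator.paginate(StackName=stack_name):
        summaries.extend(page["StackResourceSummaries"])
    return summaries


# Example usage against a real account (not executed here):
#   resources = fetch_resource_summaries("bci-wholesale-test")
#   print(has_logical_resource(resources, "AddonsStack"))
```

If the second call prints False while the nested addons stack still exists in the account, the environment stack has lost track of it, which matches the situation described in this issue.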

cristobalmackenzie commented 5 months ago

Of course I also realize that we might've broken the environment, but I have no idea how. The environment addons stack is still there, all the addons resources (bucket, RDS Instance, etc) are still there, working fine.

dannyrandall commented 5 months ago

Hey @cristobalmackenzie! To clarify, did you use Copilot to deploy the addons stack initially (with Copilot v1.32.0, the previous version the environment was deployed with)? It's interesting to me that the AddonsStack resource isn't present in the deployed template 🤔

cristobalmackenzie commented 5 months ago

Hey @dannyrandall ! Yes indeed, we used copilot for everything. I've been looking into it, learning more about CloudFormation and I can't figure out how to replicate this. As far as I understand one shouldn't be able to end up in this state.

I ran env deploy / delete a couple of times, with and without the addons folder, back in December (I see a couple of delete and delete failed events for the addons stack), and hadn't updated since. The stack must have ended up that way at that time, but I can't figure out how.

dannyrandall commented 5 months ago

Ah ok! I'm wondering if what happened is: the Addons stack failed to delete, but the main stack considered the update a success and therefore removed the AddonsStack resource from the template, which is why it's getting redeployed now.

Is the existing Addons stack in a DELETE_FAILED or similar state? If so, could you delete that stack directly via CloudFormation and then try redeploying your environment stack?

cristobalmackenzie commented 5 months ago

The existing Addons stack actually appears intact; it consists of a bucket and an RDS instance. There were three attempts at deletion that failed because some exports were in use by other stacks. The addons stack state is UPDATE_COMPLETE.
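To see which stacks are holding on to those exports, something like the following could help. It is a rough sketch, assuming boto3 and configured credentials; the helper names are hypothetical, and the stack name is the one from the error output above. It also assumes that CloudFormation's ListImports call returns an error when nothing imports an export, so that case is treated as "no importers".

```python
from typing import Dict, Iterable, List, Mapping


def stack_name_from_id(stack_id: str) -> str:
    """Extract the stack name from a stack ARN.

    Stack IDs look like:
    arn:aws:cloudformation:<region>:<account>:stack/<name>/<uuid>
    """
    return stack_id.split("/")[1]


def exports_owned_by(exports: Iterable[Mapping[str, str]], stack_name: str) -> List[str]:
    """Return the names of the exports that the given stack produces."""
    return [
        e["Name"]
        for e in exports
        if stack_name_from_id(e["ExportingStackId"]) == stack_name
    ]


def importers_by_export(stack_name: str) -> Dict[str, List[str]]:
    """Map each export of a stack to the stacks importing it (needs boto3)."""
    import boto3
    from botocore.exceptions import ClientError

    cfn = boto3.client("cloudformation")
    exports: List[dict] = []
    for page in cfn.get_paginator("list_exports").paginate():
        exports.extend(page["Exports"])

    result: Dict[str, List[str]] = {}
    for name in exports_owned_by(exports, stack_name):
        importers: List[str] = []
        try:
            for page in cfn.get_paginator("list_imports").paginate(ExportName=name):
                importers.extend(page["Imports"])
        except ClientError:
            # Assumption: an error here means no stack imports this export.
            # This also hides unrelated API errors, so log them in real use.
            importers = []
        result[name] = importers
    return result


# Example usage against a real account (not executed here):
#   importers_by_export("bci-wholesale-test-AddonsStack-1KK7OUJPRFOGF")
```

Any export with a non-empty importer list is one that CloudFormation will refuse to delete until the importing stacks stop referencing it.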

The environment stack, though, shows a DELETE_FAILED state for the Addons resource in the Events tab. Subsequent updates to the Addons resource start with CREATE_FAILED, which is when I found this issue.

I could manually delete the stack and recreate the whole thing, but I'd like to avoid deleting it since it's being used daily. If there are no other possible workarounds, though, I might not have a choice.

I tried to use the Import resource feature of CloudFormation to reincorporate the Addons stack into the environment template but couldn't do it because the template has too many resources that are not supported in the import feature.