aws / copilot-cli

The AWS Copilot CLI is a tool for developers to build, release and operate production ready containerized applications on AWS App Runner or Amazon ECS on AWS Fargate.
https://aws.github.io/copilot-cli/
Apache License 2.0

Recovering from an inconsistent state? #5539

Closed ssyberg closed 7 months ago

ssyberg commented 9 months ago

I have a somewhat weird issue where my S3 addon somehow got out of sync with the "state". I can see the bucket, so it has been created and is working, but I can no longer run env deploy successfully. However, I can still deploy services to this environment without issue. Is there some way to "re-sync" this, or to tell the state that the addon manifests have already been executed?

Name: production
✔ Proposing infrastructure changes for the REDACTED-production environment.
- Creating the infrastructure for the REDACTED-production environment.                     [update rollback complete]  [49.2s]
  The following resource(s) failed to update: [AddonsStack].                                                           
  - A CloudFormation nested stack for your additional AWS resources                        [update rollback complete]  [33.8s]
    The following resource(s) failed to create: [REDACTEDproductionassetsB                                             
    ucket].                                                                                                            
    - A bucket policy to deny unencrypted access to the bucket and its contents            [not started]                
    - An Amazon S3 bucket, REDACTED-production-assets, for storing and retrieving objects  [delete complete]           [2.0s]
      REDACTED-production-assets already exists in stack arn:aws:cloudformat                                           
      ion:us-east-1:840475441050:stack/REDACTED-production-AddonsStack-1WWMD                                           
      JSZGILE1/4d9a0030-954e-11ee-bc10-12b72cf75091                                                                    

✘ deploy environment production: stack REDACTED-production did not complete successfully and exited with status UPDATE_ROLLBACK_COMPLETE
Lou1415926 commented 9 months ago

@ssyberg Is "arn:aws:cloudformation:us-east-1:840475441050:stack/REDACTED-production-AddonsStack-1WWMDJSZGILE1/4d9a0030-954e-11ee-bc10-12b72cf75091" the addons stack of the "production" environment stack?

My wild guess is that you or another developer changed the logical ID of the S3 bucket in the environment addons. Seeing the logical ID change, CloudFormation thinks it needs to create a new S3 bucket. However, since BucketName didn't change, CloudFormation attempts to create a new S3 bucket with the same BucketName as the existing bucket - hence the name collision.

Logical ID is the MyBucket part of the snippet ⬇️

Resources:
  MyBucket:
    Type: AWS::S3::Bucket
    Properties:...
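
To make the collision concrete, here is a minimal Python sketch of how a logical ID rename looks when two templates are compared by logical ID only. It is an illustration, not how CloudFormation is implemented; the function name plan_changes and the sample templates (MyBucket, RenamedBucket, example-assets) are hypothetical.

```python
def plan_changes(deployed, updated):
    """Compare the Resources of two templates by logical ID.

    Returns (to_create, to_delete, collisions). A collision is a bucket
    name that a newly added logical ID asks for while a resource in the
    deployed template still owns that name.
    """
    old, new = deployed["Resources"], updated["Resources"]
    to_create = sorted(set(new) - set(old))
    to_delete = sorted(set(old) - set(new))

    def bucket_name(resource):
        return resource.get("Properties", {}).get("BucketName")

    # Bucket names that already exist because the deployed stack made them.
    owned = {bucket_name(old[i]) for i in old} - {None}
    collisions = sorted(
        bucket_name(new[i]) for i in to_create if bucket_name(new[i]) in owned
    )
    return to_create, to_delete, collisions


# Hypothetical templates: same BucketName, different logical ID.
deployed = {"Resources": {"MyBucket": {
    "Type": "AWS::S3::Bucket",
    "Properties": {"BucketName": "example-assets"}}}}
updated = {"Resources": {"RenamedBucket": {
    "Type": "AWS::S3::Bucket",
    "Properties": {"BucketName": "example-assets"}}}}

to_create, to_delete, collisions = plan_changes(deployed, updated)
print(to_create, to_delete, collisions)
```

The rename shows up as one create plus one delete, and the create asks for a bucket name that is already taken.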
ssyberg commented 9 months ago

Thanks @Lou1415926 I assume I can get this ID and replace it somewhere to address this?

Lou1415926 commented 9 months ago

@ssyberg umm, I think I will be able to answer that question more accurately if I have a bit more information.

  1. Are you looking to create an additional S3 bucket?
  2. Are you deploying to an env stack called REDACTED-production? This is for me to confirm that arn:aws:cloudformation:us-east-1:840475441050:stack/REDACTED-production-AddonsStack-1WWMDJSZGILE1/4d9a0030-954e-11ee-bc10-12b72cf75091 is indeed the addons stack that contains the REDACTED-production-assets bucket AND in which CloudFormation is trying to create the REDACTED-production-assets bucket again.
  3. Try running aws s3api get-bucket-tagging --bucket REDACTED-production-assets, and look for the tag with the key named "aws:cloudformation:logical-id". Is the value the same as the logical ID that you have in copilot/environments/production/addons/<s3bucket-template-name>.yml?
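
As a rough aid for step 3, the sketch below filters the JSON that aws s3api get-bucket-tagging prints (a TagSet list of Key/Value pairs) down to the CloudFormation-managed tags. The helper name cloudformation_tags and the sample values are hypothetical; paste in your own command output.

```python
import json


def cloudformation_tags(tagging_json):
    """Return only the aws:cloudformation:* tags from get-bucket-tagging output."""
    tag_set = json.loads(tagging_json).get("TagSet", [])
    return {
        tag["Key"]: tag["Value"]
        for tag in tag_set
        if tag["Key"].startswith("aws:cloudformation:")
    }


# Hypothetical sample of the command's output.
sample = """
{"TagSet": [
  {"Key": "aws:cloudformation:logical-id", "Value": "ExampleBucket"},
  {"Key": "aws:cloudformation:stack-name", "Value": "example-stack"},
  {"Key": "team", "Value": "web"}
]}
"""

tags = cloudformation_tags(sample)
print(tags["aws:cloudformation:logical-id"])
print(tags["aws:cloudformation:stack-name"])
```

Compare the logical-id value with the logical ID in your addon template, and the stack-name value with the addons stack you expect to own the bucket.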
ssyberg commented 9 months ago

@Lou1415926 thank you! Answers below

  1. No I'm not looking to create a new bucket, I just want copilot to recognize that the bucket exists and not throw an error so I can continue to use copilot to manage this addon
  2. I've redacted the app name for privacy reasons, but yes that follows the {APP}-{ENV} naming pattern, so the environment is production and that arn looks correct to me.
  3. Actually I do see an issue here, the logical id appears to match a different environment, staging not production. Can I retag this to fix it?
        {
            "Key": "aws:cloudformation:logical-id",
            "Value": "REDACTEDstagingassetsBucket"
        },
ssyberg commented 9 months ago

Hmm I tried to update the tag in the aws console and got this:

[image: screenshot from the AWS console]
Lou1415926 commented 9 months ago

@ssyberg Yeah that is a tag fully managed by CloudFormation, so you can't change it : (

Actually I do see an issue here, the logical id appears to match a different environment, staging not production. Can I retag this to fix it?

Can you run the same command and look for the value of the tag named "aws:cloudformation:stack-name"? If it is REDACTED-production-AddonsStack-1WWMDJSZGILE1, then this means that:

  1. The bucket is managed by the REDACTED-production-AddonsStack-1WWMDJSZGILE1 stack
  2. However, the logical ID of the bucket is REDACTEDstagingassetsBucket.

You can double check the second point by going to the CloudFormation console and looking at the stack's template. You would probably find something like this:

Resources:
  # ...other resources
  REDACTEDstagingassetsBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: REDACTED-production-assets

Let me know if this is the case!

ssyberg commented 9 months ago

@Lou1415926 yes, that is the correct stack name, so now the question is how to fix this? I actually am now having the same issue with another s3 bucket on my develop environment. I'm sure this is caused by something we did wrong, or some manual change, but I'm really hoping there's a way to get our stacks etc back into a consistent state with copilot.

Aside: I haven't tried this but supposedly you can update the system tags? https://stackoverflow.com/questions/60726450/aws-cli-s3api-put-bucket-tagging-cannot-add-tag-to-bucket-unless-bucket-has-0

ssyberg commented 9 months ago

Our approach here must be totally out of whack, I'm looking through the addon stacks for our three environments (develop, staging, and production) and they all reference the staging bucket for some reason? Maybe I'm just misunderstanding that storage addons are not environment specific? But I'm confused about why all three stacks have a different set of references and what the correct way to do this actually would be.

Is there some command or manual intervention we can do here to "resync"? These buckets already have 100s of gigs of production data so we really don't want to lose them.

Lou1415926 commented 9 months ago

There are a bunch of solutions! For context, logical ID is just used by CloudFormation to identify a resource within a stack: you can have a logical ID REDACTEDstagingassetsBucket in stack A, and the same logical ID REDACTEDstagingassetsBucket in stack B.

# In the "prod" addon stack.
REDACTEDstagingassetsBucket:
  Type: AWS::S3::Bucket
  Properties:
    BucketName: prod-asset

# In the "dev" addon stack.
REDACTEDstagingassetsBucket:
  Type: AWS::S3::Bucket
  Properties:
    BucketName: dev-asset

⬆️ is allowed. You will end up with two buckets. Stack prod will associate the logical ID REDACTEDstagingassetsBucket with the bucket prod-asset, and stack dev will associate the same logical ID with another bucket dev-asset.
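
One way to picture this scoping is a lookup keyed by the pair (stack, logical ID) rather than by logical ID alone. The dictionary below is only a mental model built from the two stacks in this example, not an AWS API.

```python
# Mental model: a logical ID is only unique within its own stack, so the
# key that identifies a physical resource is (stack, logical ID).
physical_ids = {
    ("prod", "REDACTEDstagingassetsBucket"): "prod-asset",
    ("dev", "REDACTEDstagingassetsBucket"): "dev-asset",
}


def lookup(stack, logical_id):
    """Return the bucket that a logical ID points to inside one stack."""
    return physical_ids[(stack, logical_id)]


print(lookup("prod", "REDACTEDstagingassetsBucket"))
print(lookup("dev", "REDACTEDstagingassetsBucket"))
```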

Now the solutions ⬇️

Change the logical IDs back to REDACTEDstagingassetsBucket

I guess you were seeing the issues because the logical ID REDACTEDstagingassetsBucket was changed to REDACTEDprodassetsBucket in the prod stack, and to REDACTEDdevassetsBucket in the dev stack. It is clearly a rename to you, but what CloudFormation sees is: "I need to create a new bucket named prod-asset (which unfortunately already exists, so I error out), and associate that bucket with the logical ID REDACTEDprodassetsBucket; then I need to delete the bucket pointed to by REDACTEDstagingassetsBucket."

Solution 1 is just to change the logical ID from REDACTEDprodassetsBucket back to REDACTEDstagingassetsBucket. This solution is super easy, but I understand that it's weird to have a REDACTEDstagingassetsBucket in a prod stack.

Delete and recreate the bucket

If it is important for you to keep the bucket name (specified by Properties.BucketName) as "prod-asset", but whatever is already in there is not that important (for example, maybe it contains a bunch of assets that you can easily upload again), then you can:

  1. Delete ⬇️ from the prod addon template:
     REDACTEDstagingassetsBucket:
       Type: AWS::S3::Bucket
       Properties:
         BucketName: prod-asset
  2. Run copilot env deploy. This will delete the "prod-asset" bucket. (CloudFormation can only delete an empty bucket, so empty it first.)
  3. Add ⬇️ back to the prod addon template:
     REDACTEDprodassetsBucket:  # Notice the logical ID is changed from xxxstaging to xxxprod.
       Type: AWS::S3::Bucket
       Properties:
         BucketName: prod-asset

     Then run copilot env deploy again. This will create a new S3 bucket named prod-asset with the logical ID you wanted.

Retain and import the bucket

If the "prod-asset" bucket is important to you AND you don't want to lose anything in it, you can choose this option. This one is quite complicated, so I'd say go with the second option if you can.

  1. Add DeletionPolicy: Retain to the existing bucket resource, under the logical ID the deployed stack still knows it by (REDACTEDstagingassetsBucket, not the renamed REDACTEDprodassetsBucket):
     # In stack "prod"
     REDACTEDstagingassetsBucket:
       Type: AWS::S3::Bucket
       DeletionPolicy: Retain
       Properties:
         BucketName: prod-asset
  2. Run copilot env deploy.
  3. Delete the REDACTEDstagingassetsBucket resource from the addon template.
  4. Run copilot env deploy. This will remove REDACTEDstagingassetsBucket from your CloudFormation stack, but the actual bucket will remain.
  5. Follow the steps here to bring the retained bucket back to the CloudFormation stack using the logical ID that you like.
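
For the final step, assuming the linked steps refer to CloudFormation's resource import feature (a change set of type IMPORT), the import needs a small JSON description of each resource to bring in. The sketch below only builds that JSON for one bucket; the helper name resources_to_import is hypothetical, and whether the import can be run directly against Copilot's nested addons stack has not been verified here.

```python
import json


def resources_to_import(logical_id, bucket_name):
    """Build the resources-to-import payload for a single S3 bucket.

    logical_id is the ID the bucket should have in the template after the
    import; bucket_name identifies the existing (retained) bucket.
    """
    return [{
        "ResourceType": "AWS::S3::Bucket",
        "LogicalResourceId": logical_id,
        "ResourceIdentifier": {"BucketName": bucket_name},
    }]


payload = resources_to_import("REDACTEDprodassetsBucket", "prod-asset")
print(json.dumps(payload, indent=2))
```

The template used for the import must declare the same logical ID, and each imported resource needs a DeletionPolicy attribute.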
ssyberg commented 9 months ago

⬆️ is allowed. You will end up with two buckets. Stack prod will associate the logical ID REDACTEDstagingassetsBucket with the bucket prod-asset, and stack dev will associate the same logical ID with another bucket dev-asset.

Ah thanks for clarifying, I guess in that case changing the tag isn't as important, but I would prefer to fix it because it's a bit misleading to devs.

Retain and import the bucket

Yea unfortunately the bucket already contains 60+ gigs / 500k+ images that we can't lose so I will try this strategy. Thank you!

ssyberg commented 9 months ago

And just to clarify, there's no way to just delete all the addon configs and then "import" the existing s3 resources into copilot configs?

ssyberg commented 9 months ago

Oh haha I see that's exactly what you linked me to!

ssyberg commented 9 months ago

Just want to add one thing: I think I finally understand where I went wrong. I only needed one storage addon, and I could have deployed that single manifest to all three environments. What I did instead was try to create a separate addon for each environment :(

Lou1415926 commented 9 months ago

that I only needed one storage addon and then I could deploy that single manifest to all three environments.

Yeah this is exactly right - the intention is to encourage users to create similarly structured environments. You can use environment-agnostic names for logical IDs, e.g. REDACTEDassetsBucket 🚀
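
A minimal sketch of that idea: one template with a fixed logical ID, where only the bucket name varies per environment through Fn::Sub. It assumes the addon template receives App and Env parameters; the logical ID AssetsBucket, the app name "example", and the helper resolve_bucket_name (a stand-in for what CloudFormation does with Fn::Sub) are all hypothetical.

```python
def assets_bucket_template():
    """One addon template shared by every environment (as a Python dict)."""
    return {
        "Parameters": {
            "App": {"Type": "String"},
            "Env": {"Type": "String"},
        },
        "Resources": {
            # Environment-agnostic logical ID: identical in every stack.
            "AssetsBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    # Only the physical name changes per environment.
                    "BucketName": {"Fn::Sub": "${App}-${Env}-assets"},
                },
            },
        },
    }


def resolve_bucket_name(template, app, env):
    """Imitate Fn::Sub locally to preview the bucket name for one environment."""
    props = template["Resources"]["AssetsBucket"]["Properties"]
    pattern = props["BucketName"]["Fn::Sub"]
    return pattern.replace("${App}", app).replace("${Env}", env)


template = assets_bucket_template()
names = [resolve_bucket_name(template, "example", env)
         for env in ("develop", "staging", "production")]
print(names)
```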

github-actions[bot] commented 7 months ago

This issue is stale because it has been open 60 days with no response activity. Remove the stale label, add a comment, or this will be closed in 14 days.

github-actions[bot] commented 7 months ago

This issue is closed due to inactivity. Feel free to reopen the issue if you have any further questions!