aws-amplify / amplify-category-api

The AWS Amplify CLI is a toolchain for simplifying serverless web and mobile development. This plugin provides functionality for the API category, allowing for the creation and management of GraphQL and REST based backends for your amplify project.
https://docs.amplify.aws/
Apache License 2.0

Amplify push hangs indefinitely after failed cloudformation updates #2305

Open d-huck opened 5 months ago

d-huck commented 5 months ago

How did you install the Amplify CLI?

npm

If applicable, what version of Node.js are you using?

21.2.0

Amplify CLI Version

12.10.1

What operating system are you using?

MacOS

Did you make any manual changes to the cloud resources managed by Amplify? Please describe the changes made.

The only manual changes are a few custom images for running lambdas online. Based on other issues similar to mine, I attempted to make dummy resolvers to clear the UPDATE_ROLLBACK_FAILED state to no avail. These dummy resolvers have been removed.

Describe the bug

When pushing updates from our dev environment to production, API building failed due to an object being removed from the API. The API push has an unfortunately large number of changes due to our frontend development lagging far behind. The commands used for pushing to main were:

amplify env checkout main
amplify status
amplify push -y --allow-destructive-graphql-schema-updates

The push failed after roughly 30 minutes of the CLI doing its thing. As a result, our CloudFormation stack is in UPDATE_ROLLBACK_FAILED and cannot be cleared out of this state.

After attempting to roll back, I reverted the schema to the last known working state and pushed, which results in the CLI hanging indefinitely. The last known working state does not remove the object in question. To further debug this, I pulled the main environment down using amplify pull, added a comment line to trigger a rebuild, and experienced the same behavior where the CLI hangs and does not move forward. Our production environment has been offline for 12 hours now, which is generally considered to be a bad thing.

Expected behavior

Pushing changes from one environment to another should work, or at least leave things in a revertible state.

Reproduction steps

Not sure if this can be reproduced in an empty directory; I have never experienced this level of amplify failing before.

  1. Create a project with interconnected schema in main environment, push to cloud
  2. Create dev environment, update schema with index changes and remove an object
  3. Checkout main environment and attempt pushing to cloud.
  4. Suffer.

Project Identifier

a7f88f1c8eb39da02933e54e978f3c1e

Log output

``` # Put your logs below this line ```

Additional information

2024-02-28T15:11:01.010Z|info : amplify-provider-awscloudformation.upload-appsync-files.uploadAppSyncFiles.upload.s3Client.uploadFile([{"Key":"[***]cks/[***]json"}])
2024-02-28T15:11:01.010Z|info : amplify-provider-awscloudformation.aws-s3.uploadFile.s3.putObject([{"Key":"[***]ify-[***]ync-[***]es/[***]c910f56ba137798c/[***]cks/[***]json","Body":{"fd":null,"path":"/Users/dhuck/projects/vibe/vcheck-backend/amplify/backend/api/vibevideo/build/stacks/Song.json","flags":"r","mode":438,"end":null,"bytesRead":0,"_readableState":{"objectMode":false,"highWaterMark":65536,"buffer":{"head":null,"tail":null,"length":0},"length":0,"pipes":[],"flowing":null,"ended":false,"endEmitted":false,"reading":false,"constructed":false,"sync":true,"needReadable":false,"emittedReadable":false,"readableListening":false,"resumeScheduled":false,"errorEmitted":false,"emitClose":true,"autoDestroy":true,"destroyed":false,"errored":null,"closed":false,"closeEmitted":false,"defaultEncoding":"utf8","awaitDrainWriters":null,"multiAwaitDrain":false,"readingMore":false,"dataEmitted":false,"decoder":null,"encoding":null},"_events":{},"_eventsCount":1},"Bucket":"[***]ify-[***]ideo-[***]in-[***]331-[***]ment"}])
2024-02-28T15:11:01.010Z|info : amplify-provider-awscloudformation.upload-appsync-files.uploadAppSyncFiles.upload.s3Client.uploadFile([{"Key":"[***]cks/[***]json"}])
2024-02-28T15:11:01.010Z|info : amplify-provider-awscloudformation.aws-s3.uploadFile.s3.putObject([{"Key":"[***]ify-[***]ync-[***]es/[***]c910f56ba137798c/[***]cks/[***]json","Body":{"fd":null,"path":"/Users/dhuck/projects/vibe/vcheck-backend/amplify/backend/api/vibevideo/build/stacks/User.json","flags":"r","mode":438,"end":null,"bytesRead":0,"_readableState":{"objectMode":false,"highWaterMark":65536,"buffer":{"head":null,"tail":null,"length":0},"length":0,"pipes":[],"flowing":null,"ended":false,"endEmitted":false,"reading":false,"constructed":false,"sync":true,"needReadable":false,"emittedReadable":false,"readableListening":false,"resumeScheduled":false,"errorEmitted":false,"emitClose":true,"autoDestroy":true,"destroyed":false,"errored":null,"closed":false,"closeEmitted":false,"defaultEncoding":"utf8","awaitDrainWriters":null,"multiAwaitDrain":false,"readingMore":false,"dataEmitted":false,"decoder":null,"encoding":null},"_events":{},"_eventsCount":1},"Bucket":"[***]ify-[***]ideo-[***]in-[***]331-[***]ment"}])
2024-02-28T15:11:01.010Z|info : amplify-provider-awscloudformation.upload-appsync-files.uploadAppSyncFiles.upload.s3Client.uploadFile([{"Key":"[***]cks/[***]json"}])
2024-02-28T15:11:01.010Z|info : amplify-provider-awscloudformation.aws-s3.uploadFile.s3.putObject([{"Key":"[***]ify-[***]ync-[***]es/[***]c910f56ba137798c/[***]cks/[***]json","Body":{"fd":null,"path":"/Users/dhuck/projects/vibe/vcheck-backend/amplify/backend/api/vibevideo/build/stacks/Vibe.json","flags":"r","mode":438,"end":null,"bytesRead":0,"_readableState":{"objectMode":false,"highWaterMark":65536,"buffer":{"head":null,"tail":null,"length":0},"length":0,"pipes":[],"flowing":null,"ended":false,"endEmitted":false,"reading":false,"constructed":false,"sync":true,"needReadable":false,"emittedReadable":false,"readableListening":false,"resumeScheduled":false,"errorEmitted":false,"emitClose":true,"autoDestroy":true,"destroyed":false,"errored":null,"closed":false,"closeEmitted":false,"defaultEncoding":"utf8","awaitDrainWriters":null,"multiAwaitDrain":false,"readingMore":false,"dataEmitted":false,"decoder":null,"encoding":null},"_events":{},"_eventsCount":1},"Bucket":"[***]ify-[***]ideo-[***]in-[***]331-[***]ment"}])
2024-02-28T15:11:02.883Z|info : amplify-provider-awscloudformation.push-resources.uploadTemplateToS3.s3.uploadFile([{"Key":"[***]ify-[***]fn-[***]ates/[***]pi/[***]mation-[***]e.json"}])
2024-02-28T15:11:02.884Z|info : amplify-provider-awscloudformation.aws-s3.uploadFile.s3.putObject([{"Body":{"fd":null,"path":"/Users/dhuck/projects/vibe/vcheck-backend/amplify/backend/awscloudformation/build/api/vibevideo/build/cloudformation-template.json","flags":"r","mode":438,"end":null,"bytesRead":0,"_readableState":{"objectMode":false,"highWaterMark":65536,"buffer":{"head":null,"tail":null,"length":0},"length":0,"pipes":[],"flowing":null,"ended":false,"endEmitted":false,"reading":false,"constructed":false,"sync":true,"needReadable":false,"emittedReadable":false,"readableListening":false,"resumeScheduled":false,"errorEmitted":false,"emitClose":true,"autoDestroy":true,"destroyed":false,"errored":null,"closed":false,"closeEmitted":false,"defaultEncoding":"utf8","awaitDrainWriters":null,"multiAwaitDrain":false,"readingMore":false,"dataEmitted":false,"decoder":null,"encoding":null},"_events":{},"_eventsCount":1},"Key":"[***]ify-[***]fn-[***]ates/[***]pi/[***]mation-[***]e.json","Bucket":"[***]ify-[***]ideo-[***]in-[***]331-[***]ment"}])
2024-02-28T15:11:04.135Z|info : amplify-provider-awscloudformation.system-config-manager.getProfileConfig(["vibe"])
2024-02-28T15:11:04.137Z|info : amplify-provider-awscloudformation.system-config-manager.getProfiledAwsConfig.profileConfig([{"region":"us-east-1"}])
2024-02-28T15:11:04.137Z|info : amplify-provider-awscloudformation.system-config-manager.getProfileCredentials(["vibe"])
2024-02-28T15:11:04.140Z|info : amplify-provider-awscloudformation.aws-cfn.updateCloudFormationNestedStack(["/Users/dhuck/projects/vibe/vcheck-backend/amplify/backend/awscloudformation","/Users/dhuck/projects/vibe/vcheck-backend/amplify/backend/awscloudformation/build/awscloudformation/build/root-cloudformation-stack.json"])
2024-02-28T15:11:04.149Z|info : amplify-provider-awscloudformation.aws-cfn.updateResourceStack.s3.uploadFile([{}])
2024-02-28T15:11:04.149Z|info : amplify-provider-awscloudformation.aws-s3.uploadFile.s3.putObject([{"Body":{"fd":null,"path":"/Users/dhuck/projects/vibe/vcheck-backend/amplify/backend/awscloudformation/build/awscloudformation/build/[***]ot-[***]mation-[***]json","flags":"r","mode":438,"end":null,"bytesRead":0,"_readableState":{"objectMode":false,"highWaterMark":65536,"buffer":{"head":null,"tail":null,"length":0},"length":0,"pipes":[],"flowing":null,"ended":false,"endEmitted":false,"reading":false,"constructed":false,"sync":true,"needReadable":false,"emittedReadable":false,"readableListening":false,"resumeScheduled":false,"errorEmitted":false,"emitClose":true,"autoDestroy":true,"destroyed":false,"errored":null,"closed":false,"closeEmitted":false,"defaultEncoding":"utf8","awaitDrainWriters":null,"multiAwaitDrain":false,"readingMore":false,"dataEmitted":false,"decoder":null,"encoding":null},"_events":{},"_eventsCount":1},"Key":"root-cloudformation-stack.json","Bucket":"[***]ify-[***]ideo-[***]in-[***]331-[***]ment"}])
2024-02-28T15:11:04.814Z|info : amplify-provider-awscloudformation.aws-cfn.updateResourceStack.describeStack([{"StackName":"[***]ify-[***]ideo-[***]in-[***]331"}])
2024-02-28T15:11:04.817Z|info : amplify-provider-awscloudformation.aws-cfn.describeStack.cfn.describeStacks([{"StackName":"[***]ify-[***]ideo-[***]in-[***]331"}])
2024-02-28T15:11:05.143Z|info : amplify-provider-awscloudformation.aws-cfn.updateResourceStack.updateStack([{"StackName":"[***]ify-[***]ideo-[***]in-[***]331"}])
2024-02-28T15:18:33.437Z|info : amplify version core
2024-02-28T15:28:22.136Z|info : amplify diagnose core  {"send-report":true,"yes":false}
2024-02-28T15:28:22.221Z|info : @aws-amplify/amplify-cli-core.banner-message/index.ts.fetch banner messages from https://aws-amplify.github.io/amplify-cli/banner-message.json({}


ykethan commented 5 months ago

Hey @d-huck, 👋 thanks for raising this! From the logs provided, it appears this is occurring when pushing an API resource. I'm going to transfer this over to our API repository for better assistance. But I wanted to mention that you may need to remove the deployment-state.json file in the S3 deployment bucket if present. Could you also try adding --debug to the amplify push command for verbose logging?

d-huck commented 5 months ago

@ykethan Thank you for the response. I had already attempted removing deployment-state.json after making the original post, to no avail. Running with debug gives the following message:

 Stack:arn:aws:cloudformation:us-east-1:xxxxxxxxxxx:stack/amplify-vxxxo-main-130331/245af600-74f3-11ee-9f3e-0a3e5b9c2ce5 is in UPDATE_ROLLBACK_FAILED state and can not be updated.
PushResourcesFault: Stack:arn:aws:cloudformation:us-east-1:xxxxxxxxxx:stack/amplify-vxxxo-main-130331/245af600-74f3-11ee-9f3e-0a3e5b9c2ce5 is in UPDATE_ROLLBACK_FAILED state and can not be updated.
    at AmplifyToolkit.pushResources (/snapshot/amplify-cli/build/node_modules/@aws-amplify/cli-internal/lib/extensions/amplify-helpers/push-resources.js:116:23)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Object.executeAmplifyCommand (/snapshot/amplify-cli/build/node_modules/@aws-amplify/amplify-category-api/lib/index.js:231:9)
    at async executePluginModuleCommand (/snapshot/amplify-cli/build/node_modules/@aws-amplify/cli-internal/lib/execution-manager.js:139:5)

before it resumes and hangs indefinitely. Attempting to continue rollback in cloudformation results in the same failure. Diving down the stack in CloudFormation, I see this error message:

The following resource(s) failed to update: [SubscriptiononDeleteUserResolver, UserdiscordIdResolver, DeleteUserResolver, UserownerResolver, CreateUserResolver, QuerygetUserByDiscordIdResolver, SubscriptiononUpdateUserResolver, SubscriptiononCreateUserResolver, UseremailResolver, GetUserResolver, ListUserResolver, UserphoneResolver, UpdateUserResolver].

which contains many of the resolvers for the object type I initially attempted to remove and am now attempting to restore. However, I can't push anything until this rollback completes, which is seemingly a Catch-22.
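For anyone working from a similar status message, here is a minimal sketch of pulling the logical resource IDs out of it, for example to see which resolvers are blocking the rollback. It assumes the message has the same shape as the one quoted above; `failed_resources` is a hypothetical helper name, not part of Amplify or the AWS SDK.

```python
import re


def failed_resources(status_reason: str) -> list[str]:
    """Return the logical IDs listed in a CloudFormation status reason.

    Assumes the format quoted in this thread:
    "The following resource(s) failed to update: [A, B, C]."
    """
    match = re.search(r"failed to (?:create|update|delete): \[([^\]]*)\]", status_reason)
    if match is None:
        return []
    # Split the comma-separated list and drop surrounding whitespace.
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


reason = "The following resource(s) failed to update: [GetUserResolver, ListUserResolver]."
print(failed_resources(reason))  # ['GetUserResolver', 'ListUserResolver']
```

The resulting list is only a starting point for inspecting the stack; it does not by itself change any resource.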

dpilch commented 5 months ago

Creating fake resolvers may allow you to get out of the UPDATE_ROLLBACK_FAILED state. https://github.com/aws-amplify/amplify-category-api/issues/2157#issuecomment-1868341419

d-huck commented 5 months ago

Thanks for the link. I followed #2157 and was able to get past that into UPDATE_ROLLBACK_COMPLETE on the stack. There was some more strange behavior following this. First, it gave me the error "Cannot perform more than one GSI creation or deletion in a single update". I removed the index tag from the table I'm trying to remove and also deleted the index from the DynamoDB console. After moving past that, I'm just stuck with an infinite

🛑 ["Index: 0 State: {\"deploy\":\"waitingForDeployment\"} Message: Resource is not in the state stackUpdateComplete"]

I've tried deleting the deployment-state from the deployment s3 bucket. But I'm not able to move forward with the last known good schema.

dpilch commented 5 months ago

Are there any additional error messages on the CloudFormation console?

dpilch commented 5 months ago

There are a few possible solutions in https://github.com/aws-amplify/amplify-category-api/issues/92.

d-huck commented 5 months ago

@dpilch Thank you for the link. I had been up and down that thread and was hoping for solutions other than the ones that were proposed there. In the end, we ended up destroying the api and associated tables and rebuilt them fresh, which wasn't as catastrophic as it could have been considering we're at a very early stage. It seems like this problem is common among people who are making large, quick changes to their backend, so hopefully we won't be facing this in the future. I'll spare y'all the rant about this being unacceptable, because I assume y'all have read #92 in detail.

I'll leave our solution for anyone who may find themselves on this page in the future. First, if you've tried the normal things, don't hold out for a solution, just follow #92. Here are our resolution steps:

  1. Backup all the relevant tables in the environment.
  2. If any resources have the API as a dependency, remove them. For example, for a lambda, run amplify function update, select the function and ensure the API is unselected.
  3. Backup the schema.graphql file, because the next step will remove it.
  4. Remove the api: amplify remove api
  5. Check the environment's S3 deployment bucket and delete deployment-state.json if it exists.
  6. Push changes: amplify push
  7. Rebuild the api: amplify add api. Select blank or a template; you'll overwrite it in the next step.
  8. Restore the backed up schema.graphql
  9. Push changes: amplify push
  10. While this is happening, restore the DynamoDB table backups as new tables with easy to remember names
  11. Once the api is pushed, use this tool to copy data from the backup tables to your new tables. There's probably a more idiomatic approach for backing up and restoring the data, but we were already aware of this tool and wanted to be done with it. Be sure to delete your backup tables when you're done.

Outside of the backup and restoration process, this whole process takes roughly 30 minutes.
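The copy in step 11 could be sketched roughly as follows. This is not the tool referred to above, only a minimal illustration under stated assumptions: `client` is anything exposing the low-level DynamoDB `scan` and `batch_write_item` calls (such as `boto3.client("dynamodb")`), and `copy_table`, `retry_delay` and the table names in the usage comment are hypothetical.

```python
import time


def copy_table(client, source: str, target: str, retry_delay: float = 1.0) -> int:
    """Copy every item from `source` to `target`; return the number of items written."""
    batch_size = 25  # BatchWriteItem accepts at most 25 put/delete requests per call
    copied = 0
    scan_kwargs = {"TableName": source}
    while True:
        page = client.scan(**scan_kwargs)
        items = page.get("Items", [])
        for start in range(0, len(items), batch_size):
            requests = [{"PutRequest": {"Item": item}} for item in items[start:start + batch_size]]
            pending = {target: requests}
            while pending:
                response = client.batch_write_item(RequestItems=pending)
                # Throttled writes come back as UnprocessedItems and must be retried.
                pending = response.get("UnprocessedItems") or {}
                if pending:
                    time.sleep(retry_delay)
            copied += len(requests)
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return copied
        # Continue the scan from where the previous page stopped.
        scan_kwargs["ExclusiveStartKey"] = last_key


# Usage (requires boto3 and AWS credentials; table names are placeholders):
#   import boto3
#   copy_table(boto3.client("dynamodb"), "User-restored-backup", "User-newapiid-main")
```

A full-table scan like this is only reasonable for small tables, and it overwrites items in the target table that share a primary key with an item in the source.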

re-oka commented 5 months ago

I am sharing this because a similar event has occurred.

[ Problem ]

  1. in modifying schema.graphql, I added 5 GSIs for one table at the same time. (previous state was 0 GSI)
  2. amplify push
  3. when 3 GSIs were added, Cannot perform more than one GSI creation or deletion in a single update occurred and deploy failed.
  4. Amplify rolled back the resources and reported a successful rollback status.
  5. The rollback succeeded, but when I checked the CloudFormation Stack for the target table, it was still a template with 3 GSIs remaining. (GSI's were not rolled back)

[ after problem ] I tried amplify push, but on the initial-state deployment I got Cannot perform more than one GSI creation or deletion in a single update.

[ Cause ] The error was caused by the following difference.

As a result, there was a difference of more than 2 between the number of GSIs in the CloudFormation Stack and the number of GSIs in the CloudFormation Template that Amplify first attempts to deploy.
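The mismatch described above can be checked before pushing. The sketch below compares the GSI names in the deployed stack with the names in the template about to be deployed and, when more than one index differs, lists intermediate states that change one index at a time. The function names are hypothetical, and comparing by name only is a simplification: a GSI whose key schema changes also counts as a deletion plus a creation.

```python
def gsi_changes(deployed: set[str], desired: set[str]) -> dict[str, list[str]]:
    """Index names that would have to be created or deleted."""
    return {
        "create": sorted(desired - deployed),
        "delete": sorted(deployed - desired),
    }


def single_update_ok(deployed: set[str], desired: set[str]) -> bool:
    """True if the template differs from the deployed table by at most one GSI."""
    changes = gsi_changes(deployed, desired)
    return len(changes["create"]) + len(changes["delete"]) <= 1


def plan_steps(deployed: set[str], desired: set[str]) -> list[set[str]]:
    """Intermediate GSI sets, each one index away from the previous state."""
    steps = []
    current = set(deployed)
    for name in sorted(deployed - desired):  # deletions first
        current = current - {name}
        steps.append(set(current))
    for name in sorted(desired - deployed):  # then creations
        current = current | {name}
        steps.append(set(current))
    return steps


# Situation from this comment: 3 GSIs left on the table, 0 in the first template.
print(single_update_ok({"byA", "byB", "byC"}, set()))  # False
```

Each entry returned by `plan_steps` would correspond to one separate push; the index names `byA`, `byB` and `byC` are placeholders.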

[ Recovery ]

  1. download #current-cloud-backend.zip under the s3 bucket (amplify-appid-envname-xxxxx-deployment) (although Amplify officially forbids modification)
  2. unzip #current-cloud-backend.zip
  3. Update GlobalSecondaryIndexes and AttributeDefinitions in api/apiid/build/stacks/<tablename>.json to the CloudFormation Stack contents.(3 GSI)
  4. in api/apiid/schema.graphql, set the target table's @index(...) to match the 3 GSIs in the CloudFormation Stack.
  5. cd #current-cloud-backend
  6. zip -r ... /#current-cloud-backend.zip *
  7. Uploaded the created zip to s3 and redeployed from the Amplify Web Console (If you are not using Amplify Web Console, you may want to change your local amplify/#current-cloud-backend/)

Amplify CLI version : 12.8.2
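Recovery step 3 above is a manual edit of the table's stack JSON. A minimal sketch of the same edit on templates already loaded as dictionaries is shown below. It assumes the table is a resource under `Resources` with `GlobalSecondaryIndexes` and `AttributeDefinitions` in its `Properties`, as in an `AWS::DynamoDB::Table`; `sync_index_properties` and the logical ID `UserTable` are hypothetical names used for illustration.

```python
import copy


def sync_index_properties(local_template: dict, deployed_template: dict,
                          table_logical_id: str) -> dict:
    """Return a copy of `local_template` whose table uses the deployed GSI settings."""
    result = copy.deepcopy(local_template)
    source = deployed_template["Resources"][table_logical_id]["Properties"]
    target = result["Resources"][table_logical_id]["Properties"]
    for key in ("GlobalSecondaryIndexes", "AttributeDefinitions"):
        if key in source:
            # Take the value CloudFormation actually has deployed.
            target[key] = copy.deepcopy(source[key])
        else:
            # The deployed table has no such property, so drop it locally too.
            target.pop(key, None)
    return result


# Usage with files (paths are placeholders):
#   import json
#   local = json.load(open("build/stacks/User.json"))
#   deployed = json.load(open("deployed-User.json"))
#   fixed = sync_index_properties(local, deployed, "UserTable")
#   json.dump(fixed, open("build/stacks/User.json", "w"), indent=2)
```

The inputs are left unmodified, so the result can be inspected before anything is written back into the unzipped #current-cloud-backend directory.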