aws-amplify / amplify-category-api

The AWS Amplify CLI is a toolchain for simplifying serverless web and mobile development. This plugin provides functionality for the API category, allowing for the creation and management of GraphQL and REST based backends for your amplify project.
https://docs.amplify.aws/
Apache License 2.0

Stack ended up in UPDATE_ROLLBACK_COMPLETE state after a timeout during CI/CD build #1788

Open parvusville opened 1 year ago

parvusville commented 1 year ago

How did you install the Amplify CLI?

No response

If applicable, what version of Node.js are you using?

No response

Amplify CLI Version

12.2.3

What operating system are you using?

Pop Os

Did you make any manual changes to the cloud resources managed by Amplify? Please describe the changes made.

No

Describe the bug

TL;DR: I merged a lot of changes from the dev environment to staging, which caused the Amplify Hosting CI/CD build to hit its default 30-minute timeout, and now we are unable to deploy anything. I am also quite hesitant to push these changes to production before I know how to resolve this issue, though I hope the push would be fine with a timeout longer than the default 30 minutes.

The changes included:

Since the build ran for 30 minutes before timing out, it successfully created a lot of resources, including the new API Gateway with the Lambda and all of the GSIs. So now part of the updates are deployed to the cloud, but the CLI of course does not know about them. I'm not sure whether this is part of the problem and whether I should do something about it, for example remove the created resources manually.

Now we are facing this message when trying to push.

Rolled back (2 of 1)
🛑 ["Index: 0 State: {\"deploy\":\"waitingForDeployment\"} Message: Resource is not in the state stackUpdateComplete"]

The API stack (the existing GraphQL API, to which I added the new @keys) always ends up in the UPDATE_ROLLBACK_COMPLETE state.

I will include all the steps taken in the Reproduction steps section. Below is what I found in the affected CloudFormation stack (the GraphQL API one, amplify-myapp-staging-165805-apimyapp1MEX5K4GYEIYQ): [four screenshots]

And here is a sample from the part that fails on the Review table stack (amplify-myapp-staging-165805-apimyapp-1MEX5K4GYEIYQ-Review-1BC3241C95LS4): [screenshot]

As far as I understand, this seems to be the key thing blocking the push at least for now:

Resource handler returned message: "Cannot perform more than one GSI creation or deletion in a single update" (RequestToken: eff8451a-4e7b-a0f3-92d5-bf1ff0b267c2, HandlerErrorCode: InvalidRequest)

I also see a similar error on the User table's stack, to which I also added new indexes. However, for the other tables I added indexes to, I only see "Resource update cancelled" as the reason for the UPDATE_FAILED status.
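
The DynamoDB error quoted above reflects a service limit: a single table update may create or delete at most one GSI. The following is a minimal Python sketch, not Amplify's actual implementation, of how a set of index changes could be split into steps that each touch exactly one index. The function name `plan_gsi_updates` and the index names in the usage lines are hypothetical.

```python
from typing import Dict, List


def plan_gsi_updates(current: List[str], desired: List[str]) -> List[Dict[str, str]]:
    """Split a GSI change set into steps that each create or delete one index.

    `current` holds the index names the table has now, `desired` the names the
    schema asks for. Each returned step would be applied as its own update.
    """
    current_set, desired_set = set(current), set(desired)
    steps: List[Dict[str, str]] = []

    # Deletions first: one step per index that is no longer wanted.
    for name in current:
        if name not in desired_set:
            steps.append({"action": "delete", "index": name})

    # Then creations: one step per index that does not exist yet.
    for name in desired:
        if name not in current_set:
            steps.append({"action": "create", "index": name})

    return steps


# Hypothetical index names, for illustration only.
plan = plan_gsi_updates(["byOwner"], ["byOwner", "byStatus", "byCreatedAt"])
for step in plan:
    print(step["action"], step["index"])
```

Applying the steps one at a time, and waiting for each index to finish before starting the next, is what keeps every individual update within the one-GSI limit.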

The results are pretty much the same for the later deployments, and the stack always ends up in the UPDATE_ROLLBACK_COMPLETE state. [screenshot]

Expected behavior

When a deployment halts due to whatever reason, we should have a way out afterwards.

Reproduction steps

  1. Add a lot of backend changes. For me it was: create a new API Gateway with a Lambda, and around 10 new GSIs (@keys)
  2. Have Amplify Hosting git deployments enabled with a fairly short build timeout (for me 30 minutes, which was not enough in this case)
  3. Commit and push the changes to git to trigger the CI/CD build, and wait for it to time out
  4. Try re-deploying in Hosting
    
    2023-08-08T08:12:31.155Z [INFO]: 🛑 Cannot iteratively rollback as the following step does not contain a previousMetaKey: {"status":"DEPLOYING"}
  5. Remove deployment-state.json file from the S3 deployment bucket (I also have this one downloaded, if it is any help?)
  6. Try re-deploying once again in Hosting
    
    2023-08-08T08:25:19.630Z [INFO]: UPDATE_FAILED      UserTable AWS::DynamoDB::Table Tue Aug 08 2023 08:25:17 GMT+0000 (Coordinated Universal Time) Resource handler returned message: "Cannot perform more than one GSI creation or deletion in a single update" (RequestToken: 2a73b46d-80e0-5ce1-bbf9-07df265296e8, HandlerErrorCode: InvalidRequest)
    ...
    2023-08-08T08:25:39.907Z [INFO]: 
    2023-08-08T08:25:39.909Z [INFO]: UPDATE_FAILED      QuerylistNotificationsByGroupByCreatedAtResolver        AWS::AppSync::Resolver Tue Aug 08 2023 08:25:31 GMT+0000 (Coordinated Universal Time) Resource update cancelled
                                 UPDATE_FAILED      QuerylistNotificationsByBusinessStatusCreatedAtResolver AWS::AppSync::Resolver Tue Aug 08 2023 08:25:31 GMT+0000 (Coordinated Universal Time) Resource update cancelled
    ...
    2023-08-08T08:26:05.245Z [INFO]: UPDATE_FAILED      ReviewTable AWS::DynamoDB::Table Tue Aug 08 2023 08:25:16 GMT+0000 (Coordinated Universal Time) Resource handler returned message: "Cannot perform more than one GSI creation or deletion in a single update" (RequestToken: 8bccce9e-69a1-de56-682f-46b024f48b2c, HandlerErrorCode: InvalidRequest)
    ... 

    2023-08-08T08:27:04.100Z [INFO]: Rolled back (2 of 1)
    2023-08-08T08:27:04.160Z [WARNING]: ✖ There was an error initializing your environment.
    2023-08-08T08:27:04.161Z [INFO]: 🛑 ["Index: 0 State: {\"deploy\":\"waitingForDeployment\"} Message: Resource is not in the state stackUpdateComplete"]
    ...
    2023-08-08T08:27:04.203Z [INFO]: Session Identifier: e7ddd0ef-b014-486a-b61f-2f274cdc6af5

  7. Try pushing locally. Same result:

    Rolled back (2 of 1)
    🛑 ["Index: 0 State: {\"deploy\":\"waitingForDeployment\"} Message: Resource is not in the state stackUpdateComplete"]

  8. Try once again removing the deployment-state.json, and then run `amplify push --iterative-rollback`. Same result.

### Project Identifier

6b0d8978f235dda0a7c441c51bcf218f


### Additional information

I'm using Transformer V1.

### Before submitting, please confirm:

- [X] I have done my best to include a minimal, self-contained set of instructions for consistently reproducing the issue.
- [X] I have removed any sensitive information from my code snippets and submission.

parvusville commented 1 year ago

I also tried the instructions from this comment, with no luck: https://github.com/aws-amplify/amplify-category-api/issues/1425#issuecomment-1546148219

Pre-push status showed no changes to the GraphQL API, but it still tried to deploy something.

This time the console gives me a clearer message after amplify push fails, though the message is fundamentally the same as before. Just as a reminder, when I inspect these DynamoDB tables in the cloud, they all have the new indexes from the original push that timed out.

🛑 The following resources failed to deploy:
Resource Name: UserTable (AWS::DynamoDB::Table)
Event Type: update
Reason: Resource handler returned message: "Cannot perform more than one GSI creation or deletion in a single update" (RequestToken: 65488e9b-5ba4-7a6e-df31-db7bd77d5178, HandlerErrorCode: InvalidRequest)
URL: xxx

Resource Name: ReviewTable (AWS::DynamoDB::Table)
Event Type: update
Reason: Resource handler returned message: "Cannot perform more than one GSI creation or deletion in a single update" (RequestToken: ca464380-097d-284f-0ec8-22d09678f0f0, HandlerErrorCode: InvalidRequest)
URL: xxx

Resource Name: PricingExceptionTable (AWS::DynamoDB::Table)
Event Type: update
Reason: Resource handler returned message: "Cannot perform more than one GSI creation or deletion in a single update" (RequestToken: 1c3bbe8c-ece7-23db-c470-1069b8908926, HandlerErrorCode: InvalidRequest)
URL: xxx

🛑 Resource is not in the state stackUpdateComplete
Name: UserTable (AWS::DynamoDB::Table), Event Type: update, Reason: Resource handler returned message: "Cannot perform more than one GSI creation or deletion in a single update" (RequestToken: 65488e9b-5ba4-7a6e-df31-db7bd77d5178, HandlerErrorCode: InvalidRequest), IsCustomResource: false

Name: ReviewTable (AWS::DynamoDB::Table), Event Type: update, Reason: Resource handler returned message: "Cannot perform more than one GSI creation or deletion in a single update" (RequestToken: ca464380-097d-284f-0ec8-22d09678f0f0, HandlerErrorCode: InvalidRequest), IsCustomResource: false

Name: PricingExceptionTable (AWS::DynamoDB::Table), Event Type: update, Reason: Resource handler returned message: "Cannot perform more than one GSI creation or deletion in a single update" (RequestToken: 1c3bbe8c-ece7-23db-c470-1069b8908926, HandlerErrorCode: InvalidRequest), IsCustomResource: false
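
When many resources fail at once, it can help to group them by failure reason to see whether they share a single root cause. The sketch below is a hypothetical helper, not part of the Amplify CLI. It parses blocks in the "Resource Name / Event Type / Reason" layout shown above and strips the per-request token so that identical errors collapse into one group. The regular expressions assume exactly that layout.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches one failure block in the layout printed by the failed push above.
FAILURE_RE = re.compile(
    r"Resource Name: (?P<name>\S+) \((?P<type>[^)]+)\)\s*\n"
    r"Event Type: (?P<event>\S+)\s*\n"
    r"Reason: (?P<reason>.+)"
)
# The trailing "(RequestToken: ..., HandlerErrorCode: ...)" differs per request.
TOKEN_RE = re.compile(r"\s*\(RequestToken: [^)]*\)")


def group_failures(output: str) -> Dict[str, List[str]]:
    """Map each distinct failure reason to the resource names that reported it."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for match in FAILURE_RE.finditer(output):
        reason = TOKEN_RE.sub("", match.group("reason")).strip()
        grouped[reason].append(match.group("name"))
    return dict(grouped)


SAMPLE = """\
Resource Name: UserTable (AWS::DynamoDB::Table)
Event Type: update
Reason: Resource handler returned message: "Cannot perform more than one GSI creation or deletion in a single update" (RequestToken: 65488e9b-5ba4-7a6e-df31-db7bd77d5178, HandlerErrorCode: InvalidRequest)
URL: xxx

Resource Name: ReviewTable (AWS::DynamoDB::Table)
Event Type: update
Reason: Resource handler returned message: "Cannot perform more than one GSI creation or deletion in a single update" (RequestToken: ca464380-097d-284f-0ec8-22d09678f0f0, HandlerErrorCode: InvalidRequest)
URL: xxx
"""

for reason, names in group_failures(SAMPLE).items():
    print(f"{len(names)} resource(s): {', '.join(names)}")
    print(f"  {reason}")
```
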

ykethan commented 1 year ago

Hey, 👋 thanks for raising this! I'm going to transfer this over to our API repository for better assistance 🙂.

parvusville commented 1 year ago

One more update to the previous comment. I tried manually removing the new indexes from PricingException in the DynamoDB console (which for this table leaves no indexes) to see if it makes any difference, but the same error persists. This was done in a similar state to the previous comment: after amplify pull, with just one change added to the schema (a field: test: String).

Resource Name: PricingExceptionTable (AWS::DynamoDB::Table)
Event Type: update
Reason: Resource handler returned message: "Cannot perform more than one GSI creation or deletion in a single update" (RequestToken: 2db3d1b2-5baa-fb48-a91d-419ff9c52aef, HandlerErrorCode: InvalidRequest)