aws-amplify / amplify-category-api

The AWS Amplify CLI is a toolchain for simplifying serverless web and mobile development. This plugin provides functionality for the API category, allowing for the creation and management of GraphQL and REST based backends for your amplify project.
https://docs.amplify.aws/
Apache License 2.0

API can get stuck with error: Limit on the number of resources in a single stack operation exceeded #2432

Open dpilch opened 7 months ago

dpilch commented 7 months ago

How did you install the Amplify CLI?

npm

If applicable, what version of Node.js are you using?

v18.19.1

Amplify CLI Version

12.10.3

What operating system are you using?

Mac

Did you make any manual changes to the cloud resources managed by Amplify? Please describe the changes made.

No

Describe the bug

The Amplify GraphQL API can reach a state that may not be recoverable. A single CloudFormation deployment can touch at most 2,500 resources; see "Maximum number of CloudFormation resources a nested stack can create, update, or delete per operation" at https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cloudformation-limits.html. This limit is a hard cap and cannot be raised. It is separate from the 500-resource limit per stack.

Due to a design flaw in the GraphQL API construct, every resource in the API is touched with a no-op operation on each deployment, and these no-op operations count toward the 2,500 limit. As a result, an API can reach a state where no modification is possible, because adding or removing any resource crosses the 2,500 limit. We have seen this error in the ConnectionStack, but it is not confirmed whether this is the only stack that can produce it.

It is not possible for a customer to determine how close they are to the limit, because that information is only available in an internal CloudFormation log. A customer who needs the current count must open a technical support ticket with AWS.

At this time we have received one report of a customer reaching this state without being able to recover. It is not clear how the stack was originally deployed above this limit.

Expected behavior

It will not be possible to remove this limit, but there are several options to improve this experience:

  1. Fix the GraphQL API construct so it does not perform a no-op on every resource in every deployment.
  2. Provide a warning when the API is approaching the limit.
  3. Fail before deploying the CloudFormation templates when exceeding/close to the limit if this can be identified locally.
  4. Provide an automated recovery tool.
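Options 2 and 3 could be approximated locally. The sketch below is a hypothetical TypeScript helper, not part of the Amplify CLI. It assumes that, because every resource is touched on each deployment, the total number of resources across the compiled templates is a reasonable proxy for the operation count. The authoritative number is still only in CloudFormation's internal log, so treat the result as an estimate.

```typescript
// Hypothetical helper (not part of the Amplify CLI) that estimates how many
// resources a GraphQL API deployment will touch.
// ASSUMPTION: every resource is touched on each deployment, so the total number
// of resources across the compiled templates approximates the operation count.

// Minimal shape of a CloudFormation template; only Resources is inspected.
interface CfnTemplate {
  Resources?: Record<string, { Type: string }>;
}

// Hard cap on resources created, updated, or deleted in one nested-stack operation.
const OPERATION_LIMIT = 2500;

type LimitStatus = "ok" | "warning" | "exceeded";

// Sum the resources of every template, keyed by file name (e.g. "ConnectionStack.json").
function countResources(templates: Record<string, CfnTemplate>): number {
  let total = 0;
  for (const template of Object.values(templates)) {
    total += Object.keys(template.Resources ?? {}).length;
  }
  return total;
}

// Classify an estimate; warnRatio is an arbitrary threshold (default 90% of the cap).
function checkLimit(total: number, warnRatio = 0.9): LimitStatus {
  if (total > OPERATION_LIMIT) return "exceeded";
  if (total >= OPERATION_LIMIT * warnRatio) return "warning";
  return "ok";
}
```

To use it, parse the JSON files under amplify/backend/api/<api-name>/build/stacks/ after amplify api gql-compile, pass them to countResources, and feed the total to checkLimit. The 90% warning threshold is an arbitrary choice, not an AWS or Amplify value.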

Reproduction steps

We have not created a reliable reproduction at this time. The likely reproduction steps are:

  1. Create a schema large enough that deploying it gives the error.
  2. Modify the schema until it deploys successfully at the 2,500-resource limit.
  3. Attempt to add or remove any resource from the API.
  4. Run amplify push and see the error.

Project Identifier

No response

Log output

```
# Put your logs below this line
```

Additional information

If no modification can be made through amplify push, it is still possible to recover some APIs by manually updating the child stack for a given model.

  1. Identify a model that does not have connections.
  2. Remove all queries and mutations from this model with @model(queries: null, mutations: null)
    1. This will remove the resolvers for this model and possibly lower the number of operations below the limit.
  3. amplify api gql-compile
  4. Open the CloudFormation console and locate the stack that corresponds with the model that was modified.
  5. Select Update
    1. Select Update nested stack and Update stack
    2. Select Replace existing template
    3. Select Upload a template file
    4. Upload the template file from your local project amplify/backend/api/<api-name>/build/stacks/<model-name>.json
    5. Use the defaults for the next two pages.
    6. Before selecting Submit, ensure the Change set preview shows the resolvers being removed.
  6. After this deployment succeeds, attempt amplify push.
    1. If the manual CFN deployment is not successful, the stack may be in a broken state. If the model stack reaches the UPDATE_ROLLBACK_FAILED state, you will need to open a technical support ticket with CloudFormation. State that the root stack is healthy but a child stack is in the UPDATE_ROLLBACK_FAILED state.
  7. If amplify push succeeds, begin removing resources in waves until you can successfully push an update with stack mapping. https://docs.amplify.aws/javascript/build-a-backend/graphqlapi/modify-amplify-generated-resources/#place-appsync-resolvers-in-custom-named-stacks
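Before uploading the compiled template in step 5, it can help to compare it with the template that is currently deployed. The following hypothetical TypeScript sketch diffs the Resources sections of the two templates. Obtaining the deployed template (for example from the stack's Template tab in the CloudFormation console, or with aws cloudformation get-template) and parsing both files are assumed to happen elsewhere.

```typescript
// Hypothetical helper: compare the Resources of the deployed model stack template
// with the locally compiled one before uploading it in the CloudFormation console.

interface CfnTemplate {
  Resources?: Record<string, { Type: string }>;
}

interface ResourceDiff {
  removed: string[]; // logical IDs deployed now but missing from the compiled template
  added: string[]; // logical IDs the compiled template would create
}

// Collect logical IDs, optionally restricted to one resource type.
function logicalIds(template: CfnTemplate, type?: string): Set<string> {
  const ids = new Set<string>();
  for (const [id, resource] of Object.entries(template.Resources ?? {})) {
    if (type === undefined || resource.Type === type) ids.add(id);
  }
  return ids;
}

function diffResources(deployed: CfnTemplate, compiled: CfnTemplate, type?: string): ResourceDiff {
  const before = logicalIds(deployed, type);
  const after = logicalIds(compiled, type);
  return {
    removed: Array.from(before).filter((id) => !after.has(id)).sort(),
    added: Array.from(after).filter((id) => !before.has(id)).sort(),
  };
}
```

Calling diffResources(deployed, compiled, "AWS::AppSync::Resolver").removed should list the model's query and mutation resolvers. Unexpected entries in added, or removals you did not intend, are a reason to stop before selecting Submit.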


MarlonJD commented 5 months ago

Do you have solution for gen 2 for same error?

dpilch commented 5 months ago

Unfortunately, it is not possible to remove queries and mutations in Gen 2 at this time.

MarlonJD commented 5 months ago

Hey @dpilch, I've got the same issue, and stack mapping couldn't solve the limit issue for now. I'm trying to find a solution. If Gen 1 supports queries: null, it could be a temporary solution; I'll try to pass this null parameter in Gen 2.

I found a really weird solution for the limit issue: start with small data models and push without error, then make them bigger and push, then again. You'll deploy in 4-5 parts if you have 50-60 models. It's weird, but it works! But you'll need to remove stacks on the Amplify side to push. If this is an emergency for you, I'll try to explain in detail.
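The incremental approach above can be planned ahead of time. The sketch below is hypothetical TypeScript, not part of Amplify: it only decides which models are included in each push. Editing the schema between pushes and the stack cleanup mentioned in the comment are still manual. With 55 models and parts = 5, it yields five pushes of 11, 22, 33, 44, and 55 models.

```typescript
// Hypothetical planner for the incremental approach: split the model list into
// `parts` groups and return the cumulative set of models for each push, so push 1
// deploys the first group, push 2 the first two groups, and so on.
function cumulativePushPlan<T>(models: readonly T[], parts: number): T[][] {
  if (!Number.isInteger(parts) || parts < 1) {
    throw new RangeError("parts must be a positive integer");
  }
  const size = Math.ceil(models.length / parts);
  const plan: T[][] = [];
  // `end - size` is the index where the newly added group starts.
  for (let end = size; end > 0 && end - size < models.length; end += size) {
    plan.push(models.slice(0, Math.min(end, models.length)));
  }
  return plan;
}
```

The plan can contain fewer than parts pushes when there are fewer models than parts; an empty model list yields an empty plan.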

AnilMaktala commented 3 months ago

Workaround: Since function directives are non-destructive, a customer can delete all the function directives from the stack and re-add them with stack mapping.

Steps:

  1. Remove the custom mutations and queries that contain a function directive by commenting out the lines in the schema, as shown below, to make them easy to re-add later:

    ```
    # type Mutation {
    #   createEntry(arg: String): String @function(name: "functionName1-${env}")
    #   ...
    # }

    # type Query {
    #   calculateTax(...): String @function(name: "calculateTax-${env}")
    #   ...
    # }
    ```
  2. Run amplify push -y and wait for the deployment to complete.
  3. Re-add the queries and mutations with the function directive. (Important: DO NOT initiate the deployment at this time).
  4. Move the resolvers to a custom stack.
    • For example, the logical ID of the createEntry mutation will be MutationcreateEntryResolver. Add this logical ID to the StackMapping in the transform.conf.json file. (Important: the logical ID is case-sensitive.)
    • Map as many resolvers as possible to a new function directive stack. Use any meaningful stack name; it doesn't have to be FunctionDirectiveStack1.

    ```
    {
      "Version": 5,
      "ElasticsearchWarning": true,
      "StackMapping": {
        "MutationcreateEntryResolver": "FunctionDirectiveStack1",
        "...": "FunctionDirectiveStack1",
        "...": "FunctionDirectiveStack2"
      }
    }
    ```
  5. Once all the resolvers are mapped to a custom stack, run amplify api gql-compile.
  6. Verify that the custom stacks were created and that the resolvers exist in them. The stacks are placed in the <project_root>/amplify/backend/api/<api_name>/build/stacks directory.
  7. Run amplify push -y and wait for the deployment to complete. At this point, the deployment should succeed and the resolvers should be mapped to the custom stacks.
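Mapping many resolvers by hand in step 4 is tedious. The following is a hypothetical TypeScript sketch, not an Amplify API, that derives logical IDs from the naming convention in step 4 (MutationcreateEntryResolver) and spreads them across numbered stacks. The maxPerStack default of 100 is arbitrary; lower it if the generated stacks approach the 500-resource limit per stack.

```typescript
// Hypothetical helper for generating transform.conf.json StackMapping entries for
// resolvers that use @function.
// ASSUMPTIONS: logical IDs follow <TypeName><fieldName>Resolver (case-sensitive), and
// maxPerStack is an arbitrary cap on resolvers per custom stack.

interface FieldRef {
  typeName: "Query" | "Mutation";
  fieldName: string;
}

interface TransformConf {
  Version: number;
  ElasticsearchWarning?: boolean;
  StackMapping?: Record<string, string>;
}

function resolverLogicalId(field: FieldRef): string {
  return `${field.typeName}${field.fieldName}Resolver`;
}

// Assign resolvers to FunctionDirectiveStack1, FunctionDirectiveStack2, ... in order.
function buildStackMapping(
  fields: readonly FieldRef[],
  maxPerStack = 100,
  prefix = "FunctionDirectiveStack",
): Record<string, string> {
  if (!Number.isInteger(maxPerStack) || maxPerStack < 1) {
    throw new RangeError("maxPerStack must be a positive integer");
  }
  const mapping: Record<string, string> = {};
  fields.forEach((field, index) => {
    const stackNumber = Math.floor(index / maxPerStack) + 1;
    mapping[resolverLogicalId(field)] = `${prefix}${stackNumber}`;
  });
  return mapping;
}

// Merge new entries into an existing config, keeping its other StackMapping entries.
function withStackMapping(conf: TransformConf, fields: readonly FieldRef[], maxPerStack = 100): TransformConf {
  return {
    ...conf,
    StackMapping: { ...(conf.StackMapping ?? {}), ...buildStackMapping(fields, maxPerStack) },
  };
}
```

JSON.stringify(withStackMapping(conf, fields), null, 2) produces text for amplify/backend/api/<api_name>/transform.conf.json; steps 5 to 7 (gql-compile, verify, push) still apply afterwards.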