aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

API Gateway: Too Many Requests on API creation #15573

Open tuanardouin opened 3 years ago

tuanardouin commented 3 years ago

Hello,

When creating an API that contains a lot of endpoints, we reach the API Gateway limit on resource creation and get the error

Too Many Requests (Service: ApiGateway, Status Code: 429, ...

The limits: https://docs.amazonaws.cn/en_us/batch/latest/userguide/service_limits.html

Reproduction Steps

Create a REST API with a lot of resources.

What did you expect to happen?

I expected CDK to account for this and 'sleep' between calls if necessary.

Right now I'm just commenting out some of the nested stacks that contain my resources and uncommenting them in batches.

Linked to this I think : https://github.com/aws-cloudformation/cloudformation-coverage-roadmap/issues/589

What actually happened?

Got the 429 error

Environment

This is :bug: Bug Report

nija-at commented 3 years ago

As far as I'm aware, the CDK does not invoke the API Gateway endpoint as part of its standard operation.

Please update the issue with details following these guidelines - http://sscce.org/

danmactough commented 3 years ago

As far as I'm aware, the CDK does not invoke the API Gateway endpoint as part of its standard operation.

Please update the issue with details following these guidelines - http://sscce.org/

@nija-at I'm pretty sure what @tuanardouin is reporting is not that CDK invokes the API Gateway endpoint directly but that when the CloudFormation template is deployed, the service-to-service communication between CloudFormation and API Gateway gets rate-limited and the CloudFormation deploy fails (and reflects that rate-limit error). It's unclear to the user that this is the source of the error they see. If I'm correct about the source of this error (and it's not possible for the user to get any more information), this is actually an upstream bug in CloudFormation (since CDK can't control the behavior of CloudFormation), and I imagine it would be SUPER-HELPFUL if the CDK team could bubble this bug up to the CloudFormation team. It's not the first time they've heard about this long-standing bug https://forums.aws.amazon.com/thread.jspa?threadID=100414, and they don't appear to have taken any steps to solve it.

nija-at commented 3 years ago

This will depend on the number of stacks being deployed in parallel for that account/region, number of API Gateway resources in each stack, custom resources that may be making calls directly to API gateway, etc.

If you have a specific CDK stack or CloudFormation resource that replicates this error consistently, I'll be happy to forward it to the relevant teams.

Otherwise, I would recommend contacting the AWS APIGateway team via AWS support for this issue.

danmactough commented 3 years ago

If you have a specific CDK stack or CloudFormation resource that replicates this error consistently, I'll be happy to forward it to the relevant teams.

@nija-at Oh, we definitely have example stacks where this happens consistently. We are a pretty small team, so there's usually max 1 stack being deployed at any time, and we don't have any custom resources on the stacks where this happens -- as you suggest, it is all about the number of API Gateway resources. But like I said, when we use CloudFormation to work with these resources, we have no ability to adapt to API Gateway rate limits for resources that CloudFormation is managing.

I'll be happy to forward it to the relevant teams.

This would be really helpful. I would be happy to work with the team (I think it would be CloudFormation) to help isolate this issue. Please let me know what you need from me. You can reach me by email at my GH username at gmail.

nija-at commented 3 years ago

Please let me know what you need from me

As mentioned, provide the simplest full CDK app that consistently replicates this issue.

peterwoodworth commented 3 years ago

@danmactough have you provided the requested information? Ping me if you'd like to reopen this issue.

besh0y commented 2 years ago

Hello @peterwoodworth @nija-at I've been struggling with this issue for quite some time and I'd like to re-open it.

I've created this repository, which consistently replicates the issue. Please feel free to check it out.

The code creates a REST API with a large number of endpoints and methods, all created in multiple nested stacks. Deployment of those stacks always fails with a "Too Many Requests" error, and the rollback fails for the same reason.

I could temporarily avoid this problem by decreasing the number of resources in each nested stack and making the stacks depend on each other during deployment so they don't get deployed in parallel, but that is much slower and less efficient.

jweyrich commented 2 years ago

Like someone already said, the problem is the rate limit of the API Gateway's own APIs. The CreateResource is limited to 5 per second per account.

We're facing the same problem with the Serverless Framework, and nobody has solved it properly. AWS Premium Support suggests introducing DependsOn, but that is certainly not a definitive solution. The 2nd link below shows that AWS published a private resource type, Community::CloudFormation::Delay, which doesn't feel like a definitive solution on its own either. We thought of using WaitCondition, but it amounts to about the same thing. I believe AWS should be able to handle throttling between its own service calls transparently. The "user" is not making these service-to-service calls. IMHO, since the "user" is providing a valid template that could be fully deployed if we ignore the rate limit for APIs that the "user" is not calling directly, it should work flawlessly. However, it may be a hard optimization problem to solve.
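The DependsOn suggestion above could be sketched roughly as follows at the template level. This is only an illustration, not what AWS Support or any tool actually ships: the function name `serialize_resources` and the `lanes` parameter are hypothetical, and the sketch assumes the order of the logical IDs does not conflict with references that already exist between the resources (a real tool would need a topological sort to avoid circular dependencies).

```python
import copy


def serialize_resources(template, resource_type="AWS::ApiGateway::Resource", lanes=1):
    """Return a copy of `template` where resources of `resource_type` are
    chained with DependsOn, forming `lanes` independent sequential chains."""
    if lanes < 1:
        raise ValueError("lanes must be at least 1")
    result = copy.deepcopy(template)
    resources = result["Resources"]
    ids = [
        logical_id
        for logical_id, resource in resources.items()
        if resource.get("Type") == resource_type
    ]
    for index, logical_id in enumerate(ids):
        if index < lanes:
            continue  # the first `lanes` resources start each chain
        previous = ids[index - lanes]
        depends_on = resources[logical_id].get("DependsOn", [])
        if isinstance(depends_on, str):
            # CloudFormation accepts a single string or a list here.
            depends_on = [depends_on]
        if previous not in depends_on:
            depends_on = depends_on + [previous]
        resources[logical_id]["DependsOn"] = depends_on
    return result
```

With `lanes=1` every matching resource waits for the one before it, which is the slowest but safest setting; a larger value trades some safety for deployment speed.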

This issue is related to:

  1. https://github.com/aws-cloudformation/cloudformation-coverage-roadmap/issues/1095
  2. https://github.com/aws-cloudformation/cloudformation-coverage-roadmap/issues/589

tuanardouin commented 2 years ago

I encounter this problem only on the first deployment of a stack. After that, it's not an issue anymore unless I end up deploying a huge change. So dependsOn ended up solving the issue for me.

It's not a perfect solution nor does it excuse the origin of the problem, but at least it's not slowing down our deployments.

peterwoodworth commented 2 years ago

This type of issue will likely have to be fixed by either CloudFormation or ApiGateway. I'm not a fan of treating workarounds such as the potential new Delay construct as a permanent solution; handling this correctly will probably fall to ApiGateway.

I would recommend opening an issue in the CloudFormation coverage roadmap repo so that they are aware of this specific issue, or opening an issue with Premium Support if you have it.

jweyrich commented 2 years ago

@peterwoodworth they're aware. My previous comment contains a link to the CloudFormation roadmap issue (see here). And the Premium Support has an article How do I prevent "Rate exceeded" errors in CloudFormation? in their Knowledge Center.

oanhhuynhpositive commented 1 year ago

Hi @peterwoodworth, do you have any specific idea on how to work around this problem temporarily?

jweyrich commented 1 year ago

@oanhhuynhpositive A coworker wrote this plugin for Serverless v2/v3 that uses a simple graph algorithm to solve the dependency tree. It does not generate the most performant tree, but it works fine. Here it is if you want to give it a try: https://github.com/AlexsandroBezerra/serverless-custom-depends-on

oanhhuynhpositive commented 1 year ago

@jweyrich Thanks, but I'm actually looking for a solution that works when I use CDK to deploy stack resources.

jweyrich commented 1 year ago

@oanhhuynhpositive oh, my bad. I mixed both repos (cdk and serverless) as we've been dealing with the same issue.

chessbyte commented 1 year ago

@nija-at A CDK stack that reliably reproduces this issue was provided here. Is there any update on when this will be fixed in CloudFormation? If the CloudFormation deployment code were open-source, I would put in a PR myself to retry (with exponential backoff) on a 429 error. Conceptually, it seems quite straightforward. I am not sure why AWS is not really responding to this issue, as many people are facing it daily.
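The retry idea mentioned above (exponential backoff on a 429) can be illustrated with a minimal sketch. CloudFormation's deployment code is not public, so nothing here reflects how it actually works, and users cannot plug this into a deployment; `ThrottledError` and `call_with_backoff` are hypothetical names used only to show the concept.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a 429 Too Many Requests response."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.5,
                      max_delay=30.0, sleep=time.sleep):
    """Call `operation`, retrying on ThrottledError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttling error
            # Full jitter: wait a random time up to the capped exponential delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
```

The random jitter matters when many callers are throttled at once: without it they would all retry at the same moments and hit the limit again together.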

nija-at commented 1 year ago

@chessbyte unfortunately I no longer work for AWS so I'm unable to answer any of your questions.

chessbyte commented 1 year ago

@nija-at wishing you well!

peterwoodworth commented 1 year ago

We're not the CloudFormation team, so we cannot answer these questions. There's no action CDK can take here with our construct library. While this bug persists, it will be up to customers to configure dependencies between the resources they create to ensure they deploy sequentially rather than in parallel. See this comment for an example.

I've created an internal ticket (P88246032) to make sure the right team sees this. I'll provide updates when they become available.
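One way to configure such dependencies in a CDK app written in Python might look like the sketch below. It is not the example from the linked comment. The helper `chain_in_batches` is hypothetical, and `FakeNode` and `FakeConstruct` are stand-ins so the snippet runs without the CDK installed. In a real app the list would hold actual constructs, whose `node.add_dependency(...)` method is the Python counterpart of `node.addDependency(...)` in TypeScript.

```python
class FakeNode:
    """Stand-in for a construct's `node`; records dependencies only."""

    def __init__(self):
        self.dependencies = []

    def add_dependency(self, *deps):
        self.dependencies.extend(deps)


class FakeConstruct:
    """Stand-in for a CDK construct such as an API Gateway resource."""

    def __init__(self, name):
        self.name = name
        self.node = FakeNode()


def chain_in_batches(constructs, batch_size):
    """Make each construct depend on every construct of the previous batch,
    so that batches are deployed one after another."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = [
        constructs[i:i + batch_size]
        for i in range(0, len(constructs), batch_size)
    ]
    for previous, current in zip(batches, batches[1:]):
        for construct in current:
            construct.node.add_dependency(*previous)
    return batches
```

A small batch size keeps the number of parallel creations low at the cost of a longer deployment; the right value is an assumption to tune against the throttling actually observed.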

bouwerp commented 3 months ago

It has been more than a year - has there been any movement on this?

tuanardouin commented 3 months ago

@bouwerp No change and still a problem on CDK 2. Our legacy code still has this issue, but we don't deploy new CloudFormation often, so we just swept that under the rug.

We started using Terraform partly because of that and didn't encounter this problem.

bouwerp commented 3 months ago

Thanks for the info @tuanardouin. I have been looking for a reason to move to Terraform.

erjenkins29 commented 3 months ago

Same issue here -- but suddenly started working after about half an hour and moving to a separate api gateway instance....

danielMiron commented 1 week ago

Same issue, using the latest CDK 2 version. This has become a major issue for us. We have stacks with many API routes and integrations, and we deploy them as part of our CI. Deploys used to work most of the time and failed every once in a while, but since yesterday it seems impossible to deploy, on different APIs and even on different AWS accounts. CDK creates one template with all routes, and there is nothing we can do to prevent them from all being deployed at once. Did anyone find a workaround? Right now this completely breaks our workflow and might be the reason we give up on CDK altogether.

tuanardouin commented 1 week ago

@danielMiron are you encountering this on new deployments or for already available APIs?

danielMiron commented 1 week ago

@danielMiron are you encountering this on new deployments or for already available APIs?

new deployments

tuanardouin commented 1 week ago

@danielMiron A quick hack is to comment half your endpoints and then deploy your stack. Once it's done, you can uncomment the rest and deploy again.

danielMiron commented 3 days ago

@tuanardouin This is what we ended up doing to deploy locally, but it doesn't help us much with CI
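The comment-out-and-redeploy workaround could perhaps be made CI-friendly by selecting the endpoints per deployment phase from a variable instead of editing the code by hand. The following is only a sketch under assumptions: the function names and the `DEPLOY_PHASE` / `DEPLOY_PHASES` environment variables are hypothetical, and the app would still have to define only the returned endpoints.

```python
import os


def endpoints_for_phase(endpoints, phase, total_phases):
    """Return the cumulative subset of endpoints to define in `phase` (1-based).
    Each phase keeps everything from earlier phases and adds the next slice."""
    if not 1 <= phase <= total_phases:
        raise ValueError("phase must be between 1 and total_phases")
    # Ceiling division so the final phase always covers every endpoint.
    per_phase = -(-len(endpoints) // total_phases)
    return endpoints[: per_phase * phase]


def phase_from_env(environ=os.environ):
    """Read the current phase and phase count from hypothetical variables."""
    return (
        int(environ.get("DEPLOY_PHASE", "1")),
        int(environ.get("DEPLOY_PHASES", "1")),
    )
```

A CI job would then run `cdk deploy` once per phase with the variable incremented. Because each phase is a superset of the previous one, no endpoint is removed between runs, and with the defaults (one phase) the full API is deployed as usual.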