Closed sfxandy closed 2 months ago
Hi @sfxandy,
This is indeed an interesting problem. By default the SDK does retry various rate limit exceptions:
var DefaultThrottleErrorCodes = map[string]struct{}{
	"Throttling":                             {},
	"ThrottlingException":                    {},
	"ThrottledException":                     {},
	"RequestThrottledException":              {},
	"TooManyRequestsException":               {},
	"ProvisionedThroughputExceededException": {},
	"TransactionInProgressException":         {},
	"RequestLimitExceeded":                   {},
	"BandwidthLimitExceeded":                 {},
	"LimitExceededException":                 {},
	"RequestThrottled":                       {},
	"SlowDown":                               {},
	"PriorRequestNotComplete":                {},
	"EC2ThrottledException":                  {},
}
It seems like EKS is throwing some sort of throttling error based on the description, but neither the exception name (InvalidRequestException) nor the status code (400) indicates this. For a throttling error to be retried by the SDK, it has to have an HTTP status code of 429/5xx or a concrete exception name that is one of those defined above.
Ideally, the EKS service would have returned a more accurate error, but changing the data a service returns is considered backwards incompatible and might break existing customers. In this case, your only course of action is to implement your own retryer and explicitly retry on "InvalidRequestException". You can refer to our docs for more information about aws.Retryer.
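To illustrate the rule described above, here is a minimal, standard-library-only sketch. It is not the SDK's actual implementation: `apiError`, `retryableByDefault` and `withExtraCodes` are hypothetical names, and the throttle list is abbreviated. It only models why a 400 InvalidRequestException is not retried by default, and how a wrapper that adds that code changes the decision. In real code, the same idea would be expressed through a custom aws.Retryer; as far as I know, the `aws/retry` package offers `retry.AddWithErrorCodes` for this, but please verify against the docs.

```go
package main

import "fmt"

// defaultThrottleCodes mirrors a few entries from the list quoted above
// (abbreviated for the sketch).
var defaultThrottleCodes = map[string]struct{}{
	"Throttling":               {},
	"ThrottlingException":      {},
	"TooManyRequestsException": {},
	"RequestLimitExceeded":     {},
}

// apiError is a hypothetical stand-in for a service error response.
type apiError struct {
	Code       string // exception name, e.g. "InvalidRequestException"
	StatusCode int    // HTTP status code
}

// retryableByDefault models the rule from the comment above: retry on
// HTTP 429/5xx, or when the error code is a known throttling code.
func retryableByDefault(e apiError) bool {
	if e.StatusCode == 429 || (e.StatusCode >= 500 && e.StatusCode <= 599) {
		return true
	}
	_, ok := defaultThrottleCodes[e.Code]
	return ok
}

// withExtraCodes wraps a classifier so that additional error codes are
// also treated as retryable (assumption: matching on the code alone).
func withExtraCodes(base func(apiError) bool, codes ...string) func(apiError) bool {
	extra := make(map[string]struct{}, len(codes))
	for _, c := range codes {
		extra[c] = struct{}{}
	}
	return func(e apiError) bool {
		if _, ok := extra[e.Code]; ok {
			return true
		}
		return base(e)
	}
}

func main() {
	eksErr := apiError{Code: "InvalidRequestException", StatusCode: 400}

	fmt.Println(retryableByDefault(eksErr)) // false: 400 and not a throttle code

	custom := withExtraCodes(retryableByDefault, "InvalidRequestException")
	fmt.Println(custom(eksErr)) // true: explicitly opted in
}
```

Note that matching on the code alone also retries genuinely invalid requests, so a retryer like this would need a bounded number of attempts.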
Please let me know if you need anything else.
All the best, Ran~
hi Ran,
Thanks for your response.
We're seeing this error manifest itself as part of a Terraform apply operation. It's the official Hashicorp AWS provider that is raising the exception. The AWS Go SDK isn't being called directly from a codebase in which we can make changes, such as adding a customised retry handler. I will link this response to the bug I raised on the Hashicorp AWS provider GitHub for context and see what response that receives.
Thanks again
I doubt we can safely customize to handle this in ECS, since we basically can't distinguish between actual invalid requests and throttling (we're not going to key off the message, I don't consider that value to be stable).
We should pursue getting the service team to correct this.
I would think that Terraform should be able to handle this in the interim. I take this back in retrospect. It's as unreasonable for them to handle as it is for us, or anyone.
@lucix-aws - thanks for your response. Is there anything else you need from me regarding this? Can you confirm if a bug has been raised with the appropriate service team?
Hi @sfxandy,
I raised this internally with the EKS service team. I'll let you know when we hear something new.
Thanks, Ran~
P106921832
Any update on this issue?
Hello,
Is there any update on this issue? The last comment above was that it was being raised with the EKS service team.
Any update on this issue?
This should now be fixed. Can someone please give it a shot?
Thanks, Ran~
This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.
Describe the bug
As part of a CD pipeline which uses Terraform to update EKS managed node groups with new customised base AMIs, the error below is often encountered:
The issue is also encountered when manually running the Terraform apply process to perform the EKS node updates from a local development environment.
Expected Behavior
For the above exception, it is expected that a RequestLimitExceeded exception is raised, which is considered a "retryable" error type, and thus the AWS Go SDK is able to retry the API call subject to the retry mode and maximum retry attempt parameters for the Terraform AWS provider.
Current Behavior
The error encountered is:
This doesn't seem to be a valid error type to be retried according to aws/retry/retryable_error.go
Reproduction Steps
This error condition is frequently encountered when running a CD pipeline that invokes a Terraform apply operation to concurrently update a number of EKS managed node groups.
This issue isn't directly reproducible via code implementation as the Terraform binary handles the relevant API calls.
Possible Solution
No response
Additional Information/Context
No response
AWS Go SDK V2 Module Versions Used
Compiler and Version used
Not applicable
Operating System and version
Linux x86_64, macOS Ventura 13.5.1