aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0
11.47k stars 3.83k forks

aws_applicationautoscaling: Throw error when autoscaling policy breaches service limits during deployment #29082

Open ddao07 opened 6 months ago

ddao07 commented 6 months ago

Describe the feature

I'd like to see an error when an AWS account tries to deploy an autoscaling policy that violates service limits. For example, if I try to create an autoscaling policy with a max instance count of 40 when the service limit is 20, that should throw an error.

As a short-term stopgap: when a SageMaker endpoint fails to autoscale to X instances because the account-level service limit is Y instances and X > Y, the endpoint should at least scale to Y instances.

Use Case

The AWS account had a service limit of 20 instances but the sagemaker endpoint tried to scale to 40 instances and I got this error:

“Failed to set desired instance count to 40. Reason: The account-level service limit 'ml.p3.2xlarge for endpoint usage' is 20 Instances, with current utilization of 6 Instances and a request delta of 34 Instances”.

This could have been prevented by an error saying that I cannot create an autoscaling policy that breaches service limits.

The impact of this issue could be mitigated if the endpoint at least scaled up to 20 instances. Instead, the endpoint stayed at 6 and kept trying to scale to 40.
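To make the arithmetic in that error concrete, here is a minimal sketch of the fallback described above. `clampToServiceLimit` and its types are hypothetical names made up for illustration; they are not part of CDK, SageMaker, or Application Auto Scaling, and this is not how the service currently behaves.

```typescript
// Hypothetical helper for illustration only; not an AWS or CDK API.
interface ScalingRequest {
  current: number;      // instances currently in use (6 in the error above)
  desired: number;      // instance count the policy asked for (40)
  serviceLimit: number; // account-level limit for the instance type (20)
}

interface ScalingDecision {
  target: number;    // instance count that fits under the limit
  delta: number;     // instances to add (or remove, if negative) to reach target
  shortfall: number; // requested instances that cannot be granted
}

function clampToServiceLimit(req: ScalingRequest): ScalingDecision {
  // Never ask for more than the account is allowed to run.
  const target = Math.min(req.desired, req.serviceLimit);
  return {
    target,
    delta: target - req.current,
    shortfall: req.desired - target,
  };
}

// Numbers from the error message: the original request delta was 40 - 6 = 34.
const decision = clampToServiceLimit({ current: 6, desired: 40, serviceLimit: 20 });
console.log(decision); // { target: 20, delta: 14, shortfall: 20 }
```

With this behavior the endpoint would have gone from 6 to 20 instances and reported the remaining 20 as unfulfilled, instead of staying at 6.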

Proposed Solution

No response

Other Information

No response

Acknowledgements

CDK version used

2.87

Environment details (OS name and version, etc.)

Amazon Linux 2 x86_64

pahud commented 6 months ago

> The AWS account had a service limit of 20 instances but the sagemaker endpoint tried to scale to 40 instances and I got this error:

If this is a global hard limit for all accounts, I agree we should throw the error at CDK synth time. But if this is a per-account limit, CDK would not be able to know it. What do you think?

ddao07 commented 6 months ago

This was an account-specific limit. Perhaps scaling policies should take a required input parameter for the account's service limit to prevent these kinds of errors? It would be a nice reminder, you know?
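A rough sketch of what that check could look like, assuming the user passes the limit in themselves. The `accountServiceLimit` property and the `validateScalingProps` function are hypothetical; CDK has no such property today, and the interface below is a standalone illustration rather than the real scaling props.

```typescript
// Hypothetical props for illustration; `accountServiceLimit` is NOT a real CDK property.
interface LimitAwareScalingProps {
  minCapacity: number;
  maxCapacity: number;
  accountServiceLimit: number; // supplied by the user, since CDK cannot know it at synth time
}

// Throws at synth time if the policy could never be satisfied by the account.
function validateScalingProps(props: LimitAwareScalingProps): void {
  const { minCapacity, maxCapacity, accountServiceLimit } = props;
  if (minCapacity > maxCapacity) {
    throw new Error(`minCapacity (${minCapacity}) exceeds maxCapacity (${maxCapacity})`);
  }
  if (maxCapacity > accountServiceLimit) {
    throw new Error(
      `maxCapacity (${maxCapacity}) exceeds the account service limit (${accountServiceLimit})`
    );
  }
}

// The case from this issue: max of 40 against a limit of 20.
try {
  validateScalingProps({ minCapacity: 1, maxCapacity: 40, accountServiceLimit: 20 });
} catch (err) {
  console.error((err as Error).message);
}
```

The obvious weakness is that the value is only as accurate as what the user types in, so it would go stale if the account limit is later raised or lowered.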

Separately, when the autoscaling policy hit the limit while trying to scale from 6 to 40 instances, it could have scaled from 6 to 20 instances as a way to handle the error gracefully.
