aws-solutions / instance-scheduler-on-aws

A cross-account and cross-region solution that allows customers to automatically start and stop EC2 and RDS Instances
https://aws.amazon.com/solutions/implementations/instance-scheduler-on-aws/
Apache License 2.0
556 stars 279 forks

Instance Scheduler should allow for instance type flexibility #501

Open playphil opened 11 months ago

playphil commented 11 months ago

Is your feature request related to a problem? Please describe.

Sometimes there is not enough capacity in the AZ for a single specified instance type. This results in an Insufficient Capacity Error (ICE). It occurs more frequently for less common instance types and may also occur during periods of high demand. If another AZ or region is impacted, a thundering herd of launch requests can follow, so demand may exceed available capacity. To avoid ICEs, it is advisable for automation to be flexible about which AZ is used and about which instance types are deemed acceptable for launch.

Launch Templates can specify multiple possible instance types https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/create-launch-template.html#lt-instance-type

EC2 Fleets also help with flexibility through features such as "attribute-based instance type selection". https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-fleet-attribute-based-instance-type-selection.html

Describe the feature you'd like

The Instance Scheduler solution should allow optional use of EC2 Fleets and Launch Templates, as these are standard mechanisms that provide instance type flexibility and AZ flexibility. This will help everyone get back to work by launching their next preferred instance types on days when their first choice is not available.
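As a rough illustration of the Fleet and Launch Template approach described above, the following sketch builds the keyword arguments for boto3's `ec2.create_fleet` so that several instance types and subnets are tried in priority order. The helper name `build_fleet_request`, the template name, and the subnet IDs are hypothetical, and this is not how Instance Scheduler works today. Note also that a Fleet launches new instances, whereas the scheduler starts existing stopped ones, so this only covers workloads that can be relaunched from a template.

```python
def build_fleet_request(template_name, instance_types, subnet_ids, capacity=1):
    """Build kwargs for ec2.create_fleet (hypothetical helper, not part of the solution).

    instance_types is ordered from most to least preferred. Each type is paired
    with every subnet so the fleet can fall back across both types and AZs.
    """
    overrides = [
        {
            "InstanceType": instance_type,
            "SubnetId": subnet_id,
            # Lower number means higher priority under the "prioritized" strategy.
            "Priority": float(rank),
        }
        for rank, instance_type in enumerate(instance_types)
        for subnet_id in subnet_ids
    ]
    return {
        "Type": "instant",  # one-shot, synchronous launch request
        "LaunchTemplateConfigs": [
            {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateName": template_name,
                    "Version": "$Latest",
                },
                "Overrides": overrides,
            }
        ],
        "TargetCapacitySpecification": {
            "TotalTargetCapacity": capacity,
            "DefaultTargetCapacityType": "on-demand",
        },
        "OnDemandOptions": {"AllocationStrategy": "prioritized"},
    }


# Example values are placeholders (assumptions), not real resources.
request = build_fleet_request(
    "my-workstation-template",
    ["m7i.xlarge", "m6i.xlarge", "m5a.xlarge"],
    ["subnet-aaaa", "subnet-bbbb"],
)
# With a real client: boto3.client("ec2").create_fleet(**request)
```

Building the request as plain data keeps the preference order easy to inspect and test before any call is made to AWS.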

Additional context

CrypticCabub commented 11 months ago

Hi @playphil

Thanks for submitting this FR! The correct handling of ICE errors is an ongoing discussion, and you present some interesting ideas that I will bring to the rest of the team for consideration.

playphil commented 11 months ago

Great. Another way might be to allow configuring a small list of additional instance types, sizes, or qualities to try when an ICE is received while attempting to re-launch an existing instance that is not already part of a fleet or launch template.

ashraf133 commented 7 months ago

Hello, I have the same issue. Maybe you could add a new tag that contains a catalog of EC2 instance types: if the first type encounters an error, the second one is tried.

playphil commented 7 months ago

> Hello, I have the same issue. Maybe you could add a new tag that contains a catalog of EC2 instance types: if the first type encounters an error, the second one is tried.

Awesome idea. It gives a simple approach that everyone could adopt incrementally. The optional additional tag values on some instances could then be attempted for launch any time an ICE is found in CloudTrail. Example: preferred-instance-types = m7i.xlarge,m6i.xlarge,m5a.xlarge,c7i.xlarge,r7i.xlarge
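To make the tag format above concrete, here is a minimal sketch of how such a tag value could be read into an ordered list. The function name `parse_preferred_types` is hypothetical, and the tag key `preferred-instance-types` is only the one proposed in this thread, not an existing Instance Scheduler feature. The input is assumed to be the `Tags` list that `describe_instances` returns for an instance.

```python
def parse_preferred_types(tags, key="preferred-instance-types"):
    """Return the ordered, de-duplicated instance types from a tag list.

    tags is a list of {"Key": ..., "Value": ...} dicts. An empty list is
    returned when the tag is missing, so callers can keep today's behavior.
    """
    for tag in tags:
        if tag.get("Key") != key:
            continue
        ordered = []
        for part in tag.get("Value", "").split(","):
            name = part.strip()
            # Skip blanks from stray commas and keep the first occurrence only.
            if name and name not in ordered:
                ordered.append(name)
        return ordered
    return []


example_tags = [
    {"Key": "Schedule", "Value": "office-hours"},
    {"Key": "preferred-instance-types", "Value": "m7i.xlarge, m6i.xlarge,,m7i.xlarge,m5a.xlarge"},
]
preferred = parse_preferred_types(example_tags)
```

Returning an empty list for untagged instances is what would let the tag be adopted incrementally.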

ashraf133 commented 3 months ago

Hello, any update?

shujacks commented 2 months ago

Hi @playphil and @ashraf133, thanks for reaching out. To help us prioritize items in our backlog, can you please let us know which company you are representing and what your specific use cases are?

playphil commented 2 months ago

@shujacks the use case is explained in the issue. Please review the full contents here with a tech lead who understands the nature of EC2 physical capacity constraints and ICE. This is important for all large customers with thousands of instances or more. Instance type flexibility becomes even more important during world events and natural disasters, and for instance types that are in short supply in a given AZ. Instance type flexibility is a core principle of scalable, reliable use of EC2 instances. The companies we currently represent are of no relevance to this issue.

ashraf133 commented 2 months ago

@shujacks, we encounter EC2 capacity problems every day when starting instances: An error occurred (InsufficientInstanceCapacity) when calling the StartInstances operation

The solution we suggest is to add a new tag that contains a list of EC2 instance types and to try starting the instance with the first type. If that works, nothing more is needed; otherwise, try the second type, and so on. As mentioned by @playphil: preferred-instance-types = m7i.xlarge,m6i.xlarge,m5a.xlarge,c7i.xlarge,r7i.xlarge
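The try-the-next-type loop described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the solution's implementation: `start_with_fallback` is a hypothetical helper, `ec2` is assumed to be a boto3 EC2 client (or any object with the same two methods), the instance is assumed to be stopped and EBS-backed, and every listed type is assumed to be compatible with the instance's AMI and architecture. The error code is read from the exception's `response` attribute, which is how botocore's `ClientError` exposes it.

```python
ICE_CODE = "InsufficientInstanceCapacity"


def _error_code(exc):
    """Extract the AWS error code from a botocore-style exception, if any."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code")


def start_with_fallback(ec2, instance_id, preferred_types):
    """Try each instance type in order; return the type that started, or None.

    The instance type can only be changed while the instance is stopped.
    If every type fails with ICE, the instance is left set to the last type
    tried, so a caller may want to restore the original type afterwards.
    """
    for instance_type in preferred_types:
        ec2.modify_instance_attribute(
            InstanceId=instance_id,
            InstanceType={"Value": instance_type},
        )
        try:
            ec2.start_instances(InstanceIds=[instance_id])
        except Exception as exc:
            if _error_code(exc) != ICE_CODE:
                raise  # unrelated failure: do not mask it
            continue  # no capacity for this type, try the next one
        return instance_type
    return None
```

With a real client this might be called as `start_with_fallback(boto3.client("ec2"), "i-0123456789abcdef0", ["m7i.xlarge", "m6i.xlarge"])`, where the instance ID is a placeholder. Re-raising anything other than ICE keeps permission or state errors visible instead of silently cycling through the list.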

shujacks commented 2 months ago

Hi @playphil @ashraf133, thanks for the response. We certainly understand the EC2 capacity issue. However, the product team has requested the following information to help prioritize this feature request, since we have a long backlog to evaluate: 1. size of the deployment, 2. how long you have been using the solution, 3. use case (e.g., for a dev account, for testing purposes).

The answers to these questions will help us prioritize all customer asks, thank you.

ashraf133 commented 2 months ago

I have used it for around 2,000 EC2 instances and 300 RDS instances since 2022, across all environments.

shujacks commented 1 week ago

Thanks for the response, we have added this to our backlog and will review it for a future release.