aws-solutions / instance-scheduler-on-aws

A cross-account and cross-region solution that allows customers to automatically start and stop EC2 and RDS Instances
https://aws.amazon.com/solutions/implementations/instance-scheduler-on-aws/
Apache License 2.0

ThrottlingException when Adding more than 1 EC2 instance to schedule #284

Closed Truffles56 closed 1 year ago

Truffles56 commented 2 years ago

Describe the bug

I've followed the CloudFormation deployment instructions, and when targeting 1 instance everything works as it should. When I try to scale up, I start to get the following error:

Failure message: Rate exceeded (Service: AWSSimpleSystemsManagement; Status Code: 400; Error Code: ThrottlingException; Request ID: ; Proxy: null)

I've put all the instances on their own schedule because:

  1. If I create 1 schedule for many EC2/RDS instances and 1 fails/times out then the others don't get triggered
  2. If I create individual schedules for the EC2/RDS instances I get rate throttling issues mentioned above

To Reproduce

  1. Deploy the Instance Scheduler v2.0 per the instructions
  2. Add more than 1 EC2 instance to the scheduling routine

Expected behavior

I'd like to be able to bring the servers up and down at the same time without a failure of one causing the rest of the script to time out.


gockle commented 2 years ago

Hi @Truffles56

You can add all instances to the same schedule. Depending on how many regions and accounts are configured for the solution, Systems Manager Automation will create multiple automations, one for each account and region. The solution's Lambda function has an environment variable, SSM_MAX_CONCURRENCY, whose value is passed to the SSM start_automation_execution API. Depending on your usage (number of schedules, accounts, and regions), you can reduce this value from the default of 100% to 10%, 15%, and so on. This should resolve the throttling issue in the executeAutomation API. The number of parallel automations that can be started in the account depends on your Automation usage and on the following service quota: Concurrently executing rate control automation
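As a rough illustration of how a concurrency setting like this reaches Systems Manager, here is a minimal Python sketch. It is not the solution's actual code: the helper names, the choice of the AWS-StartEC2Instance runbook, and the 5% fallback are assumptions made for the example. Only the SSM_MAX_CONCURRENCY variable name and the boto3 start_automation_execution call with its MaxConcurrency parameter come from the discussion above and the boto3 API.

```python
import os


def build_automation_request(document_name, instance_ids, max_concurrency=None):
    """Build keyword arguments for the boto3 SSM start_automation_execution call.

    Hypothetical helper: if max_concurrency is not given, it is read from the
    SSM_MAX_CONCURRENCY environment variable (falling back to "5%", an
    assumption chosen for this sketch).
    """
    if max_concurrency is None:
        max_concurrency = os.environ.get("SSM_MAX_CONCURRENCY", "5%")
    return {
        "DocumentName": document_name,
        # Rate control: run the runbook once per value of this parameter.
        "TargetParameterName": "InstanceId",
        "Targets": [{"Key": "ParameterValues", "Values": list(instance_ids)}],
        # A number ("10") or a percentage ("10%") of targets to run in parallel.
        "MaxConcurrency": max_concurrency,
    }


def start_automation(request):
    """Send the request to SSM. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without AWS access

    return boto3.client("ssm").start_automation_execution(**request)


if __name__ == "__main__":
    # Example instance IDs are placeholders, not real resources.
    request = build_automation_request(
        "AWS-StartEC2Instance", ["i-0123456789abcdef0", "i-0fedcba9876543210"], "10%"
    )
    print(request)
```

Lowering the value means fewer child automations run at once, which is why reducing it can ease throttling when many accounts and regions are scheduled together.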

Truffles56 commented 2 years ago

Thanks for the reply! I've made the service quota request to increase the rate control limit from 25 to 100. Do I need to complete both actions to get rid of the throttling issue? I've updated SSM_MAX_CONCURRENCY to 10%, and I'm still getting the ThrottlingException while I wait for the service quota increase.

gockle commented 2 years ago

@Truffles56 You can reduce SSM_MAX_CONCURRENCY to the minimum value of 5% to limit the throttling, and then update it to a higher value once the service quota is increased.

Truffles56 commented 2 years ago

Update for anyone else: I reduced SSM_MAX_CONCURRENCY to the minimum value of 5% and then also had to get a service quota increased. Here is what AWS support said:

"We've increased the StartAutomationExecution TPS limit for this account from (1,3) to (3,9). Please let us know if you're still receiving the same error message."

This seems to have settled it. Thanks again for your prompt responses!
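For callers that hit the same per-second limit from their own scripts, one common client-side mitigation is to retry throttled calls with exponential backoff. The sketch below is a generic illustration, not part of the solution: call_with_backoff and is_throttling_error are hypothetical names, and the check assumes the exception exposes a botocore-style response dictionary with an error code.

```python
import random
import time


def is_throttling_error(error):
    """Return True if the error looks like a botocore ClientError for throttling.

    Assumes the botocore layout: error.response["Error"]["Code"].
    """
    response = getattr(error, "response", None) or {}
    return response.get("Error", {}).get("Code") == "ThrottlingException"


def call_with_backoff(operation, is_throttled=is_throttling_error,
                      max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call operation(), retrying throttled failures with exponential backoff.

    The delay doubles on each attempt and includes random jitter so that
    parallel callers do not retry in lockstep. Other errors, and the final
    throttled attempt, are re-raised to the caller.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as error:
            if not is_throttled(error) or attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))
```

A call such as call_with_backoff(lambda: ssm.start_automation_execution(**request)) would wrap a boto3 client call, where ssm and request are supplied by the caller. Backoff only smooths out bursts; if the sustained request rate exceeds the account limit, a quota increase like the one described above is still needed.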

gockle commented 2 years ago

Hi @Truffles56, the solution version v2.0.0 has been rolled back (see #289).

hearde commented 1 year ago

Closed since this is an issue with the retracted v2.0.0 release of the solution.