aws-samples / aws-codedeploy-samples

Samples and template scenarios for AWS CodeDeploy
Apache License 2.0
639 stars 780 forks source link

Add backoff with Jitter to load-balancing scripts #65

Closed andyfase closed 7 years ago

andyfase commented 7 years ago

The current Load Balancing scripts do not work well at scale (deploying concurrently to over 50 instances). As all instances will simultaneously attempt to deregister / standby and then poll for status changes. This will cause API throttling which is currently not handled in the scripts processing.

This change adds a generic function called exec_with_fulljitter_retry which performs all CLI execution (AWS_CLI variable ha been modified to call this function rather than call the CLI directly) this function has functionality to

  1. Optionally perform pre-jitter. This will slow initial execution (hence why its optional) but will likely speed up the processing of 00's of simultaneous instances

  2. Perform CLI retry on the event of a failure using the "FullJitter" algorithm (exponential backoff with jitter) taken from AWS blog: https://www.awsarchitectureblog.com/2015/03/backoff.html

All variables related to the number of retries, base level of the exponential backoff etc can be modified at the top of this function.

Additional note variable WAITER_INTERVAL_ASG had also been increased to 3 seconds rather than 1 as typically setting an instance to standby takes ~15 seconds, retry'ing every second seems wasteful / non-valuable given the expected time for the change to complete.

Testing that has been performed:

  1. Putting instance in standby / active within a ASG on an ELB
  2. Deregistering / Registration of an instance on an ELB (without ASG method of script)
  3. Points 1 & 2 for an ALB using elb-v2 versions of scripts
  4. Testing to some extent (on elb version of script) the “HANDLE_PROCS” additional functions

This change has also now been used in production for a week with code-deploy deploying simultaneously to 200+ instances.

yubangxi commented 7 years ago

Thanks for the PR.

The change looks good to me. Can you confirm that you are OK this change to be under Apache 2.0 license?

andyfase commented 7 years ago

Confirmed this change can be under the Apache 2.0 license.

nwalke commented 7 years ago

The additional escaping in check_cli_version actually breaks for me. Running on Ubuntu 12.04. Version ends up never getting set.

Some debugging I did (added right before x, y, and z get set):

    msg "Min CLI version is $MIN_CLI_VERSION"
    msg "Second parameter is $2"
    aws_cli_version=$($AWS_CLI --version)
    msg "AWS CLI version check: $aws_cli_version"
    msg "First parameter is $1"
    msg "Version got set to $version"
Min CLI version is 1.3.25
Second parameter is
AWS CLI version check: aws-cli/1.11.80 Python/2.7.3 Linux/3.2.0-126-virtual botocore/1.5.43
First parameter is
Version got set to
nwalke commented 7 years ago

get_instance_state_asg also seems to be broken with the additional escaping added here.

[stderr]Checking if this instance has already been moved out of Standby state
[stderr]
[stderr]Bad value for --query "AutoScalingInstances[?InstanceId: Bad jmespath expression: Unclosed " delimiter:
[stderr]"AutoScalingInstances[?InstanceId