Closed andyfase closed 7 years ago
Thanks for the PR.
The change looks good to me. Can you confirm that you are OK with this change being under the Apache 2.0 license?
Confirmed this change can be under the Apache 2.0 license.
The additional escaping in check_cli_version actually breaks for me. Running on Ubuntu 12.04. Version ends up never getting set.
Some debugging I did (added right before x, y, and z get set):
msg "Min CLI version is $MIN_CLI_VERSION"
msg "Second parameter is $2"
aws_cli_version=$($AWS_CLI --version)
msg "AWS CLI version check: $aws_cli_version"
msg "First parameter is $1"
msg "Version got set to $version"
Min CLI version is 1.3.25
Second parameter is
AWS CLI version check: aws-cli/1.11.80 Python/2.7.3 Linux/3.2.0-126-virtual botocore/1.5.43
First parameter is
Version got set to
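For reference, here is a minimal sketch of pulling the version out of a string in the format shown in the debug output above and comparing it against a minimum. The helper names parse_cli_version and version_at_least are hypothetical and are not taken from the scripts; the sketch takes the version string as a plain argument so it does not depend on how the CLI is invoked.

```shell
#!/bin/bash
# Hypothetical helper: extract "x.y.z" from a string such as
# "aws-cli/1.11.80 Python/2.7.3 Linux/3.2.0-126-virtual botocore/1.5.43".
parse_cli_version() {
    local raw="$1"
    local version="${raw#aws-cli/}"   # drop the leading "aws-cli/" prefix
    version="${version%% *}"          # drop everything after the first space
    echo "$version"
}

# Hypothetical helper: succeeds if version $1 is >= minimum version $2.
version_at_least() {
    local IFS=.                       # split the dotted versions on "."
    local -a have=($1) want=($2)
    local i h w
    for i in 0 1 2; do
        h=${have[i]:-0}
        w=${want[i]:-0}
        # 10# forces base 10 so a component like "08" is not read as octal
        if (( 10#$h > 10#$w )); then return 0; fi
        if (( 10#$h < 10#$w )); then return 1; fi
    done
    return 0                          # all three components are equal
}

raw="aws-cli/1.11.80 Python/2.7.3 Linux/3.2.0-126-virtual botocore/1.5.43"
version=$(parse_cli_version "$raw")
if version_at_least "$version" "1.3.25"; then
    echo "CLI $version is new enough"
fi
```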
get_instance_state_asg
also seems to be broken with the additional escaping added here.
[stderr]Checking if this instance has already been moved out of Standby state
[stderr]
[stderr]Bad value for --query "AutoScalingInstances[?InstanceId: Bad jmespath expression: Unclosed " delimiter:
[stderr]"AutoScalingInstances[?InstanceId
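One plausible reading of this error, not confirmed in this thread, is that the escaped quotes reach the CLI as literal characters and the expression gets split on whitespace. The sketch below illustrates that effect in isolation. It assumes a wrapper that forwards its arguments with "$@"; show_args, run_cli and the instance ID are made up for illustration and are not part of the scripts.

```shell
#!/bin/bash
# Hypothetical stand-in for the CLI: prints one received argument per line.
show_args() { printf '<%s>\n' "$@"; }

# Hypothetical wrapper that forwards its arguments unchanged.
run_cli() { "$@"; }

instance_id="i-0123456789abcdef0"   # made-up value for illustration

# Plain quoting: the whole expression arrives as a single argument.
run_cli show_args --query "AutoScalingInstances[?InstanceId == '$instance_id'].LifecycleState"

# Escaped quotes: the quote characters become part of the data, and the
# expression is split on whitespace into several separate arguments.
run_cli show_args --query \"AutoScalingInstances[?InstanceId == \'$instance_id\'].LifecycleState\"
```

In the second call the argument following --query is just "AutoScalingInstances[?InstanceId with a leading literal quote, which matches the truncated value reported in the error.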
The current load balancing scripts do not work well at scale (deploying concurrently to more than 50 instances), because all instances simultaneously attempt to deregister or enter standby and then poll for status changes. This causes API throttling, which the scripts do not currently handle.
This change adds a generic function called
exec_with_fulljitter_retry
which performs all CLI execution (the AWS_CLI
variable has been modified to call this function rather than calling the CLI directly). The function can:
Optionally perform pre-jitter. This slows initial execution (which is why it is optional) but will likely speed up processing across hundreds of simultaneous instances.
Perform CLI retry in the event of a failure using the "FullJitter" algorithm (exponential backoff with jitter) taken from the AWS blog: https://www.awsarchitectureblog.com/2015/03/backoff.html
All variables related to the number of retries, base level of the exponential backoff etc can be modified at the top of this function.
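As a rough illustration of the approach described above, here is a minimal sketch of full-jitter retry in bash. It is not the code from this PR: the function names, variable names and default values are all assumptions chosen for the example, and sleeps are limited to whole seconds because it relies on bash's $RANDOM.

```shell
#!/bin/bash
# Sketch only: every name and default below is an assumption, not the PR's value.
MAX_RETRIES=${MAX_RETRIES:-5}   # total attempts before giving up
BASE_SLEEP=${BASE_SLEEP:-1}     # base of the exponential backoff, in seconds
MAX_SLEEP=${MAX_SLEEP:-30}      # cap on any single sleep, in seconds
PRE_JITTER=${PRE_JITTER:-0}     # if > 0, sleep a random 0..PRE_JITTER seconds first

# Full jitter: random sleep in [0, min(MAX_SLEEP, BASE_SLEEP * 2^attempt)].
# The result is stored in SLEEP_SECONDS rather than echoed, so $RANDOM is
# evaluated in the current shell and not in a command-substitution subshell.
fulljitter_sleep_seconds() {
    local attempt=$1
    local ceiling=$(( BASE_SLEEP * (1 << attempt) ))
    if (( ceiling > MAX_SLEEP )); then ceiling=$MAX_SLEEP; fi
    SLEEP_SECONDS=$(( RANDOM % (ceiling + 1) ))
}

# Run the given command, retrying on a non-zero exit status.
retry_with_fulljitter() {
    local attempt=0 status=0
    if (( PRE_JITTER > 0 )); then
        sleep $(( RANDOM % (PRE_JITTER + 1) ))   # optional pre-jitter
    fi
    while true; do
        "$@"
        status=$?
        if (( status == 0 )); then return 0; fi
        attempt=$(( attempt + 1 ))
        if (( attempt >= MAX_RETRIES )); then return "$status"; fi
        fulljitter_sleep_seconds "$attempt"
        sleep "$SLEEP_SECONDS"
    done
}

# Example call (not executed here):
# retry_with_fulljitter aws autoscaling describe-auto-scaling-instances --instance-ids "$instance_id"
```

Forwarding the command as "$@" keeps each argument intact, so callers quote their arguments normally. The sketch retries on any non-zero exit status; distinguishing throttling errors from other failures would need extra handling.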
Additional note: the variable
WAITER_INTERVAL_ASG
has also been increased from 1 second to 3 seconds. Setting an instance to standby typically takes ~15 seconds, so retrying every second seems wasteful given the expected time for the change to complete.
Testing that has been performed:
This change has also now been used in production for a week with code-deploy deploying simultaneously to 200+ instances.