acquia / moonshot

Moonshot: Because releasing services shouldn't be a moonshot!
Apache License 2.0

CPD-6858 : Update asg plugin to check if instance exists #283

Closed tanujjain49 closed 2 years ago

kaushik commented 2 years ago

I want to see a manual review or a test covering these changes, preferably one that reproduces the terminated-instance issue.

Hint: we can call moonshot update after changing the worker template (e.g., updating the AMI); that will trigger a rotation of instances after the stack update.

tanujjain49 commented 2 years ago

MANUAL REVIEW

  1. Updating the moonshot gem.
2022-01-06 12:59:45 [cloudservicesdev|cloud-data:tanujjain] ~/Documents/github/acquia/cloud-database-worker$ git diff
diff --git a/Gemfile b/Gemfile
index 399baee..68fc232 100644
--- a/Gemfile
+++ b/Gemfile
@@ -52,8 +52,8 @@ group :development do
   # @see https://github.com/acquia/moonshot/pull/245
   # @see https://backlog.acquia.com/browse/CPD-3865
   gem 'moonshot',
-      git: 'git@github.com:acquia/moonshot.git',
-      ref: 'proxy-cli-hook-aws-v2'
+      git: 'git@github.com:tanujjain49/moonshot.git',
+      ref: 'CPD-6858'

   gem 'moonshot-production-safety',
       git: 'git@github.com:acquia/cloud-moonshot-production-safety.git',
diff --git a/Gemfile.lock b/Gemfile.lock
index d178d59..b584d34 100644
--- a/Gemfile.lock
+++ b/Gemfile.lock
@@ -99,9 +99,23 @@ GIT
       fpm (> 1.4)

 GIT
-  remote: git@github.com:acquia/moonshot.git
-  revision: e7e923ca13440303cad6951944ead763bf2c54a7
-  ref: proxy-cli-hook-aws-v2
+  remote: git@github.com:acquia/signalfx-bugsnag-middleware.git
+  revision: a5f1c13593888caf7002bfb011b3213057031ffb
+  ref: master
+  specs:
+    signalfx-bugsnag-middleware (0.1.0)
+
+GIT
+  remote: git@github.com:acquia/systemd-daemon.git
+  revision: 44623468b56dcd945242e313bef1cd6aeb48068b
+  ref: 0.0.1
+  specs:
+    systemd-daemon (0.1.0)
+
+GIT
+  remote: git@github.com:tanujjain49/moonshot.git
+  revision: bfd7dba55044f0d51ea0986539da4a1cd2da99f6
+  ref: CPD-6858
   specs:
     moonshot (2.0.0.beta6)
       activesupport
@@ -119,20 +133,6 @@ GIT
       travis
       vandamme

-GIT
-  remote: git@github.com:acquia/signalfx-bugsnag-middleware.git
-  revision: a5f1c13593888caf7002bfb011b3213057031ffb
-  ref: master
-  specs:
-    signalfx-bugsnag-middleware (0.1.0)
-
-GIT
-  remote: git@github.com:acquia/systemd-daemon.git
2022-01-06 13:00:04 [cloudservicesdev|cloud-data:tanujj
  2. Updating the dev-tanuj stage.
    2022-01-06 13:02:09 [cloudservicesdev|cloud-data:tanujjain] ~/Documents/github/acquia/cloud-database-worker$ bundle exec moonshot update -n dev-tanuj
    Using the Central Model for authentication
    Version 2 of the Ruby SDK will enter maintenance mode as of November 20, 2020. To continue receiving service updates and new features, please upgrade to Version 3. More information can be found here: https://aws.amazon.com/blogs/developer/deprecation-schedule-for-aws-sdk-for-ruby-v2/
    [ ✓ ] [ 0m 0s ] Using existing KMS Key for ParameterKMS!                                                                                                                                              
    [ ✓ ] [ 0m 1s ] Using previous encrypted value for NewRelicLicenseKey.                                                                                                                                
    [ ✓ ] [ 0m 8s ] ChangeSet moonshot-cdb-worker-dev-tanuj-1641454378 ready!                                                                                                                             
    * Modify LaunchConfig (AWS::AutoScaling::LaunchConfiguration)
    - Will be replaced
    - Caused by template change (Properties: ImageId)
    * Modify WorkerASG (AWS::AutoScaling::AutoScalingGroup)
    - May be replaced (Conditional)
    - Caused by LaunchConfig (ResourceReference)
    Apply changes? yes
    [ ✓ ] [ 0m 2s ] Executed ChangeSet moonshot-cdb-worker-dev-tanuj-1641454378 for CloudFormation Stack cdb-worker-dev-tanuj.                                                                            
    [ ✓ ] [ 0m 30s ] CloudFormation Stack cdb-worker-dev-tanuj successfully updated.                                                                                                                      
    [ ✓ ] [ 0m 3s ] CodeDeploy Application cdb-worker-dev-tanuj already exists.                                                                                                                           
    [ ✓ ] [ 0m 0s ] CodeDeploy CodeDeploy Deployment Group cdb-worker-dev-tanuj already exists.                                                                                                           
    [ ✓ ] [ 0m 6s ] AutoScaling Group(s) up to capacity!                                                                                                                                                  
    [ ✓ ] [ 0m 2s ] Uploading 'cdb-worker_1641454547_tanuj.jain.tar.gz' to 'cdb-stack-resource-backups-dev' succeeded.                                                                                    
    --> ASG desired capacity updated to 2.                                                                                                                                                                
    --> All instances cycled.                                                                                                                                                                             
    --> ASG desired capacity updated to 1.                                                                                                                                                                
    [ ✓ ] [ 9m 3s ] ASG instances rotated successfully!                                                                                                                                                   
    2022-01-06 13:14:55 [cloudservicesdev|cloud-data:tanujjain] ~/Documents/github/acquia/cloud-database-worker$
tanujjain49 commented 2 years ago

MANUAL REVIEW

  1. With the old moonshot gem, the update errors out if an instance is deleted before its lifecycle_state is checked.
    2022-01-11 15:17:11 [cloudservicesdev|cloud-data:tanujjain] ~/Documents/github/acquia/cloud-database-worker$ bundle exec moonshot update -n dev-tanuj
    Using the Central Model for authentication
    Version 2 of the Ruby SDK will enter maintenance mode as of November 20, 2020. To continue receiving service updates and new features, please upgrade to Version 3. More information can be found here: https://aws.amazon.com/blogs/developer/deprecation-schedule-for-aws-sdk-for-ruby-v2/
    [ ✓ ] [ 0m 1s ] Using existing KMS Key for ParameterKMS!                                                                                                                                              
    [ ✓ ] [ 0m 0s ] Using previous encrypted value for NewRelicLicenseKey.                                                                                                                                
    [ ✓ ] [ 0m 15s ] ChangeSet moonshot-cdb-worker-dev-tanuj-1641894465 ready!                                                                                                                            
    * Modify LaunchConfig (AWS::AutoScaling::LaunchConfiguration)
    - Will be replaced
    - Caused by template change (Properties: ImageId)
    * Modify WorkerASG (AWS::AutoScaling::AutoScalingGroup)
    - May be replaced (Conditional)
    - Caused by LaunchConfig (ResourceReference)
    Apply changes? y
    Please enter 'yes' or 'no'!
    Apply changes? yes
    [ ✓ ] [ 0m 3s ] Executed ChangeSet moonshot-cdb-worker-dev-tanuj-1641894465 for CloudFormation Stack cdb-worker-dev-tanuj.                                                                            
    [ ✓ ] [ 0m 28s ] CloudFormation Stack cdb-worker-dev-tanuj successfully updated.                                                                                                                      
    [ ✓ ] [ 0m 2s ] CodeDeploy Application cdb-worker-dev-tanuj already exists.                                                                                                                           
    [ ✓ ] [ 0m 0s ] CodeDeploy CodeDeploy Deployment Group cdb-worker-dev-tanuj already exists.                                                                                                           
    [ ✓ ] [ 0m 5s ] AutoScaling Group(s) up to capacity!                                                                                                                                                  
    [ ✓ ] [ 0m 2s ] Uploading 'cdb-worker_1641894539_tanuj.jain.tar.gz' to 'cdb-stack-resource-backups-dev' succeeded.                                                                                    
    --> ASG desired capacity updated to 2.                                                                                                                                                                
    [ ✓ ] [ 3m 54s ] AutoScaling Group up to capacity!                                                                                                                                                    
    [295, 304] in /Users/tanuj.jain/Documents/github/acquia/cloud-database-worker/vendor/bundle/ruby/2.6.0/bundler/gems/moonshot-e7e923ca1344/lib/plugins/rotate_asg_instances/asg.rb
    295:         instance.wait_until_stopped
    296:       end
    297: 
    298:       def instance_in_terminal_state?(instance)
    299:         require 'byebug'; byebug
    => 300:         state = if instance.is_a?(Aws::EC2::Instance)
    301:                   instance.state.name
    302:                 elsif instance.is_a?(Aws::AutoScaling::Instance)
    303:                   instance.load.lifecycle_state
    304:                 end
    (byebug) instance.id
    "i-022f79d23bbb2119a"
    (byebug) instance.is_a?(Aws::EC2::Instance)
    false
    (byebug) continue
    --> ASG desired capacity updated to 1.                                                                                                                                                                
    [ / ] [ 9m 33s ] Waiting for Autoscaling group to reach desired capacity...#<Thread:0x00007fd424028bb8@/Users/tanuj.jain/Documents/github/acquia/cloud-database-worker/vendor/bundle/ruby/2.6.0/gems/interactive-logger-0.1.3/lib/interactive-logger.rb:39 run> terminated with exception (report_on_exception is true):
    Traceback (most recent call last):
    8: from /Users/tanuj.jain/Documents/github/acquia/cloud-database-worker/vendor/bundle/ruby/2.6.0/gems/interactive-logger-0.1.3/lib/interactive-logger.rb:40:in `block in start_threaded'
    7: from /Users/tanuj.jain/Documents/github/acquia/cloud-database-worker/vendor/bundle/ruby/2.6.0/bundler/gems/moonshot-e7e923ca1344/lib/plugins/rotate_asg_instances/asg.rb:42:in `block in rotate_asg_instances'
    6: from /Users/tanuj.jain/Documents/github/acquia/cloud-database-worker/vendor/bundle/ruby/2.6.0/bundler/gems/moonshot-e7e923ca1344/lib/plugins/rotate_asg_instances/asg.rb:69:in `with_scale_up'
    5: from /Users/tanuj.jain/Documents/github/acquia/cloud-database-worker/vendor/bundle/ruby/2.6.0/bundler/gems/moonshot-e7e923ca1344/lib/plugins/rotate_asg_instances/asg.rb:43:in `block (2 levels) in rotate_asg_instances'
    4: from /Users/tanuj.jain/Documents/github/acquia/cloud-database-worker/vendor/bundle/ruby/2.6.0/bundler/gems/moonshot-e7e923ca1344/lib/plugins/rotate_asg_instances/asg.rb:103:in `outdated_volumes'
    3: from /Users/tanuj.jain/Documents/github/acquia/cloud-database-worker/vendor/bundle/ruby/2.6.0/bundler/gems/moonshot-e7e923ca1344/lib/plugins/rotate_asg_instances/asg.rb:103:in `each'
    2: from /Users/tanuj.jain/Documents/github/acquia/cloud-database-worker/vendor/bundle/ruby/2.6.0/bundler/gems/moonshot-e7e923ca1344/lib/plugins/rotate_asg_instances/asg.rb:105:in `block in outdated_volumes'
    1: from /Users/tanuj.jain/Documents/github/acquia/cloud-database-worker/vendor/bundle/ruby/2.6.0/bundler/gems/moonshot-e7e923ca1344/lib/plugins/rotate_asg_instances/asg.rb:303:in `instance_in_terminal_state?'
    /Users/tanuj.jain/Documents/github/acquia/cloud-database-worker/vendor/bundle/ruby/2.6.0/bundler/gems/aws-sdk-ruby-01ea5bf78a70/aws-sdk-resources/lib/aws-sdk-resources/resource.rb:223:in `block in add_data_attribute': undefined method `[]' for nil:NilClass (NoMethodError)
    [ ✗ ] [ 9m 33s ] Error while performing step: Rotating ASG instances...                                                                                                                               
    NoMethodError: undefined method `[]' for nil:NilClass
    undefined method `[]' for nil:NilClass (at /Users/tanuj.jain/Documents/github/acquia/cloud-database-worker/vendor/bundle/ruby/2.6.0/bundler/gems/aws-sdk-ruby-01ea5bf78a70/aws-sdk-resources/lib/aws-sdk-resources/resource.rb:223:in `block in add_data_attribute')
    2022-01-11 15:28:37 [cloudservicesdev|cloud-data:tanujjain] ~/Documents/github/acquia/cloud-database-worker$

    -> The instance state in the AWS console was already terminated before we continued in the byebug session.
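
To make the failure mode above concrete, here is a minimal, self-contained sketch of one way to guard the lifecycle check. It is not the code from this PR: `VanishedInstance`, `LoadedInstance`, `TERMINAL_STATES`, `lifecycle_state_or_nil`, and `instance_gone_or_terminal?` are all hypothetical names, and the stand-in classes only imitate the behaviour seen in the trace (a `NoMethodError` when loading an instance that no longer exists).

```ruby
# Hypothetical stand-in for an ASG instance resource whose backing instance
# has already been terminated: loading it fails like the trace above.
class VanishedInstance
  def load
    nil['lifecycle_state'] # raises NoMethodError: undefined method `[]' for nil
  end
end

# Hypothetical stand-in for an ASG instance that still exists.
LoadedInstance = Struct.new(:lifecycle_state) do
  def load
    self
  end
end

# Illustrative subset of ASG lifecycle states treated as terminal (assumption).
TERMINAL_STATES = %w[Terminating Terminated].freeze

# Returns the lifecycle state, or nil when the instance can no longer be loaded.
def lifecycle_state_or_nil(instance)
  instance.load.lifecycle_state
rescue NoMethodError
  nil
end

# Treats a missing instance the same as one in a terminal state.
def instance_gone_or_terminal?(instance)
  state = lifecycle_state_or_nil(instance)
  state.nil? || TERMINAL_STATES.include?(state)
end
```

With a guard of this shape, a rotation loop could skip an instance that disappeared mid-rotation instead of aborting the whole step. Rescuing `NoMethodError` is a blunt choice made only to keep the sketch short.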

[Image: Screen Shot 2022-01-11 at 3 42 02 PM]

  2. Updated the moonshot gem to use the fix and ran moonshot update on another stage (dev-pritam).

    
    2022-01-11 15:30:41 [cloudservicesdev|cloud-data:tanujjain] ~/Documents/github/acquia/cloud-database-worker$ git diff 
    diff --git a/Gemfile b/Gemfile
    index 399baee..68fc232 100644
    --- a/Gemfile
    +++ b/Gemfile
    @@ -52,8 +52,8 @@ group :development do
    # @see https://github.com/acquia/moonshot/pull/245
    # @see https://backlog.acquia.com/browse/CPD-3865
    gem 'moonshot',
    -      git: 'git@github.com:acquia/moonshot.git',
    -      ref: 'proxy-cli-hook-aws-v2'
    +      git: 'git@github.com:tanujjain49/moonshot.git',
    +      ref: 'CPD-6858'
    
    gem 'moonshot-production-safety',
       git: 'git@github.com:acquia/cloud-moonshot-production-safety.git',
    diff --git a/Gemfile.lock b/Gemfile.lock
    index b01c5dd..f292860 100644
    --- a/Gemfile.lock
    +++ b/Gemfile.lock
    @@ -99,9 +99,23 @@ GIT
       fpm (> 1.4)
    
    GIT
    -  remote: git@github.com:acquia/moonshot.git
    -  revision: e7e923ca13440303cad6951944ead763bf2c54a7
    -  ref: proxy-cli-hook-aws-v2
    +  remote: git@github.com:acquia/signalfx-bugsnag-middleware.git
    +  revision: a5f1c13593888caf7002bfb011b3213057031ffb
    +  ref: master
    +  specs:
    +    signalfx-bugsnag-middleware (0.1.0)
    +
    +GIT
    +  remote: git@github.com:acquia/systemd-daemon.git
    +  revision: 44623468b56dcd945242e313bef1cd6aeb48068b
    +  ref: 0.0.1
    +  specs:
    +    systemd-daemon (0.1.0)
    +
    +GIT
    +  remote: git@github.com:tanujjain49/moonshot.git
    +  revision: bfd7dba55044f0d51ea0986539da4a1cd2da99f6
    +  ref: CPD-6858
    specs:
     moonshot (2.0.0.beta6)
       activesupport
    @@ -119,20 +133,6 @@ GIT
       travis
       vandamme

-GIT

-> The instance state was terminated before we continued in the byebug session.

pdrakeweb commented 2 years ago

Manual test looks good; it just needs some unit tests, I think.

tanujjain49 commented 2 years ago

Updated the unit tests and refactored the code for instance_terminal_state?.