Open ndouglas opened 1 year ago
In the meantime, to move an instance from Pending:Wait to InService, a DevOps engineer can:
$ aws-mfa
...
$ aws autoscaling complete-lifecycle-action \
--region us-gov-west-1 \
--auto-scaling-group-name "dsva-vagov-prod-cms-asg" \
--lifecycle-hook-name launch-hook \
--lifecycle-action-result CONTINUE \
--instance-id <some-instance-id>
$ ssm-session vagov-prod cms auto
...
prod$ sudo service httpd restart
Might be fixed inline with the transition to AL2-hardened, see #15646
@gracekretschmer-metrostar
Description
On October 11th, I staggered into the front center office of Chez Nug Doug™ to discover that Prod and Staging had both been down for about an hour. The cause was failing health checks, which could have been due to just about anything but led to the ASG booting the working instance from the cluster.
When the replacement came back online, it didn't restart and thus did not incorporate the intended configuration, which led to it serving the test page. It needed manual intervention to resume behaving as expected.
Obviously, this shouldn't happen. If you're going to automate taking a server offline and bringing another server online, you should probably bring it online in working order.
Acceptance Criteria