department-of-veterans-affairs / va.gov-cms

Editor-centered management for Veteran-centered content.
https://prod.cms.va.gov
GNU General Public License v2.0

Make CMS AMIs restart Apache automatically. #15645

Status: Open. ndouglas opened this issue 1 year ago.

ndouglas commented 1 year ago

Description

On October 11th, I staggered into the front center office of Chez Nug Doug™ to discover that Prod and Staging had both been down for about an hour. The cause was failing health checks; these could have been triggered by just about anything, but they led the Auto Scaling group (ASG) to boot the working instance out of the cluster.

When the replacement instance came online, Apache was not restarted, so it never picked up the intended configuration and served the Apache test page instead. It needed manual intervention before it behaved as expected.

Obviously, this shouldn't happen. If you're going to automate taking a server offline and bringing another server online, you should probably bring it online in working order.

Acceptance Criteria

ndouglas commented 1 year ago

In the meantime, to move an instance from Pending:Wait to InService, a DevOps engineer can:

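```shell
# Authenticate with MFA.
$ aws-mfa
...
# Complete the launch lifecycle hook so the instance moves from Pending:Wait to InService.
$ aws autoscaling complete-lifecycle-action \
          --region us-gov-west-1 \
          --auto-scaling-group-name "dsva-vagov-prod-cms-asg" \
          --lifecycle-hook-name launch-hook \
          --lifecycle-action-result CONTINUE \
          --instance-id <some-instance-id>
# Open an SSM session on the prod CMS instance.
$ ssm-session vagov-prod cms auto
...
# Restart Apache so it loads the intended configuration.
prod$ sudo service httpd restart
```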
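The same two steps (restart Apache, then complete the launch hook) could in principle run automatically at boot instead of by hand. What follows is a minimal, hypothetical sketch, not code from this issue or from the va.gov-cms repository. It assumes the AMI's boot process can run a shell script after the intended Apache configuration is in place, that the AWS CLI is installed, and that the instance profile allows `autoscaling:CompleteLifecycleAction`. The function names and the choice to report `ABANDON` when Apache fails to come up are illustrative assumptions; the ASG name, hook name, and region are copied from the command above.

```shell
#!/bin/sh
# Hypothetical boot-time sketch: restart Apache on a freshly launched CMS
# instance, then complete the ASG launch lifecycle hook. Function names and
# the ABANDON-on-failure policy are illustrative assumptions.
set -eu

# Values copied from the manual command; override via the environment.
REGION="${REGION:-us-gov-west-1}"
ASG_NAME="${ASG_NAME:-dsva-vagov-prod-cms-asg}"
HOOK_NAME="${HOOK_NAME:-launch-hook}"
IMDS="http://169.254.169.254/latest"

# Look up this instance's ID from the EC2 instance metadata service (IMDSv2).
imds_instance_id() {
  local token
  token=$(curl -sf -X PUT "$IMDS/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
  curl -sf -H "X-aws-ec2-metadata-token: $token" "$IMDS/meta-data/instance-id"
}

# Restart Apache so it loads the intended configuration, then confirm it is active.
restart_apache() {
  systemctl restart httpd && systemctl is-active --quiet httpd
}

# Report the lifecycle action result ($2) for the given instance ID ($1).
complete_launch_hook() {
  aws autoscaling complete-lifecycle-action \
    --region "$REGION" \
    --auto-scaling-group-name "$ASG_NAME" \
    --lifecycle-hook-name "$HOOK_NAME" \
    --lifecycle-action-result "$2" \
    --instance-id "$1"
}

# Entry point; defined but not invoked here so the file can be sourced.
main() {
  local instance_id
  instance_id=$(imds_instance_id)
  if restart_apache; then
    complete_launch_hook "$instance_id" CONTINUE
  else
    complete_launch_hook "$instance_id" ABANDON
  fi
}
```

The boot process would call `main` as its last step. Reporting `CONTINUE` only after `systemctl is-active` succeeds means the instance should not enter service while still serving the test page; reporting `ABANDON` on a launch hook should cause the ASG to terminate that instance and launch another, rather than leaving a broken one in the cluster.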

ndouglas commented 1 year ago

This might be fixed as part of the transition to AL2-hardened; see #15646.

EWashb commented 7 months ago

@gracekretschmer-metrostar