ASG stuck in a loop when using ELB health check

baktak commented 6 years ago

I'm not sure whether the issue I'm seeing is related to issue #25 or #7, so I'll list the steps to reproduce it. I have a fresh install of the WordPress application (used for AWS training purposes only). I used t2.micro instances for the WP EC2 instances and db.t2.small instances for RDS, and I loaded it with 1 GB of sample data.

I have not put any load against the application.

What I wanted to test and learn more about is the use of Auto Scaling health checks, specifically the difference between the EC2 and ELB health check options. My understanding is that with the ELB option, if a health check fails, the instance is terminated and a new instance is created. The WP architecture is configured to use the ELB health check option. The target group for the WP EC2 instances uses a health check that requests /wp_login.php.
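To make the EC2 vs. ELB distinction concrete, here is a minimal Python sketch (a toy model, not AWS code) of how a target group flips a target's state only after a run of consecutive results, and when the ASG acts on that state. The names `TargetHealth` and `asg_replaces` and the threshold values used with them are hypothetical; the real counts come from the target group's healthy/unhealthy threshold settings.

```python
from dataclasses import dataclass


@dataclass
class TargetHealth:
    """Toy model of one target in a target group: its state only flips
    after a run of consecutive results that disagree with it."""
    healthy_threshold: int     # consecutive passes needed to become healthy
    unhealthy_threshold: int   # consecutive failures needed to become unhealthy
    healthy: bool = True
    streak: int = 0            # current run of results that disagree with the state

    def record(self, passed: bool) -> bool:
        """Feed one health check result; return the (possibly new) state."""
        if passed == self.healthy:
            self.streak = 0    # an agreeing result breaks any opposing run
            return self.healthy
        self.streak += 1
        needed = self.healthy_threshold if passed else self.unhealthy_threshold
        if self.streak >= needed:
            self.healthy = passed
            self.streak = 0
        return self.healthy


def asg_replaces(health_check_type: str, instance_running: bool,
                 target_healthy: bool) -> bool:
    """Would the ASG mark this instance unhealthy and replace it?"""
    if not instance_running:
        return True    # instance-level failure counts under either type
    # Only the ELB type also takes the load balancer's verdict into account.
    return health_check_type == "ELB" and not target_healthy
```

With the EC2 type, an instance that fails the target group check is still running, so the ASG keeps it and the load balancer simply stops routing to it; with the ELB type, the same verdict triggers a replacement.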

To simulate the failure, I logged in to one of my WP EC2 instances, renamed the health check file from wp_login.php to wp_login0.php, and logged out. I expected that instance to be terminated and a new instance to be created.
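Renaming the file makes the health check request return an HTTP error (typically 404) instead of a success code. A rough way to see what the load balancer sees is to request the path yourself. In this sketch, `probe` and `status_matches` are hypothetical helpers; the matcher formats ("200", "200,302", "200-399") mirror the ones a target group success-code setting accepts.

```python
import urllib.error
import urllib.request


def status_matches(status: int, matcher: str = "200") -> bool:
    """Check a status code against a matcher such as "200", "200,302"
    or "200-399" (comma-separated codes and/or inclusive ranges)."""
    for part in matcher.split(","):
        part = part.strip()
        if "-" in part:
            low, high = (int(x) for x in part.split("-", 1))
            if low <= status <= high:
                return True
        elif status == int(part):
            return True
    return False


def probe(base_url: str, path: str, timeout: float = 5.0) -> int:
    """GET the health check path and return the HTTP status code.

    Caveat: urlopen follows redirects, whereas a load balancer health
    check does not, so a 3xx response may be hidden here.
    """
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + path,
                                    timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses arrive as HTTPError
```

For example, `probe("http://10.0.0.10", "/wp_login.php")` (placeholder address) against the instance with the renamed file should return a status that `status_matches` rejects.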

I got busy with another task, and when I came back to the EC2 console about an hour later, the Activity History of the ASG listed 13 successful launches and 13 successful terminations during that time, with more in progress.

I have not yet determined which setting is causing this loop: the health check grace period, the cooldown, or something else such as the choice of instance size. However, this was done with the default settings in the CloudFormation script, so perhaps something needs to be adjusted.
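A back-of-the-envelope model helps separate the timers. As I understand it, the health check grace period only delays when the ASG starts acting on health results for a new instance, and cooldowns govern scaling-policy activity rather than health replacements, so neither can stop a check that fails on every instance; they only set the pace of the loop. The sketch below assumes that reading, leaves cooldown out, and uses made-up durations; `cycle_seconds` and `cycles_per_hour` are hypothetical helpers.

```python
def cycle_seconds(boot: float, grace_period: float, interval: float,
                  unhealthy_threshold: int, terminate: float) -> float:
    """Rough length of one launch -> fail -> terminate cycle, in seconds.

    boot:      launch until the instance is InService (assumed value)
    terminate: time to terminate the failed instance (assumed value)
    """
    # Time for the target group to record enough consecutive failures.
    detect = interval * unhealthy_threshold
    # The ASG ignores health results until the grace period has passed,
    # so it acts at whichever of the two comes later.
    act_after = max(grace_period, detect)
    # Cooldown is deliberately left out (assumption: it does not gate
    # health-check replacements).
    return boot + act_after + terminate


def cycles_per_hour(**timings: float) -> float:
    """How many replace cycles fit into an hour for the given timings."""
    return 3600 / cycle_seconds(**timings)


if __name__ == "__main__":
    fast = dict(boot=120, grace_period=120, interval=30,
                unhealthy_threshold=2, terminate=60)
    slow = dict(fast, grace_period=600)
    print(cycles_per_hour(**fast))  # 12.0
    print(cycles_per_hour(**slow))  # fewer cycles, but never zero
```

Raising the grace period stretches each cycle, but as long as every new instance fails the check the loop continues.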

baktak commented 6 years ago

In case it helps, here is a screenshot of a subset of the Activity History for the WP ASG.

baktak commented 6 years ago

(doh) Experimenting a bit more, it just dawned on me why I'm in this loop. The application files, including the wp_login.php file used as the health check for the load balancer target group, are on the EFS storage. When a new instance is launched to replace the terminated one, it mounts the same EFS storage, which of course is still missing the file used for the ELB health check. Hence the infinite terminate/launch loop. Simple, but a good learning experience.
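The loop is easy to reproduce in a toy model. This hypothetical sketch treats each instance's document root as a set of file names: with the shared (EFS-style) layout every replacement mounts the same set, so the rename survives every replacement, while with a per-instance copy the first replacement comes up clean. `run_asg` and its parameters are illustrative only, not part of the reference architecture.

```python
from typing import Optional

HEALTH_FILE = "wp_login.php"  # health check file named in this issue


def run_asg(rounds: int, code_on_shared_fs: bool,
            restore_round: Optional[int] = None) -> int:
    """Count how many times a one-instance ASG replaces its instance.

    Each round the ASG checks the current instance; if its document root
    lacks HEALTH_FILE, the ELB check fails and the instance is replaced.
    """
    shared_fs = {HEALTH_FILE, "index.php"}    # one file set mounted by every instance
    image_files = {HEALTH_FILE, "index.php"}  # hypothetical clean copy a new instance starts with

    def new_instance_root() -> set:
        # Shared layout: the new instance mounts the very same set object.
        # Local layout: the new instance gets its own fresh copy.
        return shared_fs if code_on_shared_fs else set(image_files)

    root = new_instance_root()
    root.discard(HEALTH_FILE)    # the manual rename on one instance...
    root.add("wp_login0.php")    # ...to wp_login0.php

    replacements = 0
    for current in range(rounds):
        if restore_round is not None and current == restore_round:
            shared_fs.discard("wp_login0.php")
            shared_fs.add(HEALTH_FILE)  # rename the file back on the shared file system
        if HEALTH_FILE not in root:
            replacements += 1
            root = new_instance_root()
    return replacements
```

In the shared layout, restoring wp_login.php on the shared file system is what ends the loop: `run_asg(10, True, restore_round=3)` counts only the three replacements before the fix, while `run_asg(10, True)` replaces the instance every round.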

It does suggest a possible enhancement: it might be nice if the WP application itself ran from EBS storage and only the ../upload folders were mounted from EFS. However, my 'issue' is user error, so I'll close it.
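The proposed split could look roughly like this. The sketch assumes WordPress's default uploads directory, `wp-content/uploads`, and two hypothetical mount points (`/var/www/html` on the EBS root volume, `/mnt/efs/uploads` on EFS); the hypothetical `backing_path` helper just makes explicit which storage each path relative to the WordPress root would live on.

```python
from pathlib import PurePosixPath

# Hypothetical layout: application code on the instance's EBS root volume,
# only the uploads directory on EFS.
EBS_DOCROOT = PurePosixPath("/var/www/html")       # assumed document root
EFS_UPLOADS = PurePosixPath("/mnt/efs/uploads")    # assumed EFS mount point
UPLOADS_REL = PurePosixPath("wp-content/uploads")  # WordPress's default uploads dir


def backing_path(rel_path: str) -> PurePosixPath:
    """Map a path relative to the WordPress root to the storage that holds it."""
    rel = PurePosixPath(rel_path)
    try:
        # Anything under the uploads directory is shared via EFS...
        return EFS_UPLOADS / rel.relative_to(UPLOADS_REL)
    except ValueError:
        # ...everything else, including the health check file, stays local.
        return EBS_DOCROOT / rel
```

The trade-off is that code changes (core, plugin, or theme updates) would then have to reach every instance, for example by baking them into the AMI, whereas the shared-EFS layout applies them once.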

sxmxc24 commented 5 years ago

So @baktak, what file do you recommend using for the health check path?

aws-samples / aws-refarch-wordpress, issue #39