buildkite / lifecycled

A daemon for responding to AWS AutoScaling Lifecycle Hooks
MIT License

Local health check to trigger ASG events #95

Open alexeiser opened 2 years ago

alexeiser commented 2 years ago

Given lifecycled's purpose of monitoring an instance for ASG-based events, I thought it might be interesting if it also did the reverse and monitored the local health of the application deployed on the instance.

For example, if a service fails, mark the ASG node as "unhealthy" and let the ASG trigger an instance replacement. It could also possibly be related to #79 and trigger the ASG Continue message, marking a node as "in service".

My thought would be to allow a handler to be specified to perform the health check, with an example PID monitor or HTTP endpoint monitor. This is partially redundant with ELB/ALB monitoring, but might respond to events faster than relying on the existing AWS health checks.

Would such a function make sense in this project?

NOTE: There is also the obvious question: why rely on an ASG node replacement rather than just performing a service restart, or whatever other action is appropriate based on the monitoring? In our case the app being monitored is complex enough that there is no obvious remediation, and removing and replacing the node is the best way to bring it back into service.