Given lifecycled's purpose of monitoring an instance for ASG-based events - I thought it might be interesting if it also performed the reverse, and monitored the local health of the application deployed on the instance.
For example - if a service fails - mark the ASG node as "unhealthy" and let ASG trigger an instance replacement. It could also possibly be related to #79 and trigger the ASG Continue message, marking a node as "in service".
My thought would be to allow the specification of a handler to perform the health check, and to ship an example PID monitor or HTTP endpoint monitor. This is partially redundant with ELB/ALB monitoring - but might respond to events faster than relying on the existing AWS health checks.
Would such a function make sense in this project?
NOTE: There is also the obvious question - why rely on an ASG node replacement, and not just perform a service restart, or whatever other action is appropriate based on the monitoring? In our case the app being monitored is complex enough that there is no obvious remediation, and removing and replacing the node is the best way to bring it back into service.