bobtfish / AWSnycast

'Anycast' in AWS
Apache License 2.0

Healthcheck using existing monitoring systems #9

Open sarguru opened 8 years ago

sarguru commented 8 years ago

This might be one of those wildcard ideas that we drop later (or that the AWS NAT gateway service renders useless), but I'm dropping it here anyway.

NAT (0.0.0.0/0) route management currently depends mostly on the black-holed route information in the AWS VPC route tables to find failed instances and replace them, which is probably the best option available right now during network partitions. However, the AWS API sometimes takes a while to report a route as black-holed, and that status depends on instance state. So if any other issue causes NAT to fail in a particular region, route management is not clean at the moment.
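To make the black-hole signal concrete: in the EC2 DescribeRouteTables response, a route whose target has gone away is reported with `State` set to `blackhole`. Below is a minimal Python sketch of picking those routes out of a response. It assumes a dict shaped like the output of boto3's `describe_route_tables`; the function name and the sample data are hypothetical, and this is not AWSnycast's own code.

```python
def blackholed_routes(response, cidr="0.0.0.0/0"):
    """Return (route_table_id, route) pairs for routes to `cidr` in state 'blackhole'."""
    found = []
    for table in response.get("RouteTables", []):
        for route in table.get("Routes", []):
            if (route.get("DestinationCidrBlock") == cidr
                    and route.get("State") == "blackhole"):
                found.append((table.get("RouteTableId"), route))
    return found


# Hand-written sample in the shape of a DescribeRouteTables response (IDs are made up).
sample = {
    "RouteTables": [
        {
            "RouteTableId": "rtb-11111111",
            "Routes": [
                {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
                {"DestinationCidrBlock": "0.0.0.0/0", "InstanceId": "i-aaaaaaaa", "State": "blackhole"},
            ],
        },
        {
            "RouteTableId": "rtb-22222222",
            "Routes": [
                {"DestinationCidrBlock": "0.0.0.0/0", "InstanceId": "i-bbbbbbbb", "State": "active"},
            ],
        },
    ]
}

for table_id, route in blackholed_routes(sample):
    print(table_id, route["InstanceId"])
```

The limitation described above still applies: this only sees a failure once the API has flipped the route state, which is the delay an external health check would try to cover.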

One probable solution to this problem is adding support for external health checks. FWIW, I find that monitoring systems such as Sensu provide a better view of instance health even during network partitions (provided they are set up in a partition-tolerant fashion). We could possibly use this in our case.

The workflow would roughly be,

bobtfish commented 8 years ago

I like this idea :)

I've just finished adding command healthchecks, which can be used for (some of) this - i.e. you could write a script to check health state from Sensu, and plug that in as a command healthcheck.
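A rough sketch of what such a script might look like, in Python. Everything specific here is an assumption: that the Sensu API exposes `GET /results/<client>` returning a list of objects with `check.name` and `check.status` (0 meaning OK), that the API lives at the hypothetical URL below, and that the command healthcheck treats a non-zero exit status as a failure. Adjust to whatever your setup actually provides.

```python
import json
import sys
import urllib.request

# Hypothetical Sensu API endpoint; replace with your own.
SENSU_API = "http://sensu.example.internal:4567"


def exit_code_for(results, check_names):
    """Return 0 only if every named check is present with status 0, else 1."""
    status_by_name = {r["check"]["name"]: r["check"]["status"] for r in results}
    for name in check_names:
        # A missing check yields None here, which also counts as unhealthy.
        if status_by_name.get(name) != 0:
            return 1
    return 0


def fetch_results(client_name):
    """Fetch the current check results for one client (assumed API shape)."""
    url = f"{SENSU_API}/results/{client_name}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def main(argv):
    """argv is [client_name, check_name, ...]; returns the process exit code."""
    if len(argv) < 2:
        print("usage: sensu_health.py CLIENT CHECK [CHECK ...]", file=sys.stderr)
        return 2
    try:
        return exit_code_for(fetch_results(argv[0]), argv[1:])
    except Exception as exc:
        # If Sensu cannot be reached, report unhealthy rather than guess.
        print(f"sensu lookup failed: {exc}", file=sys.stderr)
        return 2
```

To use it as a command, add the usual `if __name__ == "__main__": sys.exit(main(sys.argv[1:]))` at the bottom. Whether a failed Sensu lookup should count as unhealthy (as here) or be ignored is a design choice, and it matters a lot during a partition.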

I guess that we'd want to be able to have multiple healthchecks for a single route to fully support this though.

Definitely worth chatting about / thinking about further though :)