airbnb / nerve

A service registration daemon that performs health checks; companion to airbnb/synapse
MIT License
942 stars 151 forks source link

Adding possibility to add admin_port value, and also report it to ZK. #31

Closed olvesh closed 10 years ago

olvesh commented 10 years ago

We are evaluating SmartStack (Nerve/Synapse/HAProxy) in our products, and since we are using Dropwizard/Metrics extensively we need to be able to do the checkon another port.

Now this is already possible by override the port in the checks, but I also need to make the admin port available in Zookeeper as we are building a service to keep track of all running services and their state. To do that we need to get hold of the admin port in addition to the regular traffic port.

Not a daily rubyist, so hopefully my coding is up to your standards. If it is not, I will not be offended :-)

Hopefully this scratches someone elses back as well.

igor47 commented 10 years ago

thanks for the PR, @olvesh! let me explain why i'm going to close it.

for our uses of dropwizard inside airbnb, we configured dropwizard to run it's health checks on the same port as the main application. this is implemented as a Resource, so it runs in the application thread pool, but it still uses the same HealthCheckRegistry as the one dropwizard automatically uses on the admin port.

why? by running health checks in the same thread pool as your normal responses, you reduce the likelyhood of the scenario where

when your service says "oh yes, I'm fine", it may well be the case that the only thing still functioning in the server is the little component that knows how to say "I'm fine, roger roger, over and out" in a cheery droid voice"

(from the yegge rant). a nerve health check is supposed to succeed only where a real request would succeed, and fail when a real request would fail. so, i highly recommend that health checks share as much functionality with real requests as possible and NOT run on a dedicated port with a dedicated thread.

if you would like the dropwizard admin port available in zookeeper for other reasons (e.g. other services make automated requests to that port), i recommend defining an additional service just for the admin port and calling it something like myservice-admin.

olvesh commented 10 years ago

I understand what you are saying, but I still partly disagree.

Say I have a service-pool of 5 services loadbalanced with AirBNB-nerve/synapse. I do as you suggest and put the health checks in the main thread pool. If the service-pool (services A-E) is overloaded, and a connection to service A is not getting through because it is "full", it may be still serving requests. If I decide to pull it from the HAProxy based on a timeout happening while the service is still serving requests would be putting even more load on the other ones. The rest will overload and a "rolling blackout" is happening.

At least this was the rationale of my thinking. There might be other ways around this problem though.