airbnb / nerve

A service registration daemon that performs health checks; companion to airbnb/synapse
MIT License
942 stars 151 forks source link

Support Netflix Exhibitor #74

Closed lsjostro closed 7 years ago

lsjostro commented 9 years ago

This PR will add support for Netflix Exhibitor. It's more or less a copy of the regular zookeeper watcher adding but adds polling of the Exhibitor REST endpoint for zk ensemble changes. If the ensemble changes this watcher will detect that and stop the watcher.

This is great when maintaining zookeeper in the cloud to not have static zookeeper endpoints.

igor47 commented 9 years ago

i haven't finished reviewing this, and i would definitely like to see some tests before merging.

however, a design question. how do you discover the exhibitor endpoint? it seems to me that this reporter is just shifting the discovery problem from the zk hosts to the exhibitor endpoint. currently, if a zk host goes down and you rotate it out, you'll have to swap the new zk host into the old zk host's DNS entry. using this reporter, you won't have to worry about zk hosts, but if an exhibitor host goes down you'll have to swap the replacement into the old hosts's DNS entry.

in fact, because of the way you wrote this, such that exhibitor being unavailable un-reports the services using this reporter, this seems more fragile then the original design. when i pass in 5 zk hosts to the ZK reporter, two of them can go down and i'll continue reporting. with this exhibitor reporter, a single host going down will cause problems.

i don't see how this buys us anything. am i missing something? could you explain?

lsjostro commented 9 years ago

Thanks for the quick response @igor47!

So way use this instead of static zk endpoints. What netflix found and why they created Exhibitor/Curator is that it's a hassle to maintain zookeeper especially in cloud environment where you don't want to have too many "pets". Exhibitor has a rich API to control zookeeper. Exhibitor make sure that your cluster is always in "quorum" when rolling instances. please have a look here: https://github.com/Netflix/exhibitor/wiki and https://github.com/Netflix/curator/wiki/Exhibitor-Integration

How to discover exhibitor endpoints? First of all exhibitor is running on all zk-hosts, so you may say exhibitor == zookeeper node. Example design would be in AWS where you have an ELB infront of your exhibitors (using a autoscaling group) and use that ELB dns name as a static endpoint. The reporter will pull the exhibitors cluster list API (ELB) for ensamble changes. just like: https://github.com/apache/curator/blob/master/curator-client/src/main/java/org/apache/curator/ensemble/exhibitor/ExhibitorEnsembleProvider.java

Whenever the ensamble is changed the reporter will reconnect using the new connection string. you are right regarding the reporter will unregister the service temporarily. Don't see that as a problem though.. just important that you are aware of it.

I actually have some tests prepared. Will add those tomorrow and once again thanks for the feedback!

lsjostro commented 9 years ago

@igor47 ping