airbnb / nerve

A service registration daemon that performs health checks; companion to airbnb/synapse
MIT License

cross dc services and failover #86

Closed: yagnik closed 8 years ago

yagnik commented 8 years ago

I wanted to hear your thoughts on this. We have two DCs running, and in an ideal world they both run as individual pods, which would mean that nerve/synapse are local to a DC, which works out well. On top of this, we are wondering whether we can use SmartStack for cross-DC discovery when you don't find any available services in your own DC.

Have you thought about this? Do you face this issue at Yelp or Airbnb? @jolynch @igor47

jolynch commented 8 years ago

@yagnik Yeah, we do this at Yelp with cross registration (nerve registers the service in multiple places). Airbnb went the filtering route. See #81 for context.

My reply on #81 starting with "I apologize ahead of time for the book of a reply here" has a breakdown of how you can do cross-DC failover using HAProxy ACLs and cross registration. If you're only running one ZK cluster, you can use the filters that were added as part of https://github.com/airbnb/synapse/pull/164
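To make the "prefer local, fail over to remote" idea concrete, here is a minimal sketch of the selection logic in Python. It is not Synapse's or HAProxy's implementation. The `Backend` class, the `pick_backends` function, and the `dc` label key are all hypothetical names chosen for illustration. It assumes every registration carries a label naming its datacenter.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Backend:
    """One registered service instance (hypothetical shape, not Nerve's schema)."""
    host: str
    port: int
    labels: Dict[str, str] = field(default_factory=dict)
    healthy: bool = True


def pick_backends(backends: List[Backend], local_dc: str) -> List[Backend]:
    """Return healthy backends in local_dc, or all healthy backends if none are local.

    The "dc" label key is an assumption made for this sketch.
    """
    healthy = [b for b in backends if b.healthy]
    local = [b for b in healthy if b.labels.get("dc") == local_dc]
    # Fail over to the other datacenters only when the local pool is empty.
    return local if local else healthy
```

With cross registration, every datacenter's registry would contain both local and remote instances, so a filter like this one only decides which of them receive traffic.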

yagnik commented 8 years ago

I'm not sure how I missed labels! We are doing exactly what you suggested; we just needed the glue to tie it all together. This is excellent. Thanks!

yagnik commented 8 years ago

By the way, @jolynch, do you use multiple ZK clusters, one per AZ? Network flapping has caused me much grief when running a cross-DC cluster.

jolynch commented 8 years ago

@yagnik We run ZKs across AZs and have not had too many issues, but we've also tuned the bejeezus out of our ZK instances and are careful with Nerve restarts (which cause a lot of ZK churn). I recall getting reliability improvements by moving to instances big enough to have SSDs and the next networking tier up (mostly m3.larges, with the occasional m3.xlarge for the "High" networking).