consult zookeeper for marathon endpoints

QubitProducts / bamboo

HAProxy auto configuration and auto service discovery for Mesos Marathon

Apache License 2.0


consult zookeeper for marathon endpoints #130

Open · TheRockyOng opened this issue 9 years ago

TheRockyOng commented 9 years ago

Is there any way to populate the Marathon endpoints from ZooKeeper instead of hardcoding the hostnames in the config file or an environment variable? Or, what is the elegant way to update all the endpoints in a failover scenario?
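For context, the usual ZooKeeper leader-election recipe has each contender create an ephemeral sequential znode under a shared parent, and the contender with the lowest sequence number is the leader. The sketch below assumes Marathon follows that recipe and stores `host:port` as each znode's data. The node names, the data format, and the `pick_leader` helper are all hypothetical, and fetching the children with a real ZooKeeper client is deliberately left out so the selection logic stands alone.

```python
import re
from typing import Dict, Optional

# ZooKeeper names sequential znodes by appending a 10-digit,
# zero-padded counter to the requested prefix.
_SEQ_SUFFIX = re.compile(r"(\d{10})$")


def pick_leader(children: Dict[str, bytes]) -> Optional[str]:
    """Return the data of the child znode with the lowest sequence number.

    `children` maps child znode names under a (hypothetical) leader path to
    their raw data, as you might collect them with any ZooKeeper client.
    Assumes each node's data is a UTF-8 "host:port" string.
    """
    candidates = []
    for name, data in children.items():
        match = _SEQ_SUFFIX.search(name)
        if match:  # skip anything that is not a sequential node
            candidates.append((int(match.group(1)), data))
    if not candidates:
        return None
    _, data = min(candidates, key=lambda c: c[0])
    return data.decode("utf-8").strip()


if __name__ == "__main__":
    # Hypothetical node names and addresses, for illustration only.
    children = {
        "member_0000000007": b"10.0.0.2:8080",
        "member_0000000003": b"10.0.0.1:8080",
    }
    print(pick_leader(children))  # prints 10.0.0.1:8080
```

Even if the selection works, the address it returns is whatever Marathon advertised, which may not be reachable from the machine running bamboo.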

j1n6 commented 9 years ago

It should be possible, although I haven't looked into the details yet. I think it might be a good alternative approach.

rasputnik commented 9 years ago

I did consider this in #35. Word on the street is that the Marathon team is looking to move away from ZooKeeper, which is an implementation detail anyway, and since you're poking around in znodes directly, it's more likely to break underneath you.

Another issue is that the leader URLs in ZooKeeper are the ones the Marathon instances use to communicate with each other; they aren't necessarily reachable from your service discovery servers.

The current solution works pretty well in practice (at least for me; happy to discuss alternatives!). In a failover situation, bamboo would take a little longer to find an active master, but that's a tradeoff I'm happy with to avoid complexity in the code and during outages.
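The hardcoded-list approach can be sketched as: walk the configured endpoints in order and use the first one that answers. This is a minimal illustration, not bamboo's actual code; `first_responsive` and `http_fetch_apps` are hypothetical names, and `fetch` is injected so the failover logic can be exercised without a network. The only Marathon detail assumed is its `/v2/apps` REST endpoint.

```python
import urllib.request
from typing import Callable, Sequence, Tuple


def first_responsive(endpoints: Sequence[str],
                     fetch: Callable[[str], bytes]) -> Tuple[str, bytes]:
    """Try each Marathon endpoint in order and return (endpoint, body)
    from the first one that answers; raise if none do."""
    errors = []
    for endpoint in endpoints:
        try:
            return endpoint, fetch(endpoint)
        except Exception as exc:  # unreachable or failing node: try the next
            errors.append(f"{endpoint}: {exc}")
    raise RuntimeError("no Marathon endpoint responded: " + "; ".join(errors))


def http_fetch_apps(endpoint: str, timeout: float = 5.0) -> bytes:
    """Fetch the app list from one endpoint via Marathon's /v2/apps."""
    url = endpoint.rstrip("/") + "/v2/apps"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

Usage would look like `first_responsive(["http://marathon1:8080", "http://marathon2:8080"], http_fetch_apps)` with hypothetical hostnames. The "little longer" during failover shows up here as one timeout per dead node on each poll, which is the cost of keeping the logic this simple.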

drewrobb commented 8 years ago

+1 on this; I may take a stab at implementing it myself.

lancehudson commented 8 years ago

Ran into an interesting problem that makes me +1 this.

I have three Marathon nodes running in HA with a load balancer in front of them, and I point bamboo at the load balancer to get its info. Normally this is fine. Today, one of the Marathon instances had an issue: it wouldn't proxy to the master and returned empty responses for calls. (I forgot to record one. :disappointed:) Whenever that instance reached the top of the load balancer's rotation, the HAProxy config was emptied. When that happened on all my nodes at the same time, it caused downtime.

I see two possible solutions: have bamboo ignore bad messages (hard to do without keeping a copy), or have it talk only to the master node.
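The first option, ignoring bad messages, can be approximated without keeping a full copy by sanity-checking each response before it is allowed to rewrite the HAProxy config. A minimal sketch, assuming the poller reads Marathon's `/v2/apps` JSON of the form `{"apps": [...]}`; `accept_apps` is a hypothetical helper, not part of bamboo.

```python
import json
from typing import List, Optional


def accept_apps(body: bytes, previous: Optional[List[dict]]) -> Optional[List[dict]]:
    """Return the parsed app list if it looks safe to apply, else None.

    Rejects empty or malformed bodies, and refuses to go from a non-empty
    app list to zero apps in a single poll.
    """
    if not body.strip():
        return None  # empty response, e.g. a node that failed to proxy
    try:
        doc = json.loads(body)
    except ValueError:  # covers JSON and UTF-8 decoding errors
        return None
    apps = doc.get("apps") if isinstance(doc, dict) else None
    if not isinstance(apps, list):
        return None
    if not apps and previous:
        return None  # suspicious: every app vanished at once
    return apps
```

When `accept_apps` returns None, the caller would keep the last applied config. The trade-off is that a genuine scale-down to zero apps is held back too, so a real implementation would need some override.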

rasputnik commented 8 years ago

@lancehudson Ugh, not good. Marathon has had shaky proxying for a good while, but I'd heard it was fixed in more recent versions. What version of Marathon are you running?

lancehudson commented 8 years ago

0.10.1; soon I'll be going to 0.11.

drewrobb commented 8 years ago

FYI, I've implemented this, but on a branch that uses #151. I can make a PR with my solution once that is merged.

rasputnik commented 8 years ago

@lancehudson Ah, OK. I have a feeling the proxy fixes went in around 0.10.x, so an upgrade may not help you much.

I've definitely seen a similar issue on our 0.7.x marathons, typically after some sort of network shenanigans (though we don't use a load balancer in front of the marathon nodes).

lancehudson commented 8 years ago

Yeah, the load balancer was a quick and easy service discovery mechanism.

lclarkmichalek commented 8 years ago

@drewrobb I've landed #151 (sorry it took so long). Feel free to submit your PR now ;)

drewrobb commented 8 years ago

Thanks, PR in #183