Open TheRockyOng opened 9 years ago
It should be possible, although I didn't look into the details yet. I think it might be a good alternative approach.
I did consider this in #35 - word on the street is that Marathon is looking to move away from ZooKeeper as an implementation detail, and since you'd be poking around in znodes directly, it's more likely to break underneath you.
Another issue is that the leader URLs in ZooKeeper are the ones the Marathon nodes use to communicate with each other; they aren't necessarily reachable from your service discovery servers.
The current solution works pretty well in practice (at least for me - happy to discuss alternatives!); in a failover situation Bamboo would take a little longer to find an active master, but that's a tradeoff I'm happy to make to avoid complexity in the code / during outages.
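For reference, the znode-scraping approach under discussion would look roughly like this. A minimal sketch, assuming Marathon's standard ZooKeeper leader-election layout (ephemeral sequential children under an election path, each containing the member's host:port); the `children`/`read_node` inputs stand in for calls on a real ZooKeeper client, and the node names and contents below are made up.

```python
def current_leader(children, read_node):
    """Pick the leader from ZooKeeper election children.

    In ZooKeeper leader election, the member that created the child
    with the lowest sequence number is the leader. `children` is the
    list of child znode names (as returned by get_children), and
    `read_node` returns the bytes stored in a child (as from get).
    """
    if not children:
        return None
    # The sequence number is the trailing 10 digits of the znode name.
    leader_node = min(children, key=lambda name: int(name[-10:]))
    return read_node(leader_node).decode("utf-8")

# Stand-in data; a real deployment would read these from ZooKeeper:
nodes = {
    "member_0000000007": b"marathon-2:8080",
    "member_0000000003": b"marathon-1:8080",
}
print(current_leader(sorted(nodes), nodes.__getitem__))  # marathon-1:8080
```

As noted above, the host:port stored here is the one the Marathons use among themselves, which may not be routable from wherever Bamboo runs.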
+1 on this, I may take a stab at implementing it myself
Ran into an interesting problem that makes me +1 this.
I have three Marathon nodes running in HA with a load balancer in front of them, and I point Bamboo at the load balancer to get its info. Normally this is fine. Today one of the Marathon instances had an issue where it wouldn't proxy to the master and returned empty responses for API calls. (I forgot to record which one, :disappointed:) What ended up happening was that whenever that instance made it to the top of the load balancer list, the HAProxy config would be emptied. If that happened on all my nodes at the same time it would cause downtime.
I see two possible solutions: have Bamboo ignore bad responses (hard to do without a known-good copy to compare against), or have it only talk to the master node.
@lancehudson ugh, not good. marathon has had shaky proxying for a good while, but i'd heard it was fixed in more recent versions. what version of marathon are you running?
10.1, soon I'll be going to 11
FYI, I've implemented this, but on a branch that uses #151. I can make a PR with my solution once that is merged.
@lancehudson ah ok, i have a feeling the proxy fixes went in around 10.x so an upgrade may not help you much.
I've definitely seen a similar issue on our 0.7.x marathons, typically after some sort of network shenanigans (though we don't use a load balancer in front of the marathon nodes).
Yea, the load balancer was a quick easy service discovery mechanism.
@drewrobb I've landed #151 (sorry it took so long). Feel free to submit your PR now ;)
Thanks, PR in #183
Is there any way to populate the Marathon endpoints from ZooKeeper instead of hardcoding the hostnames in the config file or an env variable? And what is the elegant way to update all the endpoints in a failover scenario?