fabiolb / fabio

Consul Load-Balancing made simple
https://fabiolb.net
MIT License

Multi-DC fabio #115

ptqa closed 2 years ago

ptqa commented 8 years ago

I'm building a DC (aka Availability Zone) fault-tolerant setup with fabio, consul and ECS on AWS. I'm using an active-active setup with a separate consul cluster (3 nodes each) per DC.

In this setup I have multiple fabio instances behind an ELB, and services are running in both DCs at the same time. The ELB uses round robin to spread load across the fabio instances and uses the fabio HTTP port for its health check.

The problem here is that in case of a DC outage I can't be sure all services are running in each DC, so I would like fabio to be multi-DC aware and spread load across DCs, so that my service stays available even if it's not running in some DC.

I've implemented a proof of concept in my fork (https://github.com/LibertyGlobal/fabio). It's hardcoded and ugly, but it works. The idea is to start one goroutine per consul DC, each running a blocking query; when a blocking query returns, fabio queries the rest of the datacenters and builds the config.
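
For readers who want to picture the goroutine-per-DC idea, here is a minimal Go sketch. It is not the code from the fork: `queryFunc`, `watchAll` and `runOnce` are hypothetical names, the Consul blocking query is replaced by a fake function, and instead of re-querying the other datacenters on every change it caches the last result per DC and re-merges.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sort"
	"sync"
)

// queryFunc is a hypothetical stand-in for a Consul blocking query against
// one datacenter: it blocks until the DC's index moves past lastIndex, then
// returns the current addresses and the new index.
type queryFunc func(ctx context.Context, dc string, lastIndex uint64) ([]string, uint64, error)

type update struct {
	dc    string
	addrs []string
}

// merge flattens the per-DC results into one sorted list.
func merge(byDC map[string][]string) []string {
	var all []string
	for _, addrs := range byDC {
		all = append(all, addrs...)
	}
	sort.Strings(all)
	return all
}

// watchAll runs one watcher goroutine per DC and sends a freshly merged
// list on out whenever any DC reports a change. It closes out when all
// watchers have stopped or ctx is cancelled.
func watchAll(ctx context.Context, dcs []string, q queryFunc, out chan<- []string) {
	defer close(out)
	updates := make(chan update)
	var wg sync.WaitGroup
	for _, dc := range dcs {
		wg.Add(1)
		go func(dc string) {
			defer wg.Done()
			var idx uint64
			for {
				addrs, next, err := q(ctx, dc, idx)
				if err != nil {
					return // a real watcher would back off and retry
				}
				idx = next
				select {
				case updates <- update{dc, addrs}:
				case <-ctx.Done():
					return
				}
			}
		}(dc)
	}
	go func() { wg.Wait(); close(updates) }()

	byDC := map[string][]string{} // last known result per DC
	for u := range updates {
		byDC[u.dc] = u.addrs
		select {
		case out <- merge(byDC):
		case <-ctx.Done():
			return
		}
	}
}

// runOnce drives watchAll with a fake query that reports one address per DC
// and then fails, and returns the last merged list.
func runOnce(dcs []string) []string {
	fake := func(_ context.Context, dc string, last uint64) ([]string, uint64, error) {
		if last > 0 {
			return nil, 0, errors.New("no further changes")
		}
		return []string{dc + ":80"}, 1, nil
	}
	out := make(chan []string)
	go watchAll(context.Background(), dcs, fake, out)
	var final []string
	for merged := range out {
		final = merged
	}
	return final
}

func main() {
	fmt.Println(runOnce([]string{"dc1", "dc2"}))
}
```

Funnelling all updates through a single collector loop keeps the merged lists in a consistent order even when several DCs change at once.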

I can make a better version of that if I know it could be accepted upstream. @magiconair are you interested in such a feature in fabio, or am I doing it wrong?

magiconair commented 8 years ago

This sounds like a cool feature and I think someone requested something similar before. One approach I can think of is to build a combined routing table from multiple consul instances but make fabio prefer local services. Then you could decide whether to load balance across AZs or use this for fault tolerance. Might even be useful to have a weight parameter in there like send N% of traffic to other AZs.
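
A rough Go sketch of the "prefer local, send N% elsewhere" idea, purely for discussion. The `instance` type and the `weights` function are hypothetical and not part of fabio; the sketch assumes `remotePct` is between 0 and 100 and falls back to whichever side exists when the other has no instances.

```go
package main

import "fmt"

// instance is a hypothetical view of one service instance.
type instance struct {
	Addr string
	DC   string
}

// weights returns the percentage of traffic each instance should receive.
// remotePct (assumed to be in 0..100) is the share sent to other DCs; the
// rest stays local. If one side has no instances, the other side gets 100%.
func weights(insts []instance, localDC string, remotePct float64) map[string]float64 {
	var local, remote int
	for _, in := range insts {
		if in.DC == localDC {
			local++
		} else {
			remote++
		}
	}

	localShare, remoteShare := 100-remotePct, remotePct
	switch {
	case local == 0:
		localShare, remoteShare = 0, 100
	case remote == 0:
		localShare, remoteShare = 100, 0
	}

	w := make(map[string]float64, len(insts))
	for _, in := range insts {
		if in.DC == localDC {
			w[in.Addr] = localShare / float64(local)
		} else {
			w[in.Addr] = remoteShare / float64(remote)
		}
	}
	return w
}

func main() {
	insts := []instance{{"a:80", "dc1"}, {"b:80", "dc1"}, {"c:80", "dc2"}}
	fmt.Println(weights(insts, "dc1", 20))
}
```

Setting `remotePct` to 0 gives pure failover semantics as long as local instances exist, while a value near the remote DCs' share of total capacity approximates plain cross-AZ load balancing.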

dsolsona commented 8 years ago

Are prepared queries of any help here? https://www.consul.io/docs/agent/http/query.html

Just throwing a suggestion.

magiconair commented 8 years ago

@dsolsona I think that goes in the right direction.

madeddie commented 8 years ago

For load balancing, @ptqa's solution of retrieving all services from all datacenters and combining them seems to be the only reliable way. For failover setups, prepared queries would work nicely, although the query has to be created beforehand in consul by either fabio or the user.

The reason for this split is that prepared queries only ever return services in non-local DCs when there are no healthy instances in the local DC. So there is no way to have prepared queries return all the services in all DCs. Conversely, it'll be hard to use any native consul methods to determine the specific nearness of services in the combined list of multi-DC services. They can probably be sorted by nearness, but then "manual" filtering by DC would need to happen, and this could get ugly with more than 2 DCs.

The weight parameter is hard for the latter reason, but an optional switch between no multi-DC awareness, failover, and load balancing should be relatively easy.
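
To make the three-way switch concrete, here is a small Go sketch. The `mode` constants, the `instance` type and `targets` are hypothetical names for illustration only; the failover branch mimics the prepared-query behaviour described above (remote instances only when no local one is healthy), and the load-balancing branch returns every healthy instance from every DC.

```go
package main

import "fmt"

// mode selects how much multi-DC awareness is applied (hypothetical switch).
type mode int

const (
	localOnly   mode = iota // no multi-DC awareness
	failover                // remote DCs only when the local DC has nothing healthy
	loadBalance             // all healthy instances from all DCs
)

// instance is a hypothetical view of one service instance.
type instance struct {
	Addr    string
	DC      string
	Healthy bool
}

// targets returns the addresses traffic may be routed to under mode m.
func targets(m mode, localDC string, insts []instance) []string {
	var local, remote []string
	for _, in := range insts {
		if !in.Healthy {
			continue
		}
		if in.DC == localDC {
			local = append(local, in.Addr)
		} else {
			remote = append(remote, in.Addr)
		}
	}

	switch m {
	case failover:
		if len(local) > 0 {
			return local
		}
		return remote
	case loadBalance:
		return append(local, remote...)
	default:
		return local
	}
}

func main() {
	insts := []instance{
		{"a:80", "dc1", false},
		{"b:80", "dc2", true},
	}
	fmt.Println(targets(localOnly, "dc1", insts))
	fmt.Println(targets(failover, "dc1", insts))
	fmt.Println(targets(loadBalance, "dc1", insts))
}
```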

I'll try to update the current POC to incorporate both options.

magiconair commented 8 years ago

One of the things I have in the works is splitting the registry into a kv and multiple discovery modules. This would allow you to configure multiple consul discovery instances (and kubernetes, docker, ...) all at the same time and the combination of them generates the routing table. I think this should solve this problem. @madeddie feel free to improve the POC but be aware that it might not get merged. In any case it could serve as a good basis for the discussion.

madeddie commented 8 years ago

Your solution is the better solution by far, but I can use the practice :)

sean- commented 8 years ago

FYI in case folks don't watch every commit that rolls into Consul's code base, something very close to what you were looking for, @madeddie, was added to Consul not long ago: https://github.com/hashicorp/consul/commit/2b2464403f93134a05eb5946e0b223199d364aa8

magiconair commented 8 years ago

Indeed.

madeddie commented 8 years ago

That actually doesn't change anything I described :) Prepared queries don't show containers in all DCs, they show non-local-DC containers only when the local-DC containers are all unhealthy. Unless that's changed too, in which case I'd be very interested :D

stephane-martin commented 7 years ago

Just to say that this use case arises even if you don't have multiple datacenters.

AFAIK in a "consul datacenter" every agent must be able to talk with each other using the mesh protocol.

So, if you have multiple isolated VLANs with different webapps in each VLAN, you have to run a distinct "consul datacenter" in each VLAN.

I'd like a single fabio instance to be able to route traffic to the relevant VLANs. But for that it needs to discover the healthy services in the different "datacenters".

bogdanov1609 commented 7 years ago

The feature will be very useful for us too!

aaronhurt commented 7 years ago

We're currently doing something very similar with consul-template and HAProxy. We let consul-template loop over all services in all DCs and enumerate them as server lines under backend sections. The services pulled from DCs NOT matching the local DC are added as backup servers under the backend block. This is in effect a combination of a few of the above suggestions: listing all services and having a weight attribute.
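
The backend layout described above can be sketched in Go as follows. This is not the consul-template file itself, just an illustration of the output shape: `server` and `renderBackend` are hypothetical names, and the only HAProxy keywords used are `backend`, `server`, `check` and `backup`.

```go
package main

import (
	"fmt"
	"strings"
)

// server is a hypothetical view of one discovered service instance.
type server struct {
	Name string
	Addr string
	DC   string
}

// renderBackend writes an HAProxy backend section in which every server
// outside localDC is marked "backup", so it only receives traffic when no
// non-backup server is available.
func renderBackend(name, localDC string, servers []server) string {
	var b strings.Builder
	fmt.Fprintf(&b, "backend %s\n", name)
	for _, s := range servers {
		line := fmt.Sprintf("    server %s %s check", s.Name, s.Addr)
		if s.DC != localDC {
			line += " backup"
		}
		b.WriteString(line + "\n")
	}
	return b.String()
}

func main() {
	servers := []server{
		{"web-dc1-0", "10.0.0.1:80", "dc1"},
		{"web-dc2-0", "10.1.0.1:80", "dc2"},
	}
	fmt.Print(renderBackend("web", "dc1", servers))
}
```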

jralph commented 6 years ago

@leprechau I've been looking at doing something similar with HAProxy. How did you go about configuring this?

aaronhurt commented 6 years ago

@jralph The consul-template file we're using is on GitHub but I wouldn't wish the task of using and/or maintaining this beast on anyone. We're currently running fabio across our dev/qa/prod environments and trying to eliminate our HAProxy/consul-template implementation.

https://github.com/myENA/consul-template-rpm/blob/master/SOURCES/consul-template-haproxy-template.ctmpl

... and ...

https://github.com/myENA/consul-template-rpm/blob/master/consul-template-haproxy.md

It's probably not an exaggeration to say this is an abuse of consul-template but it does work and served us in a production environment for over two years.

jralph commented 6 years ago

@leprechau Thanks for the info. How did you get around fabio not supporting multi-datacenters?

aaronhurt commented 6 years ago

@jralph We're planning on deploying all services in at least two datacenters.

jralph commented 6 years ago

@leprechau But fabio is unable to detect services in another dc, so services would only be able to find other services within their own dc when using fabio? If so, that makes sense. I've been experimenting today and decided to stick with services only talking to their local services, using public domains to access other services if needed.

zakk4223 commented 6 years ago

For those looking to use HAProxy, 1.8 added the ability to dynamically change backend destinations based on DNS SRV records. This might simplify writing out config files via consul-template.

gistao commented 5 years ago

Is this now supported in fabio?

kneufeld commented 5 years ago

I'd like something similar but different. I'd like to have a single consul cluster but multiple nomad clusters (and/or nomad datacenters and regions) to separate workloads. If I point fabio at the consul cluster I'd get all jobs from all nomad clusters. The only solution I can think of is to abuse -registry.consul.tagprefix and give the different nomad clusters different prefixes.
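
A minimal Go sketch of why distinct prefixes would separate the clusters, assuming fabio only considers tags that start with the configured prefix (as it does with the default `urlprefix-`). The prefixes `cluster-a-` and `cluster-b-` and the `routesFor` helper are made up for illustration and are not fabio code.

```go
package main

import (
	"fmt"
	"strings"
)

// routesFor returns the route part of every tag that starts with tagPrefix.
// Tags with any other prefix are ignored, which is what would keep one
// cluster's jobs out of another cluster's routing table.
func routesFor(tagPrefix string, tags []string) []string {
	var routes []string
	for _, t := range tags {
		if strings.HasPrefix(t, tagPrefix) {
			routes = append(routes, strings.TrimPrefix(t, tagPrefix))
		}
	}
	return routes
}

func main() {
	tags := []string{"cluster-a-/api", "cluster-b-/api", "cluster-a-/admin"}
	fmt.Println(routesFor("cluster-a-", tags))
	fmt.Println(routesFor("cluster-b-", tags))
}
```

Each fabio instance would then be started with the prefix of the cluster it fronts, at the cost of every job having to carry the right prefix in its tags.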