Closed by ptqa 2 years ago
This sounds like a cool feature and I think someone requested something similar before. One approach I can think of is to build a combined routing table from multiple consul instances but make fabio prefer local services. Then you could decide whether to load balance across AZs or use this for fault tolerance. Might even be useful to have a weight parameter in there like send N% of traffic to other AZs.
Are prepared queries of any help here? https://www.consul.io/docs/agent/http/query.html
Just throwing a suggestion.
@dsolsona I think that goes in the right direction.
For load balancing, @ptqa 's solution of retrieving all services from all datacenters and combining them seems the only reliable way. For failover setups, prepared queries would work nicely, although the query has to be created beforehand in consul by either fabio or the user.
The reason for this split is that prepared queries only ever give services in non-local DC's when there are no healthy instances in their local DC. So there is no way to have prepared queries give back all the services in all DCs. Conversely, it'll be hard to use any native consul methods to determine specific nearness of services in the combined list of multi-DC services. They can be sorted on nearness, probably, but then "manual" filtering on DC would need to happen and this could get ugly with more than 2 DCs.
The weight parameter is hard for the latter reason, but an optional switch between no multi-DC awareness, failover, and loadbalancing should be relatively easy.
I'll try to update the current POC to incorporate both options.
One of the things I have in the works is splitting the registry into a kv and multiple discovery modules. This would allow you to configure multiple consul discovery instances (and kubernetes, docker, ...) all at the same time and the combination of them generates the routing table. I think this should solve this problem. @madeddie feel free to improve the POC but be aware that it might not get merged. In any case it could serve as a good basis for the discussion.
Your solution is the better solution by far, but I can use the practice :)
FYI in case folks don't watch every commit that rolls into Consul's code base, something very close to what you were looking for, @madeddie, was added to Consul not long ago: https://github.com/hashicorp/consul/commit/2b2464403f93134a05eb5946e0b223199d364aa8
Indeed.
That actually doesn't change anything I described :) Prepared queries don't show containers in all DCs, they show non-local-DC containers only when the local-DC containers are all unhealthy. Unless that's changed too, in which case I'd be very interested :D
Just to say that this use case happens even if you don't have multiple datacenters.
AFAIK in a "consul datacenter" every agent must be able to talk to every other agent using the mesh protocol.
So, if you have multiple isolated VLANs with different webapps in each VLAN, you have to run a distinct "consul datacenter" in each VLAN.
I'd like a single fabio instance to be able to route traffic to the relevant VLANs. But for that it needs to discover the healthy services in the different "datacenters".
The feature will be very useful for us too!
We're currently doing something very similar with consul-template and HAProxy. We let consul-template loop over all services in all DCs and enumerate them all as server lines under backend sections. The services that are pulled from DCs NOT matching the local DC are added as backup under the backend block. This is in effect a combination of a few of the above suggestions: listing all services and having a weight attribute.
@leprechau I've been looking at doing something similar with HAProxy. How did you go about configuring this?
@jralph The consul-template file we're using is on GitHub but I wouldn't wish the task of using and/or maintaining this beast on anyone. We're currently running fabio across our dev/qa/prod environments and trying to eliminate our HAProxy/consul-template implementation.
... and ...
https://github.com/myENA/consul-template-rpm/blob/master/consul-template-haproxy.md
It's probably not an exaggeration to say this is an abuse of consul-template but it does work and served us in a production environment for over two years.
@leprechau Thanks for the info. How did you get around fabio not supporting multi-datacenters?
@jralph We're planning on deploying all services in at least two datacenters.
@leprechau But fabio is unable to detect services in another dc, so services would only be able to find other services within their own dc when using fabio? If so, that makes sense. I've been experimenting today and decided to stick with services only talking to their local services, using public domains to access other services if needed.
For those looking to use HAProxy, 1.8 added the ability to dynamically change backend destinations based on DNS SRV records. Might simplify writing out config files via consul-template
Is this now supported in fabio?
I'd like something similar but different. I'd like to have a single consul cluster but to have multiple nomad clusters (and/or nomad datacenters and regions) to separate work loads. If I point fabio at the consul cluster I'd get all jobs from all nomad clusters. The only solution I can think of is to abuse -registry.consul.tagprefix and give the different nomad clusters different prefixes.
I'm building a DC (aka Availability Zone) fault-tolerant setup with fabio, consul and ECS on AWS. I'm using an active-active setup with a separate consul cluster (3 nodes each) per DC.
In this setup I have multiple fabio instances behind an ELB, and services are running in both DCs at the same time. The ELB uses round robin to spread load across the fabio instances and uses the fabio HTTP port for its health check.
The problem is that during a DC outage I can't be sure every service is running in each DC, so I would like fabio to be multi-DC aware and spread load across DCs. That way a service stays available even if it's not running in some DC.
I've implemented a proof of concept in my fork (https://github.com/LibertyGlobal/fabio). It's hardcoded and ugly, but it works. The idea is to start one goroutine per consul DC, each running a blocking query; when any blocking query returns, query the rest of the datacenters and build the config.
I can make a better version of that if I know it could be accepted upstream. @magiconair, are you interested in such a feature in fabio, or am I doing it wrong?