sgrimm-sg opened this issue 7 years ago
The quickest thing I can think of is to register a second health check with both services which is only green for the active instance. Then fabio should behave the way you want.
Ah, that's an interesting idea. I think a variant of it might be better: rather than a second health check for the same service, instead register a second service with its own health check and tag that one with `urlprefix-`. That way the service for the hot spare node still shows up as healthy in Consul (meaning it can do stuff like create Consul sessions).
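A sketch of what that second registration could look like as a Consul agent service definition. The service name, port, route prefix, and check endpoint here are all made up for illustration; the real prefix would be whatever route the service should own:

```json
{
  "Name": "myservice-primary",
  "Port": 8080,
  "Tags": ["urlprefix-/myservice"],
  "Check": {
    "HTTP": "http://localhost:8080/is-primary",
    "Interval": "10s"
  }
}
```

The first (untagged) service stays healthy on both nodes; only this second service's check goes green on the active instance, so fabio only ever routes to one node.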
Thanks -- will give that a try.
For that approach you can guard the active service with a consul lock. In essence, you're performing a leader election and only the leader is active.
Yes, that's actually exactly what we're doing so our application knows internally when it has become the primary, but you can only start a session if you have a passing health check; without a healthy service to use for session establishment, the hot spare can't attempt to acquire the Consul lock. Kind of a chicken-and-egg problem.
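For reference, this is roughly what session creation looks like (a PUT to Consul's `/v1/session/create` endpoint). Consul refuses to create the session while any check listed in `Checks` is critical, which is exactly the chicken-and-egg problem above; the service check name here is hypothetical:

```json
{
  "Name": "myservice-leader-lock",
  "Checks": ["serfHealth", "service:myservice"],
  "Behavior": "release",
  "TTL": "15s"
}
```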
Would you mind posting your experiences here?
Yesterday I tried switching to this setup (adding a second service with the `urlprefix-` tag and a health check that passed only when the instance was the primary). It worked fine, but after finishing it I realized it ended up being more code than my original approach. It was also slightly slower: it couldn't switch over to the new primary until after a health check, which might mean waiting for the health check interval, whereas with the "reregister when you get the lock" approach, the tags can be updated immediately when the lock is acquired.
What I'm currently doing looks roughly like this. It is a little complex because I want to ensure that, when I'm doing a clean shutdown of the current primary (software upgrades, etc.), there's never a period when Fabio has nowhere to route a client request.
On startup:

- Read the `primaryAddress` Consul key and store it locally, and watch it for changes; whenever it changes, set the local copy to its value.
- Try to create `primaryAddress` with our address as the value.
- If `primaryAddress` is set to our address: reregister with the `urlprefix-` tag and start doing work.
- If `primaryAddress` is not set to our address: run as the hot spare; when the key is cleared, try again to set `primaryAddress` with our address.

On clean shutdown of the current primary:

- Clear `primaryAddress` so the hot spare can claim it.
- If the local `primaryAddress` copy is not set: send a request through Fabio and set the local `primaryAddress` copy to the address of whichever instance answers the request.
- Keep accepting work until the local `primaryAddress` copy points at the other instance, then stop.

The "send a request through Fabio" sequence at the end means that during shutdown, there will be a brief window when both hosts' services are tagged with `urlprefix-`, but there should only be one host that actually does work at any given time.
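The "create the key if it's absent" step amounts to a check-and-set write. A minimal sketch of that logic, not the poster's actual code: `FakeKV` is an in-memory stand-in for Consul's KV store (real code would use a `?cas=0` write against the KV HTTP API), and the function names are hypothetical:

```python
class FakeKV:
    """In-memory stand-in for Consul's KV store."""

    def __init__(self):
        self.data = {}

    def cas_create(self, key, value):
        # Mimics PUT /v1/kv/<key>?cas=0: succeeds only if the key is absent.
        if key in self.data:
            return False
        self.data[key] = value
        return True


def try_become_primary(kv, my_address, key="primaryAddress"):
    """Attempt to claim the primary role; return True if we hold it."""
    if kv.cas_create(key, my_address):
        return True  # we won the race: add the urlprefix- tag and start work
    # Someone already holds the key; we're primary only if it's us.
    return kv.data.get(key) == my_address


kv = FakeKV()
assert try_become_primary(kv, "10.0.0.1") is True   # first instance wins
assert try_become_primary(kv, "10.0.0.2") is False  # second becomes hot spare
```

When the primary clears the key on shutdown, the spare's watch fires and it runs the same `try_become_primary` attempt again.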
The reason the shutdown sequence queries the primary address through Fabio rather than relying on the state of the Consul key is that there is some nonzero amount of time between my service updating Consul and Fabio updating its routing table, and I want to make sure the old primary doesn't stop accepting work until after I've confirmed that Fabio has started sending requests to the new one; otherwise there'd be a brief period of unavailability. Only tens of milliseconds, but my service's clients seem to excel at sending requests at the exact moment it briefly goes offline! The alternative would have been to read Fabio's routing config from Consul, but then I'd have to parse Fabio's routing configuration in my application, which seemed unnecessarily complex.
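That "confirm Fabio has cut over before stopping" step can be sketched as a small polling loop. This is a hypothetical illustration, not the poster's code; `probe` stands in for an HTTP request routed through Fabio to an assumed identity endpoint that returns the address of whichever instance answered:

```python
import time


def wait_for_handoff(probe, my_address, timeout=5.0, interval=0.05):
    """Poll until a request sent through Fabio is answered by an instance
    other than ourselves, i.e. the routing table has actually switched.

    probe: callable returning the address of the instance that answered.
    Returns the new primary's address, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        answered_by = probe()
        if answered_by and answered_by != my_address:
            return answered_by  # Fabio now routes to the new primary
        time.sleep(interval)
    raise TimeoutError("Fabio never switched to the new primary")
```

The old primary keeps serving until this returns, so there is no window where requests have nowhere to go, at the cost of the brief double-tagged window described above.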
Feedback welcome, of course. Perhaps there's a simpler approach that would provide the same availability.
Is there a window where both primary and secondary can handle requests? I'd guess so since you want the old primary to complete existing requests while the new primary starts handling requests.
Also, what kind of throughput are you looking at and what latency do you expect for the failover?
I still think that adding a second health check for the service with a short check interval (1s or 500ms) which is only green for the leader is the simplest option. No re-registration necessary and since you already have the leader election code you only need to expose its status via the health endpoint.
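That could look like a second check registered alongside the existing service, polling a leadership endpoint at a short interval. The check ID and URL here are made up for illustration:

```json
{
  "ID": "myservice-is-leader",
  "Name": "leader check",
  "HTTP": "http://localhost:8080/is-leader",
  "Interval": "1s"
}
```

The endpoint would return 200 only on the instance currently holding the lock, so the route flips to the new leader within roughly one check interval.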
Also, I'd optimize for the normal case which is orderly failovers.
fabio will complete existing requests after switching the routing table.
When I went to raise a ticket, I found this one. It's still open after more than two years, so I'm not sure whether there is a best approach in 2020.

My question: nginx upstreams support a `backup` parameter. From their docs: "marks the server as a backup server. It will be passed requests when the primary servers are unavailable."

I want a similar routing policy in fabio; is that possible? I've only found the weight option, but it doesn't seem quite right for this purpose.

My use case: I have a service deployed on two physical servers. One is a high-spec machine (server A); the other is a lower-spec machine that exists only as a backup (server B). I want all traffic to go to server A unless server A is down.
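For comparison, the nginx feature being described (addresses are placeholders):

```nginx
upstream myservice {
    server 10.0.0.1:8080;         # server A: receives all traffic
    server 10.0.0.2:8080 backup;  # server B: used only when A is unavailable
}
```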
We're using Fabio in front of a microservice that runs on a single node, but that has a hot spare on another node for failover. The goal is for Fabio to normally route 100% of requests to the primary node, but then switch over to the spare when the primary fails its health check. Once a failover has happened, it should be sticky, that is, the node that used to be the primary should then be considered the hot spare when it comes back online.
It seems like there are a few different ways to handle this in Fabio and it would be great if there were some guidance about the best approach.
Our current solution is to have the service register itself in Consul without a `urlprefix-` tag at startup. When it detects that it's the primary node (either because it's the first one running or because the primary has gone down) it reregisters with the `urlprefix-` tag.

https://github.com/hashicorp/consul/issues/1048 would be a clean solution here, but in the meantime perhaps there's a better way of doing this than the one we settled on. It would be nice to avoid having to reregister the service.
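Since re-registering with the same service ID replaces the existing registration, "reregister with the tag" is just a second PUT to Consul's `/v1/agent/service/register` with `Tags` filled in. A sketch of the post-promotion payload (the startup registration is identical but with empty `Tags`; the ID, name, port, and prefix are hypothetical):

```json
{
  "ID": "myservice-1",
  "Name": "myservice",
  "Port": 8080,
  "Tags": ["urlprefix-/myservice"]
}
```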