Closed jeremycw closed 2 years ago
Hey folks, we rolled this out and it works great for Fabio being able to continue handling requests during a Consul leader election with the following:
registry.consul.requireConsistent = false
registry.consul.allowStale = true
What do we need to do get this merged?
@nathanejohnson / @leprechau / @pschultz, is it expected behavior for fabio to go dark during a consul leader election? It seems protecting against that would be very useful.
I can confirm, fork works perfectly. Fabio won't drop routes when there's a leadership election as expected with the stale reads.
My expectation is for fabio to fail static by default, i.e. routes should not change if Consul goes away.
@pschultz unfortunately, that's not the behavior we're seeing. 500s start being returned from local Consul when leadership election happens.
@pschultz unfortunately, that's not the behavior we're seeing. 500s start being returned from local Consul when leadership election happens.
@scalp42 do you also see change in fabio's routes as a result of the 500 from consul?
Hi @ketzacoatl
I can confirm that Fabio does drop routes during a new Consul leader election. This isn't the case when enabling allowStale
in this PR.
Here are logs hitting a service in Fabio every second while restarting Consul on the leader in a cluster of 3 servers: https://gist.github.com/evandam/b8f27c76bebf58634f6c5d412f61c0bc
You can see the 500 errors from Consul, followed by removing the route from Fabio, and 404s until a new leader is elected (18 seconds of downtime in this example).
If this isn't expected, is this new behavior introduced in a recent release?
@evandam thank you for taking the time to run that isolated test and extracting the logs for this PR! That's a great way to confirm the behavior.
@scalp42 / @pschultz, how should we proceed? Is there a regression to investigate that makes this feature/option not matter, or is this the proper way to handle the upstream outage with consul? I believe it's the latter, but maybe there's a reason for the former?
@ketzacoatl I can confirm all the routes get removed in Fabio logs.
For what it's worth, I did the same tests on previous versions of Fabio back to 1.5.9
(May 2018) and see the same, so this doesn't seem to be new behavior unless it goes back farther than that.
Is there a regression to investigate that makes this feature/option not matter, or is this the proper way to handle the upstream outage with consul? I believe it's the latter.
I agree. I suggest to do what @evandam proposed and let the defaults be
registry.consul.requireConsistent = false
registry.consul.allowStale = true
This is a big enough change to warrant a minor version bump, I think.
@pschultz I think there may need to be some extra validation added to the config. requireConsistent and allowStale can both be set to true, but requests to Consul will fail since it's not valid.
fabio[21514]: 2021/02/10 23:09:11 [WARN] consul: Error fetching health state. Unexpected response code: 400 (Cannot specify ?stale with ?consistent, conflicting semantics.)
Extra safety there sounds great :+1:
After reading through all the comments and testing I'm comfortable merging this once the additional validation is added and the defaults reflect the current operating state.
@leprechau I've added a validation to return an error if allowStale and requireConsistent are both true. The defaults should already match the existing configuration:
allowStale: false
requireConsistent: true
@scalp42
Thank you, I'll give this one more look and get it merged when I get back to a real browser.
@leprechau Any update on this?
Any update on this? @leprechau
@nathanejohnson Hey, it looks like this was about to be merged but then things went silent. Could we get another pair of eyes on this?
@jeremycw thank you for bringing this to my attention. I will review this ASAP!
@nathanejohnson Any thoughts or updates?
Sorry this took so long!
It seems like there are some Consul query options that would be nice to expose to the user for configuration. Specifically I need
default
consistency mode for Consul reads (https://www.consul.io/api-docs/features/consistency) so I kind of just crammed a toggle for the RequireConsistent query param onto the command line.This is probably not mergable as-is but I'd like to open this up for discussion since there are probably more query options that should be configurable by the user. For example if someone wants
stale
consistency mode it's still not possible but i figured I'd start getting input before writing too much code.@scalp42