fabiolb / fabio

Consul Load-Balancing made simple
https://fabiolb.net
MIT License
7.25k stars 621 forks source link

Add command line flag to toggle required consistency on consul reads #811

Closed jeremycw closed 2 years ago

jeremycw commented 3 years ago

It seems like there are some Consul query options that would be nice to expose to the user for configuration. Specifically I need default consistency mode for Consul reads (https://www.consul.io/api-docs/features/consistency) so I kind of just crammed a toggle for the RequireConsistent query param onto the command line.

This is probably not mergable as-is but I'd like to open this up for discussion since there are probably more query options that should be configurable by the user. For example if someone wants stale consistency mode it's still not possible but i figured I'd start getting input before writing too much code.

@scalp42

evandam commented 3 years ago

Hey folks, we rolled this out and it works great for Fabio being able to continue handling requests during a Consul leader election with the following:

registry.consul.requireConsistent = false
registry.consul.allowStale = true

What do we need to do get this merged?

ketzacoatl commented 3 years ago

@nathanejohnson / @leprechau / @pschultz, is it expected behavior for fabio to go dark during a consul leader election? It seems protecting against that would be very useful.

scalp42 commented 3 years ago

I can confirm, fork works perfectly. Fabio won't drop routes when there's a leadership election as expected with the stale reads.

pschultz commented 3 years ago

My expectation is for fabio to fail static by default, i.e. routes should not change if Consul goes away.

scalp42 commented 3 years ago

@pschultz unfortunately, that's not the behavior we're seeing. 500s start being returned from local Consul when leadership election happens.

ketzacoatl commented 3 years ago

@pschultz unfortunately, that's not the behavior we're seeing. 500s start being returned from local Consul when leadership election happens.

@scalp42 do you also see change in fabio's routes as a result of the 500 from consul?

evandam commented 3 years ago

Hi @ketzacoatl

I can confirm that Fabio does drop routes during a new Consul leader election. This isn't the case when enabling allowStale in this PR.

Here are logs hitting a service in Fabio every second while restarting Consul on the leader in a cluster of 3 servers: https://gist.github.com/evandam/b8f27c76bebf58634f6c5d412f61c0bc

You can see the 500 errors from Consul, followed by removing the route from Fabio, and 404s until a new leader is elected (18 seconds of downtime in this example).

If this isn't expected, is this new behavior introduced in a recent release?

ketzacoatl commented 3 years ago

@evandam thank you for taking the time to run that isolated test and extracting the logs for this PR! That's a great way to confirm the behavior.

@scalp42 / @pschultz, how should we proceed? Is there a regression to investigate that makes this feature/option not matter, or is this the proper way to handle the upstream outage with consul? I believe it's the latter, but maybe there's a reason for the former?

scalp42 commented 3 years ago

@ketzacoatl I can confirm all the routes get removed in Fabio logs.

evandam commented 3 years ago

For what it's worth, I did the same tests on previous versions of Fabio back to 1.5.9 (May 2018) and see the same, so this doesn't seem to be new behavior unless it goes back farther than that.

pschultz commented 3 years ago

Is there a regression to investigate that makes this feature/option not matter, or is this the proper way to handle the upstream outage with consul? I believe it's the latter.

I agree. I suggest to do what @evandam proposed and let the defaults be

registry.consul.requireConsistent = false
registry.consul.allowStale = true

This is a big enough change to warrant a minor version bump, I think.

evandam commented 3 years ago

@pschultz I think there may need to be some extra validation added to the config. requireConsistent and allowStale can both be set to true, but requests to Consul will fail since it's not valid.

fabio[21514]: 2021/02/10 23:09:11 [WARN] consul: Error fetching health state. Unexpected response code: 400 (Cannot specify ?stale with ?consistent, conflicting semantics.)
ketzacoatl commented 3 years ago

Extra safety there sounds great :+1:

aaronhurt commented 3 years ago

After reading through all the comments and testing I'm comfortable merging this once the additional validation is added and the defaults reflect the current operating state.

jeremycw commented 3 years ago

@leprechau I've added a validation to return an error if allowStale and requireConsistent are both true. The defaults should already match the existing configuration:

allowStale: false
requireConsistent: true

@scalp42

aaronhurt commented 3 years ago

Thank you, I'll give this one more look and get it merged when I get back to a real browser.

jeremycw commented 2 years ago

@leprechau Any update on this?

daande commented 2 years ago

Any update on this? @leprechau

jeremycw commented 2 years ago

@nathanejohnson Hey, it looks like this was about to be merged but then things went silent. Could we get another pair of eyes on this?

nathanejohnson commented 2 years ago

@jeremycw thank you for bringing this to my attention. I will review this ASAP!

jeremycw commented 2 years ago

@nathanejohnson Any thoughts or updates?

scalp42 commented 2 years ago

nathanejohnson commented 2 years ago

Sorry this took so long!