fabiolb / fabio

Consul Load-Balancing made simple
https://fabiolb.net
MIT License

Consul Route updates very slow with large numbers of routes #865

Closed: ddreier closed this issue 2 years ago.

ddreier commented 2 years ago

We run several sets of Fabio instances with the Consul backend that register between 20k and 60k routes. On the low end, Fabio takes around a minute to process changes to the route table; on the high end, while Fabio is handling load, it's more like 5 to 6 minutes.

I've been experimenting with and profiling Fabio's code, and I believe I've identified a large time sink. Each time a new route is added to the route table, that host's list of routes gets sorted. We almost never use host-based routing, so nearly 100% of our routes fall into the blank-host route list, which means building the table re-sorts a single, ever-growing list of tens of thousands of routes once per route added.
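To see why this hurts, here is a minimal, self-contained sketch (not Fabio's code) that counts comparisons when a list is re-sorted after every insert versus sorted once at the end. It uses `sort.SliceStable` on plain ints as a stand-in for the real route ordering, which is an assumption. Any correct comparison sort has to look at every element at least once, so re-sorting after each of n inserts costs on the order of n² comparisons, while a single sort at the end costs on the order of n log n.

```go
package main

import (
	"fmt"
	"sort"
)

// countingSort stable-sorts xs in place and returns how many comparisons it made.
func countingSort(xs []int) int {
	n := 0
	sort.SliceStable(xs, func(i, j int) bool {
		n++
		return xs[i] < xs[j]
	})
	return n
}

// sortOnEveryInsert mimics re-sorting a host's route list after each add.
func sortOnEveryInsert(keys []int) int {
	var list []int
	total := 0
	for _, k := range keys {
		list = append(list, k)
		total += countingSort(list)
	}
	return total
}

// sortOnce appends every key first and sorts a single time at the end.
func sortOnce(keys []int) int {
	list := append([]int(nil), keys...)
	return countingSort(list)
}

func main() {
	for _, n := range []int{1000, 2000, 4000} {
		keys := make([]int, n)
		for i := range keys {
			keys[i] = (i * 7919) % n // deterministic permutation of 0..n-1
		}
		fmt.Printf("n=%d  per-insert=%d  once=%d\n", n, sortOnEveryInsert(keys), sortOnce(keys))
	}
}
```

Doubling n should roughly quadruple the per-insert count, while the sort-once count grows only slightly faster than linearly.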

I dumped one of our route tables into a file and wrote a test that loads the file and builds a route table from its `route add` commands. Something like this:

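package route // NewTable lives in fabio's route package

import (
    "bytes"
    "os"
    "testing"
)

func TestNewTable(t *testing.T) {
    // read in a file with approximately 20k `route add` commands
    f, err := os.ReadFile("/path/to/some/fabio_routes.txt")
    if err != nil {
        t.Fatal(err)
    }

    // build a route table from the file contents
    if _, err = NewTable(bytes.NewBuffer(f)); err != nil {
        t.Fatal(err)
    }
}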

With the unmodified code from release 1.5.15, TestNewTable takes around 13 seconds on average:

=== RUN   TestNewTable
--- PASS: TestNewTable (13.42s)
PASS

[Image: CPU profiling flame graph]

When I remove the Sort call from Table.addRoute and instead sort once at the end of NewTable, the test completes in under a second on average:

=== RUN   TestNewTable
--- PASS: TestNewTable (0.62s)
PASS

[Image: CPU profiling flame graph]
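The change of moving the sort out of addRoute can be sketched roughly as follows. This is not the actual patch: `Route`, `Table`, `addRoute`, `byPathLenDesc`, and `buildTable` here are simplified, hypothetical stand-ins, and the longest-path-first ordering is an assumption rather than Fabio's real comparator. The point is the shape of the fix: adding a route only appends, and each host's list is sorted once after all routes are in.

```go
package main

import (
	"fmt"
	"sort"
)

// Route is a simplified, hypothetical stand-in for a fabio route entry.
type Route struct {
	Host string
	Path string
	Dst  string
}

// Table maps a host ("" for host-less routes) to its list of routes.
type Table map[string][]Route

// byPathLenDesc orders longer paths first. This ordering is an assumption
// for illustration, not necessarily fabio's actual comparator.
func byPathLenDesc(rs []Route) {
	sort.SliceStable(rs, func(i, j int) bool {
		return len(rs[i].Path) > len(rs[j].Path)
	})
}

// addRoute only appends; it no longer sorts on every call.
func (tbl Table) addRoute(r Route) {
	tbl[r.Host] = append(tbl[r.Host], r)
}

// buildTable adds every route first, then sorts each host's list exactly once.
func buildTable(routes []Route) Table {
	tbl := Table{}
	for _, r := range routes {
		tbl.addRoute(r)
	}
	for host := range tbl {
		byPathLenDesc(tbl[host]) // sorts the backing array in place
	}
	return tbl
}

func main() {
	tbl := buildTable([]Route{
		{Path: "/", Dst: "http://10.0.0.1:8080/"},
		{Path: "/api/v1", Dst: "http://10.0.0.2:8080/"},
		{Path: "/api", Dst: "http://10.0.0.3:8080/"},
	})
	for _, r := range tbl[""] {
		fmt.Println(r.Path, "->", r.Dst)
	}
}
```

Because `sort.SliceStable` keeps routes with equal keys in insertion order, sorting once at the end yields the same order as stable-sorting after every insert. With a non-stable sort, ties could come out in a different order, which is worth checking for any routes that compare as equal.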

So, that's a significant improvement: roughly 20× faster (13.42s vs. 0.62s). I'm now working on a more production-like test setup to measure the difference in update speed while Fabio is actually under load.

I'll open a pull request soon and would really appreciate any feedback, as well as consideration for merging it and cutting a new release. Thanks!