fabiolb / fabio

Consul Load-Balancing made simple
https://fabiolb.net
MIT License
7.25k stars 620 forks source link

Crash: invalid log msg: http2: panic serving CLIENT_IP:CLIENT_PORT: runtime error: index out of range [-1] #872

Closed sbrl closed 1 year ago

sbrl commented 2 years ago

I'm getting a reproducible crash in Fabio:

invalid log msg: 2022/05/14 21:46:56 http2: panic serving 172.16.230.53:54970: runtime error: index out of range [-1]
goroutine 170 [running]:
net/http.(*http2serverConn).runHandler.func1()
        /usr/local/opt/go/libexec/src/net/http/h2_bundle.go:5896 +0x174
panic({0x828870, 0x32aa060})
        /usr/local/opt/go/libexec/src/runtime/panic.go:838 +0x23c
github.com/fabiolb/fabio/route.rndPicker(0x311c8c0)
        /Users/njohnson/jetbrains/fabio/src/github.com/fabiolb/fabio/route/picker.go:20 +0x58
github.com/fabiolb/fabio/route.Table.lookup(0x3000120, {0x33e52d7, 0x17}, {0x32f6000, 0x1f}, {0x0, 0x0}, 0x9063f4, 0x9063f0)
        /Users/njohnson/jetbrains/fabio/src/github.com/fabiolb/fabio/route/table.go:463 +0x194
github.com/fabiolb/fabio/route.Table.Lookup(0x3000120, 0x33fc080, {0x0, 0x0}, 0x9063f4, 0x9063f0, 0x3066090, 0x0)
        /Users/njohnson/jetbrains/fabio/src/github.com/fabiolb/fabio/route/table.go:424 +0x2dc
main.newHTTPProxy.func2(0x33fc080)
        /Users/njohnson/jetbrains/fabio/src/github.com/fabiolb/fabio/main.go:233 +0xc4
github.com/fabiolb/fabio/proxy.(*HTTPProxy).ServeHTTP(0x3336000, {0x9d663c, 0x3096178}, 0x33fc080)
        /Users/njohnson/jetbrains/fabio/src/github.com/fabiolb/fabio/proxy/http_proxy.go:100 +0x1b8
net/http.serverHandler.ServeHTTP({0x30aa2d0}, {0x9d663c, 0x3096178}, 0x33fc080)
        /usr/local/opt/go/libexec/src/net/http/server.go:2916 +0x408
net/http.initALPNRequest.ServeHTTP({{0x9d6be0, 0x32f26c0}, 0x319c400, {0x30aa2d0}}, {0x9d663c, 0x3096178}, 0x33fc080)
        /usr/local/opt/go/libexec/src/net/http/server.go:3523 +0x180
net/http.(*http2serverConn).runHandler(0x306e380, 0x3096178, 0x33fc080, 0x31180a0)
        /usr/local/opt/go/libexec/src/net/http/h2_bundle.go:5903 +0x98
created by net/http.(*http2serverConn).processHeaders
        /usr/local/opt/go/libexec/src/net/http/h2_bundle.go:5633 +0x690
invalid log msg: 2022/05/14 21:46:56 http2: panic serving 172.16.230.53:54970: runtime error: index out of range [-1]
goroutine 171 [running]:
net/http.(*http2serverConn).runHandler.func1()
        /usr/local/opt/go/libexec/src/net/http/h2_bundle.go:5896 +0x174
panic({0x828870, 0x3266010})
        /usr/local/opt/go/libexec/src/runtime/panic.go:838 +0x23c
github.com/fabiolb/fabio/route.rndPicker(0x311c8c0)
        /Users/njohnson/jetbrains/fabio/src/github.com/fabiolb/fabio/route/picker.go:20 +0x58
github.com/fabiolb/fabio/route.Table.lookup(0x3000120, {0x33e52d7, 0x17}, {0x32f6040, 0x1f}, {0x0, 0x0}, 0x9063f4, 0x9063f0)
        /Users/njohnson/jetbrains/fabio/src/github.com/fabiolb/fabio/route/table.go:463 +0x194
github.com/fabiolb/fabio/route.Table.Lookup(0x3000120, 0x33fc180, {0x0, 0x0}, 0x9063f4, 0x9063f0, 0x3066090, 0x0)
        /Users/njohnson/jetbrains/fabio/src/github.com/fabiolb/fabio/route/table.go:424 +0x2dc
main.newHTTPProxy.func2(0x33fc180)
        /Users/njohnson/jetbrains/fabio/src/github.com/fabiolb/fabio/main.go:233 +0xc4
github.com/fabiolb/fabio/proxy.(*HTTPProxy).ServeHTTP(0x3336000, {0x9d663c, 0x3096268}, 0x33fc180)
        /Users/njohnson/jetbrains/fabio/src/github.com/fabiolb/fabio/proxy/http_proxy.go:100 +0x1b8
net/http.serverHandler.ServeHTTP({0x30aa2d0}, {0x9d663c, 0x3096268}, 0x33fc180)
        /usr/local/opt/go/libexec/src/net/http/server.go:2916 +0x408
net/http.initALPNRequest.ServeHTTP({{0x9d6be0, 0x32f26c0}, 0x319c400, {0x30aa2d0}}, {0x9d663c, 0x3096268}, 0x33fc180)
        /usr/local/opt/go/libexec/src/net/http/server.go:3523 +0x180
net/http.(*http2serverConn).runHandler(0x306e380, 0x3096268, 0x33fc180, 0x3118190)
        /usr/local/opt/go/libexec/src/net/http/h2_bundle.go:5903 +0x98
created by net/http.(*http2serverConn).processHeaders
        /usr/local/opt/go/libexec/src/net/http/h2_bundle.go:5633 +0x690
invalid log msg: 2022/05/14 21:46:56 http2: panic serving 172.16.230.53:54970: runtime error: index out of range [-1]
goroutine 172 [running]:
net/http.(*http2serverConn).runHandler.func1()
        /usr/local/opt/go/libexec/src/net/http/h2_bundle.go:5896 +0x174
panic({0x828870, 0x33a00b0})
        /usr/local/opt/go/libexec/src/runtime/panic.go:838 +0x23c
github.com/fabiolb/fabio/route.rndPicker(0x311c8c0)
        /Users/njohnson/jetbrains/fabio/src/github.com/fabiolb/fabio/route/picker.go:20 +0x58
github.com/fabiolb/fabio/route.Table.lookup(0x3000120, {0x33e52d7, 0x17}, {0x32f6060, 0x1a}, {0x0, 0x0}, 0x9063f4, 0x9063f0)
        /Users/njohnson/jetbrains/fabio/src/github.com/fabiolb/fabio/route/table.go:463 +0x194
github.com/fabiolb/fabio/route.Table.Lookup(0x3000120, 0x33fc280, {0x0, 0x0}, 0x9063f4, 0x9063f0, 0x3066090, 0x0)
        /Users/njohnson/jetbrains/fabio/src/github.com/fabiolb/fabio/route/table.go:424 +0x2dc
main.newHTTPProxy.func2(0x33fc280)
        /Users/njohnson/jetbrains/fabio/src/github.com/fabiolb/fabio/main.go:233 +0xc4
github.com/fabiolb/fabio/proxy.(*HTTPProxy).ServeHTTP(0x3336000, {0x9d663c, 0x3096310}, 0x33fc280)
        /Users/njohnson/jetbrains/fabio/src/github.com/fabiolb/fabio/proxy/http_proxy.go:100 +0x1b8
net/http.serverHandler.ServeHTTP({0x30aa2d0}, {0x9d663c, 0x3096310}, 0x33fc280)
        /usr/local/opt/go/libexec/src/net/http/server.go:2916 +0x408
net/http.initALPNRequest.ServeHTTP({{0x9d6be0, 0x32f26c0}, 0x319c400, {0x30aa2d0}}, {0x9d663c, 0x3096310}, 0x33fc280)
        /usr/local/opt/go/libexec/src/net/http/server.go:3523 +0x180
net/http.(*http2serverConn).runHandler(0x306e380, 0x3096310, 0x33fc280, 0x3118200)
        /usr/local/opt/go/libexec/src/net/http/h2_bundle.go:5903 +0x98
created by net/http.(*http2serverConn).processHeaders
        /usr/local/opt/go/libexec/src/net/http/h2_bundle.go:5633 +0x690

Client details

Server details

sbrl commented 2 years ago

Ok, I have some more information. This only seems to happen if there's more than 1 backend Consul service.

tristanmorgan commented 2 years ago

Could it be two paths pointing to different services? some sort of duplication?

sbrl commented 2 years ago

I'm not sure what you mean, @tristanmorgan?

But what I do know is that if I change count = 1 to count = 2 in a Nomad job spec, then I get this crash in Fabio.

nathanejohnson commented 2 years ago

I'm not sure what you mean, @tristanmorgan?

But what I do know is that if I change count = 1 to count = 2 in a Nomad job spec, then I get this crash in Fabio.

Can you post this job spec? Can you post a little more information about how you're running fabio and consul in general?

sbrl commented 1 year ago

Sure thing, @nathanejohnson. Here's an example job spec:

job "files-public" {
    datacenters = ["dc1"]
    priority = 35

    # Spread allocations over all nodes
    spread {
        attribute =  "${node.unique.name}"
    }

    group "a" {
        count = 2

        network {
            port "main" { to = 5000 }
        }

        task "files-public" {
            driver = "docker"

            config {
                image = "myregistry.mydomain.tld:5000/node-serve"
                labels { group = "services" }
                ports = [ "main" ]

                mount {
                    type        = "bind"
                    target      = "/srv"
                    source      = "/mnt/elfstone/Public"
                    readonly    = true
                }
            }

            resources {
                # Increased from 25MiB on 2021-05-12 because it ran out of memory serving a 50MB file
                memory = 100 # MiB
            }
        }

        service {
            name = "${JOB}"
            tags = [
            "service",
            "urlprefix-sub.domain.com/"
            ]
            address_mode = "host"
            port = "main"

            check {
                type = "http"
                port = "main"
                interval = "60s"
                timeout = "30s"
                path = "/"
            }
        }
    }
}

Simply change count = 2 to count = 1 to fix the bug.

....the Dockerfile for that is really simple:

FROM node:latest
RUN npm install --global serve && rm -rf "$(npm get cache)";
VOLUME [ "/srv" ]
USER 80:80
ENV NODE_ENV production
WORKDIR /srv
ENTRYPOINT [ "serve", "-l", "5000" ]

Some details of my setup:

nathanejohnson commented 1 year ago

You getting that stack trace with "index out of range -1" almost seems to me some sort of golang runtime bug on armv7. The only pi I have going right now i a pi4 running on ubuntu 22.04 arm64, so I don't have an easy way to verify this.

This is the line from the stack trace.

For some reason randIntn is returning -1. Ideally this rand method would probably work differently, maybe I will refactor with math.Rand provided it doesn't benchmark significantly slower?

By the way,is your clock set correctly on the Pi?

nathanejohnson commented 1 year ago

One other thing, can you try configuring the proxy.strategy to "rr" and see if the problem goes away?

nathanejohnson commented 1 year ago

So the problem ended up being on 32 bit platforms, the random picker function was overflowing a 32 bit int, causing it to go negative. I ended up benchmarking the homegrown rand picker function against golang's math/rand rand.Intn function, and as it turns out, the math/rand function is 4x faster than the homegrown implementation. According to the comments above the homegrown implementation, at one point math/rand.Intn was fairly slow, but this it appears it's now quite fast. Faster than our broken version at any rate. I have a PR to fix this, I'll look at it more on Monday and merge this in with the next release. In the mean time, setting the proxy.strategy to "rr" should be a workaround.