golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

x/net/http2: standard http client panics listing an evictCount #43051

Closed sergazyyev closed 2 years ago

sergazyyev commented 3 years ago

What version of Go are you using (go version)?

go1.15.4 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

GOHOSTARCH="amd64" GOHOSTOS="linux"

What did you do?

I use the standard HTTP client to call a single host service many times. I configure the client as client = &http.Client{Timeout: timeOut}. On the server side I sometimes get a panic from the http2 hpack package, and after that the service stops serving all subsequent requests.
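
A minimal sketch of the client setup described above (the timeout value and target URL are placeholders, not from the original report):

package main

import (
	"io"
	"net/http"
	"time"
)

func main() {
	timeOut := 10 * time.Second // placeholder; the real value was not shared
	client := &http.Client{Timeout: timeOut}

	// Repeated requests to the same host reuse one HTTP/2 connection
	// managed by http.DefaultTransport.
	for i := 0; i < 3; i++ {
		resp, err := client.Get("https://example.com/") // placeholder URL
		if err != nil {
			continue
		}
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}
}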

What did you expect to see?

Work without panics or errors.

What did you see instead?

Panic stack trace:

runtime/debug.Stack(0xc0005edf38, 0x4e77ba, 0xc0000ea1e0) /usr/local/go/src/runtime/debug/stack.go:24 +0x9f
runtime/debug.PrintStack() /usr/local/go/src/runtime/debug/stack.go:16 +0x25
main.main.func2.1(0xc0005efb38) /app/main.go:75 +0x57
panic(0xba4ca0, 0xc0011e81f0) /usr/local/go/src/runtime/panic.go:969 +0x1b9
vendor/golang.org/x/net/http2/hpack.(*headerFieldTable).idToIndex(0xc001290280, 0xa, 0xc0005ee1f0) /usr/local/go/src/vendor/golang.org/x/net/http2/hpack/tables.go:119 +0x12d
vendor/golang.org/x/net/http2/hpack.(*headerFieldTable).search(0xc001290280, 0xc000582330, 0xd, 0xc00064e000, 0x439, 0x0, 0x17, 0x0) /usr/local/go/src/vendor/golang.org/x/net/http2/hpack/tables.go:106 +0x150
vendor/golang.org/x/net/http2/hpack.(*Encoder).searchTable(0xc001290280, 0xc000582330, 0xd, 0xc00064e000, 0x439, 0x0, 0x0, 0x400) /usr/local/go/src/vendor/golang.org/x/net/http2/hpack/encode.go:97 +0x92
vendor/golang.org/x/net/http2/hpack.(*Encoder).WriteField(0xc001290280, 0xc000582330, 0xd, 0xc00064e000, 0x439, 0x0, 0x0, 0x0) /usr/local/go/src/vendor/golang.org/x/net/http2/hpack/encode.go:62 +0x110
net/http.(*http2ClientConn).writeHeader(0xc000ee9500, 0xc000582330, 0xd, 0xc00064e000, 0x439) /usr/local/go/src/net/http/h2_bundle.go:8118 +0xbc
net/http.(*http2ClientConn).encodeHeaders.func3(0xc9d8ef, 0xd, 0xc00064e000, 0x439) /usr/local/go/src/net/http/h2_bundle.go:8057 +0x93
net/http.(*http2ClientConn).encodeHeaders.func1(0xc00128c340) /usr/local/go/src/net/http/h2_bundle.go:8023 +0x3d5
net/http.(*http2ClientConn).encodeHeaders(0xc000ee9500, 0xc001034200, 0x1, 0x0, 0x0, 0x0, 0xc0001447b0, 0x15, 0x1b1f1f8, 0x7fa5ab7ea1f8, ...) /usr/local/go/src/net/http/h2_bundle.go:8055 +0x637
net/http.(*http2ClientConn).roundTrip(0xc000ee9500, 0xc001034200, 0x0, 0x0, 0x0, 0x0) /usr/local/go/src/net/http/h2_bundle.go:7524 +0x2bb
net/http.(*http2Transport).RoundTripOpt(0xc000f000c0, 0xc001034200, 0xbd3200, 0xc0010302a0, 0xc000f2e1c0, 0x5) /usr/local/go/src/net/http/h2_bundle.go:6948 +0x170
net/http.(*http2Transport).RoundTrip(...) /usr/local/go/src/net/http/h2_bundle.go:6909
net/http.http2noDialH2RoundTripper.RoundTrip(0xc000f000c0, 0xc001034200, 0xd77fa0, 0xc000f000c0, 0x0) /usr/local/go/src/net/http/h2_bundle.go:9119 +0x3e
net/http.(*Transport).roundTrip(0x1ae3a40, 0xc001034200, 0xc824c0, 0x7fa5abb35101, 0xc001034200) /usr/local/go/src/net/http/transport.go:527 +0xd8c
net/http.(*Transport).RoundTrip(0x1ae3a40, 0xc001034200, 0x1ae3a40, 0xbfebaa82c332827f, 0x1b5cf1f8e91f) /usr/local/go/src/net/http/roundtrip.go:17 +0x35
net/http.send(0xc001034000, 0xd76e40, 0x1ae3a40, 0xbfebaa82c332827f, 0x1b5cf1f8e91f, 0xc00100c2a0, 0xc000010208, 0xbfebaa82c332827f, 0x1, 0x0) /usr/local/go/src/net/http/client.go:252 +0x453
net/http.(*Client).send(0xc0010301e0, 0xc001034000, 0xbfebaa82c332827f, 0x1b5cf1f8e91f, 0xc00100c2a0, 0xc000010208, 0x0, 0x1, 0x203000) /usr/local/go/src/net/http/client.go:176 +0xff
net/http.(*Client).do(0xc0010301e0, 0xc001034000, 0x0, 0x0, 0x0) /usr/local/go/src/net/http/client.go:718 +0x45f
net/http.(*Client).Do(...) /usr/local/go/src/net/http/client.go:586
subscriptions-service/models/subscriptions.(*ServicesRequest).postGetServices(0xc0001457a0, 0xc0001457d0, 0xc000fbc060, 0x0, 0x0, 0x0, 0x0, 0x0) /app/models/subscriptions/services.go:174 +0x6e5
subscriptions-service/models/subscriptions.(*ServicesRequest).Services(0xc0001457a0, 0xc0005ef7d0, 0x4, 0xc0001457c0, 0x1) /app/models/subscriptions/services.go:128 +0xd4
subscriptions-service/controllers/subscriptions.Services(0xc000e95180, 0xc0005a3290, 0x9) /app/controllers/subscriptions/services.go:29 +0x3a6
github.com/gofiber/fiber/v2.(*App).next(0xc0010005a0, 0xc000e95180, 0xc00064e000, 0x439, 0x480) /go/pkg/mod/github.com/gofiber/fiber/v2@v2.2.0/router.go:126 +0x1b9
github.com/gofiber/fiber/v2.(*Ctx).Next(0xc000e95180, 0xc9d8ef, 0xd) /go/pkg/mod/github.com/gofiber/fiber/v2@v2.2.0/ctx.go:632 +0x88
subscriptions-service/controllers/middleware.CheckAuthHeaderMiddleware(0xc000e95180, 0xca7818, 0x1b) /app/controllers/middleware/auth_header.go:18 +0x16f
github.com/gofiber/fiber/v2.(*Ctx).Next(0xc000e95180, 0xca7818, 0x1b) /go/pkg/mod/github.com/gofiber/fiber/v2@v2.2.0/ctx.go:629 +0x59
github.com/gofiber/fiber/v2/middleware/cors.New.func1(0xc000e95180, 0xc000ffad80, 0xc0005a3290) /go/pkg/mod/github.com/gofiber/fiber/v2@v2.2.0/middleware/cors/cors.go:141 +0x305
github.com/gofiber/fiber/v2.(*Ctx).Next(0xc000e95180, 0x1, 0xc00105c6f8) /go/pkg/mod/github.com/gofiber/fiber/v2@v2.2.0/ctx.go:629 +0x59
main.main.func2(0xc000e95180, 0x0, 0x0) /app/main.go:82 +0x5f
github.com/gofiber/fiber/v2.(*App).next(0xc0010005a0, 0xc000e95180, 0xc000e95180, 0x32e249f, 0x32e249f00000010) /go/pkg/mod/github.com/gofiber/fiber/v2@v2.2.0/router.go:126 +0x1b9
github.com/gofiber/fiber/v2.(*App).handler(0xc0010005a0, 0xc000eaaf00) /go/pkg/mod/github.com/gofiber/fiber/v2@v2.2.0/router.go:154 +0x11a
github.com/valyala/fasthttp.(*Server).serveConn(0xc000f5e900, 0xd92840, 0xc0005aa008, 0x0, 0x0) /go/pkg/mod/github.com/valyala/fasthttp@v1.17.0/server.go:2168 +0x11ac
github.com/valyala/fasthttp.(*workerPool).workerFunc(0xc001008140, 0xc00128de00) /go/pkg/mod/github.com/valyala/fasthttp@v1.17.0/workerpool.go:223 +0xc2
github.com/valyala/fasthttp.(*workerPool).getCh.func1(0xc001008140, 0xc00128de00, 0xb8a540, 0xc00128de00) /go/pkg/mod/github.com/valyala/fasthttp@v1.17.0/workerpool.go:195 +0x35
created by github.com/valyala/fasthttp.(*workerPool).getCh /go/pkg/mod/github.com/valyala/fasthttp@v1.17.0/workerpool.go:194 +0x115

time="2020-12-07T19:47:25+06:00" level=error msg="id (10) <= evictCount (12)" function=func1 line=56 package=main.main

mvdan commented 3 years ago

Why did you duplicate https://github.com/golang/go/issues/43048#event-4080325045? This is likely still not enough information to tell if this is a bug in Go or not.

sergazyyev commented 3 years ago

I closed my previous ticket. This bug occurs only on my prod server and I cannot reproduce it manually. What information can I provide for you to analyse this issue?

cagedmantis commented 3 years ago

As @mvdan stated in #43048, we need some manner of reproducing this failure in order to isolate the problem.

sergazyyev commented 3 years ago

I couldn't reproduce this issue manually. I tried on my local machine, in a Docker container, and under different loads. As I mentioned before, it happens in the Kubernetes cluster when I send outbound requests with the standard HTTP client. I have now set Transport.ForceAttemptHTTP2 to false and that seems to have helped, but I don't understand the cause of the problem.
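
A minimal sketch of the workaround described above, together with the documented way to disable HTTP/2 outright; the concrete field values are assumptions, since the original transport configuration was not shared:

package main

import (
	"crypto/tls"
	"net"
	"net/http"
	"time"
)

func main() {
	tr := &http.Transport{
		// With a custom dialer (or TLS config) and ForceAttemptHTTP2 left
		// false, the transport conservatively stays on HTTP/1.1.
		DialContext:       (&net.Dialer{Timeout: 5 * time.Second}).DialContext,
		ForceAttemptHTTP2: false,
		// Alternatively, a non-nil empty TLSNextProto map disables HTTP/2
		// regardless of the other fields (this is the documented way).
		TLSNextProto: map[string]func(string, *tls.Conn) http.RoundTripper{},
	}
	client := &http.Client{Timeout: 10 * time.Second, Transport: tr}
	_ = client
}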

sergazyyev commented 3 years ago

hpackEvicCountPanic.txt: I captured an HTTP/2 debug trace log when this panic occurred; maybe it will help in solving this issue.

liandeliang commented 3 years ago

Hi, I got this error message in recent days on Go 1.15.8: panic: id (34) <= evictCount (47)

"panic: id (34) <= evictCount (47)

goroutine 2059472 [running]: vendor/golang.org/x/net/http2/hpack.(headerFieldTable).idToIndex(0x400247a180, 0x22, 0x40001d3958) \tvendor/golang.org/x/net/http2/hpack/tables.go:119 +0x114 vendor/golang.org/x/net/http2/hpack.(headerFieldTable).search(0x400247a180, 0x4337b3, 0xa, 0x40000e2408, 0x11, 0x0, 0x1, 0xc5100) \tvendor/golang.org/x/net/http2/hpack/tables.go:106 +0x104 vendor/golang.org/x/net/http2/hpack.(Encoder).searchTable(0x400247a180, 0x4337b3, 0xa, 0x40000e2408, 0x11, 0x0, 0x0, 0x200) \tvendor/golang.org/x/net/http2/hpack/encode.go:97 +0x98 vendor/golang.org/x/net/http2/hpack.(Encoder).WriteField(0x400247a180, 0x4337b3, 0xa, 0x40000e2408, 0x11, 0x0, 0x40000d2000, 0x40000d2050) \tvendor/golang.org/x/net/http2/hpack/encode.go:62 +0xe8 net/http.(http2ClientConn).writeHeader(0x4002820300, 0x4337b3, 0xa, 0x40000e2408, 0x11) \tnet/http/h2_bundle.go:8124 +0x8c net/http.(http2ClientConn).encodeHeaders.func3(0x4337b3, 0xa, 0x40000e2408, 0x11) \tnet/http/h2_bundle.go:8063 +0x7c net/http.(http2ClientConn).encodeHeaders.func1(0x40017a5520) \tnet/http/h2_bundle.go:7962 +0x8c net/http.(http2ClientConn).encodeHeaders(0x4002820300, 0x4001c10b00, 0x1, 0x0, 0x0, 0x5d, 0xc9, 0x0, 0x0, 0x4b3, ...) \tnet/http/h2_bundle.go:8061 +0x494 net/http.(http2ClientConn).roundTrip(0x4002820300, 0x4001c10b00, 0x0, 0x0, 0x0, 0x0) \tnet/http/h2_bundle.go:7524 +0x230 net/http.(http2Transport).RoundTripOpt(0x40000e0720, 0x4001c10b00, 0x3a5200, 0x400019a690, 0x40000e2400, 0x5) \tnet/http/h2_bundle.go:6948 +0x144 net/http.(http2Transport).RoundTrip(...) \tnet/http/h2_bundle.go:6909 net/http.http2noDialH2RoundTripper.RoundTrip(0x40000e0720, 0x4001c10b00, 0x4b6bc0, 0x40000e0720, 0x0) \tnet/http/h2_bundle.go:9127 +0x34 net/http.(Transport).roundTrip(0x40000ca640, 0x4001c10a00, 0x4002a6a798, 0x1bff0, 0xffffba710e98) \tnet/http/transport.go:527 +0xa44 net/http.(Transport).RoundTrip(0x40000ca640, 0x4001c10a00, 0x40000ca640, 0x0, 0x0) \tnet/http/roundtrip.go:17 +0x30 net/http.send(0x4001c10a00, 0x4b5f60, 0x40000ca640, 0x0, 0x0, 0x0, 0x0, 0x40017a63a0, 0x4002a6a9b8, 0x1) \tnet/http/client.go:252 +0x34c net/http.(Client).send(0x400019a510, 0x4001c10a00, 0x0, 0x0, 0x0, 0x40017a63a0, 0x0, 0x1, 0x4002a6aa38) \tnet/http/client.go:176 +0xc8 net/http.(Client).do(0x400019a510, 0x4001c10a00, 0x0, 0x0, 0x0) \tnet/http/client.go:718 +0x340 net/http.(Client).Do(...) \tnet/http/client.go:586 ... my user code ...

goroutine 73 [select]: net/http.(http2ClientConn).roundTrip(0x4002820300, 0x4000189500, 0x0, 0x0, 0x0, 0x0) \tnet/http/h2_bundle.go:7609 +0x7bc net/http.(http2Transport).RoundTripOpt(0x40000e0720, 0x4000189500, 0x3a5200, 0x400019a690, 0x40000ba900, 0x5) \tnet/http/h2_bundle.go:6948 +0x144 net/http.(http2Transport).RoundTrip(...) \tnet/http/h2_bundle.go:6909 net/http.http2noDialH2RoundTripper.RoundTrip(0x40000e0720, 0x4000189500, 0x4b6bc0, 0x40000e0720, 0x0) \tnet/http/h2_bundle.go:9127 +0x34 net/http.(Transport).roundTrip(0x40000ca640, 0x4000189500, 0x40001d1188, 0x1bff0, 0xffffba711c28) \tnet/http/transport.go:527 +0xa44 net/http.(Transport).RoundTrip(0x40000ca640, 0x4000189500, 0x40000ca640, 0x0, 0x0) \tnet/http/roundtrip.go:17 +0x30 net/http.send(0x4000189500, 0x4b5f60, 0x40000ca640, 0x0, 0x0, 0x0, 0x0, 0x400188b9f0, 0x40001d13a8, 0x1) \tnet/http/client.go:252 +0x34c net/http.(Client).send(0x400019a510, 0x4000189500, 0x0, 0x0, 0x0, 0x400188b9f0, 0x0, 0x1, 0x40001d1428) \tnet/http/client.go:176 +0xc8 net/http.(Client).do(0x400019a510, 0x4000189500, 0x0, 0x0, 0x0) \tnet/http/client.go:718 +0x340 net/http.(Client).Do(...) \tnet/http/client.go:586 ... my user code ...

krishnakumar4a4 commented 3 years ago

We had a similar issue reproduced in both lower and prod environments. After a couple of days of trying different combinations of configurations, I was finally able to figure out a minimal configuration that fixed this panic:

// Supplying a custom DialContext (with ForceAttemptHTTP2 left false)
// makes the transport fall back to plain HTTP/1.1.
defaultTransport := &http.Transport{}
defaultTransport.DialContext = (&net.Dialer{}).DialContext
client := &http.Client{Timeout: time.Second * 10, Transport: defaultTransport}

Although this doesn't look like a real fix, it helped in my scenario, where our transport only needed plain HTTP rather than HTTP/2. Here is the documentation for the ForceAttemptHTTP2 flag, which seems to support this workaround:

    // ForceAttemptHTTP2 controls whether HTTP/2 is enabled when a non-zero
    // Dial, DialTLS, or DialContext func or TLSClientConfig is provided.
    // By default, use of any those fields conservatively disables HTTP/2.
    // To use a custom dialer or TLS config and still attempt HTTP/2
    // upgrades, set this to true.
    ForceAttemptHTTP2 bool

The documentation essentially says that if we override the DialContext function with a custom dialer, HTTP/2 is conservatively disabled, and with it disabled we no longer see the panic. I would like to spend some effort reproducing this locally and will share it here if I can (I am doubtful at the moment).
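
For the opposite case, keeping HTTP/2 while using a custom dialer, the flag would be set explicitly; a minimal sketch (the dialer settings are assumptions):

package main

import (
	"net"
	"net/http"
	"time"
)

func main() {
	tr := &http.Transport{
		// A custom dialer normally makes the transport skip the automatic
		// HTTP/2 upgrade...
		DialContext: (&net.Dialer{Timeout: 5 * time.Second}).DialContext,
		// ...unless ForceAttemptHTTP2 explicitly re-enables it.
		ForceAttemptHTTP2: true,
	}
	_ = &http.Client{Transport: tr}
}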

krishnakumar4a4 commented 3 years ago

Hi @mvdan and @cagedmantis ,

Since I have personally faced this problem, I have done a rough analysis, and I see a potential problem in the http2 encoder which could be causing this panic intermittently.

All the panics reported so far print id (%v) <= evictCount (%v) from here. The failing condition is id <= t.evictCount in the idToIndex function, which is part of the searchTable operation.

Here is what I understood about these fields after some quick reading.

The unique id is always calculated as len(header table) + evictCount + 1, as mentioned here. This tells us that a newly generated id can never be less than or equal to the current evictCount. That holds only as long as evictCount does not change afterwards.
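
A simplified model of that bookkeeping (illustrative only, not the actual hpack code) shows why a stale id trips the check once evictCount has advanced:

package main

import "fmt"

// A toy model of the dynamic table's id bookkeeping. Entries get an
// implicit id of len(entries)+evictCount+1 when added; evicting from
// the front increments evictCount, invalidating older ids.
type toyTable struct {
	entries    []string
	evictCount uint64
}

func (t *toyTable) add(v string) uint64 {
	id := uint64(len(t.entries)) + t.evictCount + 1
	t.entries = append(t.entries, v)
	return id
}

func (t *toyTable) evictOldest() {
	t.entries = t.entries[1:]
	t.evictCount++
}

func (t *toyTable) idToIndex(id uint64) uint64 {
	if id <= t.evictCount {
		panic(fmt.Sprintf("id (%v) <= evictCount (%v)", id, t.evictCount))
	}
	return uint64(len(t.entries)) - (id - t.evictCount - 1)
}

func main() {
	t := &toyTable{}
	id := t.add("x-old-header") // id = 1
	t.add("x-new-header")       // id = 2
	t.evictOldest()             // evictCount = 1: id 1 is now stale
	fmt.Println(t.idToIndex(2))  // fine
	fmt.Println(t.idToIndex(id)) // panics: id (1) <= evictCount (1)
}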

But evictCount can change; please refer to these two points marked in the code below:

func (e *Encoder) WriteField(f HeaderField) error {
    e.buf = e.buf[:0]

    if e.tableSizeUpdate {
        e.tableSizeUpdate = false
        if e.minSize < e.dynTab.maxSize {
            e.buf = appendTableSize(e.buf, e.minSize)
        }
        e.minSize = uint32Max
        e.buf = appendTableSize(e.buf, e.dynTab.maxSize)
    }

    idx, nameValueMatch := e.searchTable(f) // This function throws panic
    if nameValueMatch {
        e.buf = appendIndexed(e.buf, idx)
    } else {
        indexing := e.shouldIndex(f)
        if indexing {
            e.dynTab.add(f) // This function can trigger eviction
        }

        if idx == 0 {
            e.buf = appendNewName(e.buf, f, indexing)
        } else {
            e.buf = appendIndexedName(e.buf, f, idx, indexing)
        }
    }
    n, err := e.w.Write(e.buf)
    if err == nil && n != len(e.buf) {
        err = io.ErrShortWrite
    }
    return err
}

While encoding a new duplicate header, the e.dynTab.add(f) call could have triggered an eviction in a previous run, which in turn would have increased evictCount. So when searchTable(f) is called in the current run, idToIndex could throw this panic, because id <= t.evictCount may now hold.
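
For what it's worth, this kind of inconsistency between searchTable and the dynamic table can also be produced by unsynchronized concurrent use of a single Encoder; a minimal sketch (header names are arbitrary) that the race detector flags when run with go run -race:

package main

import (
	"bytes"
	"fmt"
	"sync"

	"golang.org/x/net/http2/hpack"
)

func main() {
	var buf bytes.Buffer
	enc := hpack.NewEncoder(&buf)

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				// Concurrent WriteField calls on one Encoder are a data
				// race and can leave the dynamic table inconsistent.
				_ = enc.WriteField(hpack.HeaderField{
					Name:  "x-demo-header",
					Value: fmt.Sprintf("value-%d-%d", i, j),
				})
			}
		}(i)
	}
	wg.Wait()
}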

I may not have all details of the protocol or implementation. Would be interested to learn more.

neild commented 3 years ago

Do you ever write requests with trailers? CL 349594 fixes a race condition with HPACK encoding, but only for requests which contain trailers.
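
For reference, a request "with trailers" in this sense is one that populates http.Request.Trailer before being sent; a minimal sketch (the URL and trailer name are placeholders):

package main

import (
	"net/http"
	"strings"
)

func main() {
	req, err := http.NewRequest("POST", "https://example.com/upload", strings.NewReader("payload"))
	if err != nil {
		panic(err)
	}
	// Declaring the trailer key tells the client to send it after the
	// body; on HTTP/2 it goes out as a second HEADERS frame.
	req.Trailer = http.Header{"X-Checksum": nil}
	req.Trailer.Set("X-Checksum", "abc123")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return
	}
	resp.Body.Close()
}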

If you can try the most recent golang.org/x/net/http2 to see if it fixes the problem, that would be helpful; there have been a number of recent client fixes. You can replace the vendored stdlib HTTP/2 implementation with the one from x/net with:

import "golang.org/x/net/http2"
import "http"
http2.ConfigureTransport(http.DefaultTransport.(*http.Transport))
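
A slightly fuller sketch of that suggestion, with the error from ConfigureTransport checked (assumes golang.org/x/net is added to go.mod):

package main

import (
	"log"
	"net/http"

	"golang.org/x/net/http2"
)

func main() {
	tr := &http.Transport{}
	// Use the HTTP/2 implementation from x/net instead of the one
	// bundled into the standard library.
	if err := http2.ConfigureTransport(tr); err != nil {
		log.Fatal(err)
	}
	client := &http.Client{Transport: tr}
	_ = client
}
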
neild commented 3 years ago

@krishnakumar4a4 e.dynTab.add(f) can trigger an eviction, but eviction removes the evicted entry from the table so a future call to e.searchTable(f) won't find that entry. See (*headerFieldTable).evictOldest.

I've looked through the HPACK encoding code, and if there's a way for this state to get out of sync without racy calls to the encoder, I'm not seeing it. I also took a shot at fuzzing the encoder, but that hasn't turned up anything yet. (I tried introducing a couple of bugs to verify my fuzz function was working correctly, and the fuzzer found them instantly, so that was nice.)

So my best hypothesis at the moment is the known (and fixed) race condition in trailer encoding.

gopherbot commented 2 years ago

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)