golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.06k stars 17.68k forks source link

x/net/http2: header modifications after WriteHeader(http.StatusEarlyHints) cause a map race #67940

Open Timer opened 5 months ago

Timer commented 5 months ago

Go version

go version go1.22.1 linux/arm64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/home/timer/.cache/go-build'
GOENV='/home/timer/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/timer/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/timer/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go1.22.1.linux-arm64'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go1.22.1.linux-arm64/pkg/tool/linux_arm64'
GOVCS=''
GOVERSION='go1.22.1'
GCCGO='gccgo'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/timer/Development/proxy/https-terminator/go.mod'
GOWORK='/home/timer/Development/proxy/go.work'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1654315752=/tmp/go-build -gno-record-gcc-switches'

What did you do?

We run a HTTP/2 server that flushes an early headers map (status 103 - http.StatusEarlyHints) using func (http.ResponseWriter) WriteHeader(statusCode int).

After calling WriteHeader, we may need to mutate the Header map for the final 2xx response based on information we discover. In our application, we simply clear the Headers after flushing the early hint:

            r.W.Header().Set("Server", "...")
            r.W.Header().Set("Link", "...")

            r.W.WriteHeader(http.StatusEarlyHints)

            // remove the headers after sending the early hints to avoid
            // sending duplicate headers when the final response is sent:
            r.W.Header().Del("Server")
            r.W.Header().Del("Link")

What did you see happen?

After >60 minutes of runtime and millions of requests (this is hard to reproduce), we were faced with this crash:

fatal error: concurrent map read and map write
goroutine 44382232 [running]:
golang.org/x/net/http2.encodeHeaders(0xc09549f798?, 0x1412ef5?, {0x0?, 0xc0416e72b9?, 0xc07fbd2f18?})
golang.org/x/net@v0.24.0/http2/write.go:343 +0x156
golang.org/x/net/http2.(*writeResHeaders).writeFrame(0xc04d077c00, {0x15edb08, 0xc08528a000})
golang.org/x/net@v0.24.0/http2/write.go:217 +0xe9
golang.org/x/net/http2.(*serverConn).writeFrameAsync(0xc08528a000, {{0x15e9140?, 0xc04d077c00?}, 0xc01a73d860?, 0xc0326e0c60?}, 0xc070d666a8?)
golang.org/x/net@v0.24.0/http2/server.go:852 +0x74
created by golang.org/x/net/http2.(*serverConn).startFrameWrite in goroutine 44380821
golang.org/x/net@v0.24.0/http2/server.go:1266 +0x390
goroutine 1 [semacquire, 87 minutes]:

This appears to be due to writeFrameAsync running on a separate goroutine and the encodeHeaders function doing a naive iteration over the values (vv := h[k], line 343 from above).

What did you expect to see?

Encoding headers needs to be concurrent-safe now that a server can send and mutate its headers multiple times before sending the final response (e.g. 103, 103, 103, 200).

gabyhelp commented 5 months ago

Similar Issues

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

prattmic commented 5 months ago

cc @neild @tombergan

ruyi789 commented 5 months ago

r.W.Header().Del("Server") Header doesn't need to be deleted? Because Response creates a new Header each time.

neild commented 5 months ago

WriteHeader should block until the headers have been sent, to avoid precisely this type of race. I've looked through this code path a few times now, and I don't see a way for the header map to be touched after WriteHeader returns, but I might be missing something.

Do you possibly have a middleware layer wrapping the ResponseWriter that might be deferring the WriteHeader call to a later time?

If you can run this under the race detector (go build -race), it might make the failure more reproducible.)

gopherbot commented 3 months ago

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)

Timer commented 3 months ago

No, we are not wrapping the ResponseWriter with a middleware that'd delay the write. I'm enabling the race detector in hopes that it prints the information about the access and will report back today!

Timer commented 3 months ago

Got the race! Sharing it here while I continue to evaluate it.

2024-08-09T19:19:38.955255887Z ==================
2024-08-09T19:19:38.955262912Z WARNING: DATA RACE
2024-08-09T19:19:38.955265201Z Write at 0x00c00bfa5740 by goroutine 7675145:
2024-08-09T19:19:38.955267226Z   runtime.mapdelete_faststr()
2024-08-09T19:19:38.955269405Z       /usr/local/go/src/runtime/map_faststr.go:301 +0x0
2024-08-09T19:19:38.955271096Z   net/textproto.MIMEHeader.Del()
2024-08-09T19:19:38.955272820Z       /usr/local/go/src/net/textproto/header.go:55 +0x4d
2024-08-09T19:19:38.955274358Z   net/http.Header.Del()
2024-08-09T19:19:38.955275804Z       /usr/local/go/src/net/http/header.go:81 +0x27
2024-08-09T19:19:38.955277621Z   main.main.ProxyUpstream.func14()
2024-08-09T19:19:38.955279964Z       /usr/src/https-terminator/middlewares/proxyhttp/http_handler.go:562 +0x17cb
2024-08-09T19:19:38.955281075Z   main.main.REDACTED.func15()
2024-08-09T19:19:38.955282521Z       /usr/src/https-terminator/middlewares/proxyhttp/REDACTED.go:183 +0xc7e
2024-08-09T19:19:38.955283567Z   main.main.REDACTED.func16()
2024-08-09T19:19:38.955284607Z       /usr/src/https-terminator/middlewares/proxyhttp/REDACTED.go:26 +0xba4
2024-08-09T19:19:38.955285748Z   main.main.REDACTED.func17()
2024-08-09T19:19:38.955286753Z       /usr/src/https-terminator/middlewares/proxyhttp/REDACTED.go:24 +0x9cf
2024-08-09T19:19:38.955287988Z   main.main.REDACTED.func18()
2024-08-09T19:19:38.955289160Z       /usr/src/https-terminator/middlewares/proxyhttp/REDACTED.go:53 +0x2e1
2024-08-09T19:19:38.955290119Z   main.main.REDACTED.func19()
2024-08-09T19:19:38.955291243Z       /usr/src/https-terminator/middlewares/threats/REDACTED.go:23 +0x273
2024-08-09T19:19:38.955292407Z   main.main.REDACTED.func20()
2024-08-09T19:19:38.955295540Z       /usr/src/https-terminator/middlewares/proxyhttp/REDACTED.go:8 +0x19c
2024-08-09T19:19:38.955298336Z   main.main.REDACTED.func21()
2024-08-09T19:19:38.955299811Z       /usr/src/https-terminator/middlewares/proxyhttp/REDACTED.go:17 +0x116
2024-08-09T19:19:38.955301227Z   github.com/vercel/proxy/https-terminator/middlewares/proxyhttp.(*server).ServeHTTP()
2024-08-09T19:19:38.955302888Z       /usr/src/https-terminator/middlewares/proxyhttp/server.go:201 +0x571
2024-08-09T19:19:38.955305221Z   net/http.serverHandler.ServeHTTP()
2024-08-09T19:19:38.955306537Z       /usr/local/go/src/net/http/server.go:2943 +0x2a1
2024-08-09T19:19:38.955307491Z   net/http.initALPNRequest.ServeHTTP()
2024-08-09T19:19:38.955308475Z       /usr/local/go/src/net/http/server.go:3551 +0x35e
2024-08-09T19:19:38.955309699Z   net/http.(*initALPNRequest).ServeHTTP()
2024-08-09T19:19:38.955310705Z       <autogenerated>:1 +0x7b
2024-08-09T19:19:38.955311715Z   net/http.Handler.ServeHTTP-fm()
2024-08-09T19:19:38.955312916Z       <autogenerated>:1 +0x67
2024-08-09T19:19:38.955313934Z   golang.org/x/net/http2.(*serverConn).runHandler()
2024-08-09T19:19:38.955314966Z       /go/pkg/mod/golang.org/x/net@v0.26.0/http2/server.go:2414 +0x25a
2024-08-09T19:19:38.955315980Z   golang.org/x/net/http2.(*serverConn).scheduleHandler.func1()
2024-08-09T19:19:38.955322658Z       /go/pkg/mod/golang.org/x/net@v0.26.0/http2/server.go:2348 +0x5d
2024-08-09T19:19:38.955323910Z 
2024-08-09T19:19:38.955325457Z Previous read at 0x00c00bfa5740 by goroutine 7675260:
2024-08-09T19:19:38.955327005Z   runtime.mapiterinit()
2024-08-09T19:19:38.955328549Z       /usr/local/go/src/runtime/map.go:816 +0x0
2024-08-09T19:19:38.955329933Z   golang.org/x/net/http2.(*sorter).Keys()
2024-08-09T19:19:38.955331759Z       /go/pkg/mod/golang.org/x/net@v0.26.0/http2/http2.go:344 +0x8f
2024-08-09T19:19:38.955333741Z   golang.org/x/net/http2.encodeHeaders()
2024-08-09T19:19:38.955334817Z       /go/pkg/mod/golang.org/x/net@v0.26.0/http2/write.go:340 +0x13d
2024-08-09T19:19:38.955335846Z   golang.org/x/net/http2.(*writeResHeaders).writeFrame()
2024-08-09T19:19:38.955337101Z       /go/pkg/mod/golang.org/x/net@v0.26.0/http2/write.go:217 +0x1cd
2024-08-09T19:19:38.955338182Z   golang.org/x/net/http2.(*serverConn).writeFrameAsync()
2024-08-09T19:19:38.955339244Z       /go/pkg/mod/golang.org/x/net@v0.26.0/http2/server.go:890 +0x116
2024-08-09T19:19:38.955340382Z   golang.org/x/net/http2.(*serverConn).startFrameWrite.func2()
2024-08-09T19:19:38.955341415Z       /go/pkg/mod/golang.org/x/net@v0.26.0/http2/server.go:1304 +0x6e
2024-08-09T19:19:38.955342347Z 
2024-08-09T19:19:38.955343398Z Goroutine 7675145 (running) created at:
2024-08-09T19:19:38.955356990Z   golang.org/x/net/http2.(*serverConn).scheduleHandler()
2024-08-09T19:19:38.955358243Z       /go/pkg/mod/golang.org/x/net@v0.26.0/http2/server.go:2348 +0x28a
2024-08-09T19:19:38.955359222Z   golang.org/x/net/http2.(*serverConn).processHeaders()
2024-08-09T19:19:38.955360262Z       /go/pkg/mod/golang.org/x/net@v0.26.0/http2/server.go:2070 +0xd30
2024-08-09T19:19:38.955361329Z   golang.org/x/net/http2.(*serverConn).processFrame()
2024-08-09T19:19:38.955362448Z       /go/pkg/mod/golang.org/x/net@v0.26.0/http2/server.go:1571 +0x3f9
2024-08-09T19:19:38.955363452Z   golang.org/x/net/http2.(*serverConn).processFrameFromReader()
2024-08-09T19:19:38.955364574Z       /go/pkg/mod/golang.org/x/net@v0.26.0/http2/server.go:1509 +0x324
2024-08-09T19:19:38.955365563Z   golang.org/x/net/http2.(*serverConn).serve()
2024-08-09T19:19:38.955366580Z       /go/pkg/mod/golang.org/x/net@v0.26.0/http2/server.go:998 +0x17bc
2024-08-09T19:19:38.955367960Z   golang.org/x/net/http2.(*Server).serveConn()
2024-08-09T19:19:38.955369001Z       /go/pkg/mod/golang.org/x/net@v0.26.0/http2/server.go:572 +0x1a90
2024-08-09T19:19:38.955370090Z   golang.org/x/net/http2.(*Server).ServeConn()
2024-08-09T19:19:38.955371192Z       /go/pkg/mod/golang.org/x/net@v0.26.0/http2/server.go:436 +0x185
2024-08-09T19:19:38.955372317Z   golang.org/x/net/http2.ConfigureServer.func1()
2024-08-09T19:19:38.955373408Z       /go/pkg/mod/golang.org/x/net@v0.26.0/http2/server.go:355 +0xbe
2024-08-09T19:19:38.955374353Z   net/http.(*conn).serve()
2024-08-09T19:19:38.955375307Z       /usr/local/go/src/net/http/server.go:1922 +0x1d41
2024-08-09T19:19:38.955376720Z   net/http.(*Server).Serve.func3()
2024-08-09T19:19:38.955377697Z       /usr/local/go/src/net/http/server.go:3091 +0x4f
2024-08-09T19:19:38.955378552Z 
2024-08-09T19:19:38.955379499Z Goroutine 7675260 (finished) created at:
2024-08-09T19:19:38.955380584Z   golang.org/x/net/http2.(*serverConn).startFrameWrite()
2024-08-09T19:19:38.955381555Z       /go/pkg/mod/golang.org/x/net@v0.26.0/http2/server.go:1304 +0x77b
2024-08-09T19:19:38.955382677Z   golang.org/x/net/http2.(*serverConn).scheduleFrameWrite()
2024-08-09T19:19:38.955383614Z       /go/pkg/mod/golang.org/x/net@v0.26.0/http2/server.go:1406 +0x3c8
2024-08-09T19:19:38.955384616Z   golang.org/x/net/http2.(*serverConn).writeFrame()
2024-08-09T19:19:38.955389980Z       /go/pkg/mod/golang.org/x/net@v0.26.0/http2/server.go:1252 +0x3c7
2024-08-09T19:19:38.955391151Z   golang.org/x/net/http2.(*serverConn).serve()
2024-08-09T19:19:38.955392125Z       /go/pkg/mod/golang.org/x/net@v0.26.0/http2/server.go:985 +0x15b1
2024-08-09T19:19:38.955393350Z   golang.org/x/net/http2.(*Server).serveConn()
2024-08-09T19:19:38.955394737Z       /go/pkg/mod/golang.org/x/net@v0.26.0/http2/server.go:572 +0x1a90
2024-08-09T19:19:38.955396303Z   golang.org/x/net/http2.(*Server).ServeConn()
2024-08-09T19:19:38.955398538Z       /go/pkg/mod/golang.org/x/net@v0.26.0/http2/server.go:436 +0x185
2024-08-09T19:19:38.955399815Z   golang.org/x/net/http2.ConfigureServer.func1()
2024-08-09T19:19:38.955401363Z       /go/pkg/mod/golang.org/x/net@v0.26.0/http2/server.go:355 +0xbe
2024-08-09T19:19:38.955402833Z   net/http.(*conn).serve()
2024-08-09T19:19:38.955404301Z       /usr/local/go/src/net/http/server.go:1922 +0x1d41
2024-08-09T19:19:38.955405880Z   net/http.(*Server).Serve.func3()
2024-08-09T19:19:38.955407377Z       /usr/local/go/src/net/http/server.go:3091 +0x4f
2024-08-09T19:19:38.955408869Z ==================
Timer commented 3 months ago

I may be missing it, but I don't see where WriteHeader blocks until the write is complete.

The http2 WriteHeader implementation (golang.org/x/net@v0.26.0/http2/server.go) uses the writeHeader impl in responseWriterState (:2943).

For 1xx status codes, it calls rws.conn.writeHeaders:

func (rws *responseWriterState) writeHeader(code int) {
    if rws.wroteHeader {
        return
    }

    checkWriteHeaderCode(code)

    // Handle informational headers
    if code >= 100 && code <= 199 {
        // Per RFC 8297 we must not clear the current header map
        h := rws.handlerHeader

        _, cl := h["Content-Length"]
        _, te := h["Transfer-Encoding"]
        if cl || te {
            h = h.Clone()
            h.Del("Content-Length")
            h.Del("Transfer-Encoding")
        }

        rws.conn.writeHeaders(rws.stream, &writeResHeaders{
            streamID:    rws.stream.id,
            httpResCode: code,
            h:           h,
            endStream:   rws.handlerDone && !rws.hasTrailers(),
        })

        return
    }

writeHeaders submits the call to writeFrameFromHandler (golang.org/x/net@v0.26.0/http2/server.go:2440). This function does not wait for any signals in the success case:

func (sc *serverConn) writeHeaders(st *stream, headerData *writeResHeaders) error {
    sc.serveG.checkNotOn() // NOT on
    var errc chan error
    if headerData.h != nil {
        // If there's a header map (which we don't own), so we have to block on
        // waiting for this frame to be written, so an http.Flush mid-handler
        // writes out the correct value of keys, before a handler later potentially
        // mutates it.
        errc = errChanPool.Get().(chan error)
    }
    if err := sc.writeFrameFromHandler(FrameWriteRequest{
        write:  headerData,
        stream: st,
        done:   errc,
    }); err != nil {
        return err
    }
    if errc != nil {
        select {
        case err := <-errc:
            errChanPool.Put(errc)
            return err
        case <-sc.doneServing:
            return errClientDisconnected
        case <-st.cw:
            return errStreamClosed
        }
    }
    return nil
}

And the writeFrameFromHandler impl simply drops it on the wantWriteFrameCh channel for writing (a buffered channel).

// writeFrameFromHandler sends wr to sc.wantWriteFrameCh, but aborts
// if the connection has gone away.
//
// This must not be run from the serve goroutine itself, else it might
// deadlock writing to sc.wantWriteFrameCh (which is only mildly
// buffered and is read by serve itself). If you're on the serve
// goroutine, call writeFrame instead.
func (sc *serverConn) writeFrameFromHandler(wr FrameWriteRequest) error {
    sc.serveG.checkNotOn() // NOT
    select {
    case sc.wantWriteFrameCh <- wr:
        return nil
    case <-sc.doneServing:
        // Serve loop is gone.
        // Client has closed their connection to the server.
        return errClientDisconnected
    }
}

Perhaps the header map sent in to rws.conn.writeHeaders needs to clone h since it all seems to happen async? Or am I missing what causes the blocking write?

Timer commented 3 months ago

cc @neild in case you unsubscribed from the issue after being closed 🙏, hope you don't mind.

neild commented 3 months ago

Thanks for the race detector output; that was very useful.

The problem is that while writeHeaders usually waits for the write to complete before returning, it stops waiting if the stream is reset by the peer. The race occurs when the header write has already started at the time the peer's reset is processed. (This is the <-st.cw path in writeHeaders.)

I'm still trying to figure out the right fix; there's a fairly complex set of interactions between the different goroutines involved in writing a frame. Most of this complexity is in service of efficiency, and I suspect the more obvious fixes leave us with all the complexity and none of the efficiency.

Timer commented 3 months ago

If it's of any anecdotal value, for latency/efficiency sake, we'd prefer WriteHeader to be async so we can continue processing the request without introducing a blocking operation.

Also, I did not catch that errc was a chan and thus responsible for the blocking operation. Makes sense!

In either case, the most trivial fix appears to be moving the h = h.Clone() out of the special condition that activates for CL & TE... but perhaps not the most performant.