golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.72k stars 17.62k forks source link

x/net/http2: Server is slow under load (flow control?) #18404

Closed erin-noe-payne closed 7 years ago

erin-noe-payne commented 7 years ago

What version of Go are you using (go version)?

go version go1.7.4 darwin/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/erin.noe-payne/local/go"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"

What did you do?

The server code:

package main

import (
    "io/ioutil"
    "log"
    "net/http"
)

type handler struct{}

func (*handler) ServeHTTP(res http.ResponseWriter, req *http.Request) {
    ioutil.ReadAll(req.Body)
    req.Body.Close()
}

func main() {
    server := &http.Server{
        Addr:    ":3000",
        Handler: &handler{},
        // Uncomment to disable http/2
        // TLSNextProto: make(map[string]func(*http.Server, *tls.Conn, http.Handler)),
    }

    log.Fatal(server.ListenAndServeTLS("tls/server.crt", "tls/server.key"))
}

Create a data file for testing: $ mkfile -n 5m 5mb.txt

Using the vegeta load test client: $ echo PUT https://localhost:3000/ | vegeta attack -insecure -duration=10s -rate=25 -body=5mb.txt | vegeta report

Run the same test against the server running in http/1.1 and http/2 mode.

What did you expect to see?

I would expect the performance (average latency / throughput) of an http/2 server to be similar or better than http/1.1.

What did you see instead?

HTTP/2 results:

Making 25 req/s over 10s
Requests      [total, rate]            250, 25.10
Duration      [total, attack, wait]    42.305466622s, 9.959999799s, 32.345466823s
Latencies     [mean, 50, 95, 99, max]  31.880917546s, 33.618733832s, 37.184070581s, 37.822213047s, 38.701059773s
Bytes In      [total, mean]            0, 0.00
Bytes Out     [total, mean]            1310720000, 5242880.00
Success       [ratio]                  100.00%
Status Codes  [code:count]             200:250

http-2 HTTP/1.1 results:

Making 25 req/s over 10s
Requests      [total, rate]            250, 25.10
Duration      [total, attack, wait]    10.044000447s, 9.959999916s, 84.000531ms
Latencies     [mean, 50, 95, 99, max]  105.904846ms, 81.203795ms, 233.098998ms, 315.526192ms, 504.860265ms
Bytes In      [total, mean]            0, 0.00
Bytes Out     [total, mean]            1310720000, 5242880.00
Success       [ratio]                  100.00%
Status Codes  [code:count]             200:250

http-1 1

Further info:

Conclusion

Basically, HTTP/2 is merged into core and enabled by default on TLS servers. This leads me to expect that an HTTP/2 server should perform similarly or better than an HTTP/1.1 server under most conditions. I wish to build performant REST APIs that can sustain reasonable throughput, and HTTP/2 offers a number of features that I had assumed would improve performance.

I'm not clear if this is a bad assumption, I am misusing the APIs somehow, or this reflects an issue with the implementation. Any support, information, or recommendations would be greatly appreciated. Thanks!

bradfitz commented 7 years ago

This is almost certainly #16512

I think vegeta is starved for flow control tokens.

/cc @tombergan

tombergan commented 7 years ago

Also see #17985

tombergan commented 7 years ago

@bradfitz, I think this is more than #16512. Why are there 2 conn.serve goroutines but 165212 conn.writeFrameAsync goroutines? There should be at most one writeFrameAsync goroutine per serve goroutine. There is either a bug that triggers multiple concurrent frame writes, or a goroutine leak that doesn't cleanup writeFrameAsync goroutines. The latter seems likely. Probably this should wait for "wroteFrameCh<-" or doneServing? https://github.com/golang/net/blob/45e771701b814666a7eb299e6c7a57d0b1799e91/http2/server.go#L640

erin-noe-payne commented 7 years ago

Hey guys, thanks for the quick response. This discussion leads to some follow-up questions:

bradfitz commented 7 years ago

@autoric, yup, those are the questions this bug is about investigating. We'll probably need better (or automatic) defaults, and more knobs.

yonderblue commented 7 years ago

I'm always a fan of the no knob solution, but if there could be knobs only in the http2 pkg offered earlier than the auto solution it would be great for us.

gopherbot commented 7 years ago

CL https://golang.org/cl/35118 mentions this issue.

Scratch-net commented 7 years ago

Maybe it's somehow related to #18309

gopherbot commented 7 years ago

CL https://golang.org/cl/37226 mentions this issue.

gopherbot commented 7 years ago

CL https://golang.org/cl/37500 mentions this issue.

tombergan commented 7 years ago

@autoric, do the above two changes fix the problem? I'm a bit concerned about the number of goroutines (see prior comment) but the flow control problem should be fixed.

erin-noe-payne commented 7 years ago

@tombergan Thanks for the update! I will run tests in the next couple days and get back to you.

bradfitz commented 7 years ago

@autoric, any updates? Please try Go master before it becomes Go 1.9beta1 soonish here.

bradfitz commented 7 years ago

Timeout. I assume this was fixed. Comment if not.