golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

net/http: slow io.Copy(http.ResponseWriter, *os.File) with Content-Length #41513

Open Ryan-A-B opened 3 years ago

Ryan-A-B commented 3 years ago

What version of Go are you using (go version)?

$ go version
go version go1.15.2 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build447228345=/tmp/go-build -gno-record-gcc-switches"

What did you do?

package http_test

import (
    "io"
    "io/ioutil"
    "log"
    "net/http"
    "net/http/httptest"
    "os"
    "strconv"
    "testing"
)

func runHandlerBenchmark(b *testing.B, handler http.Handler) {
    server := httptest.NewServer(handler)
    defer server.Close()

    client := server.Client()
    for i := 0; i < b.N; i++ {
        response, err := client.Get(server.URL)
        if err != nil {
            log.Fatal(err.Error())
        }
        _, err = ioutil.ReadAll(response.Body)
        if err != nil {
            log.Fatal(err.Error())
        }
    }
}

type HandlerFile struct {
    callback func(response http.ResponseWriter, file *os.File)
}

func (handler *HandlerFile) ServeHTTP(response http.ResponseWriter, request *http.Request) {
    file, err := os.Open("./main_test.go")
    if err != nil {
        log.Fatal(err.Error())
    }
    defer file.Close()
    handler.callback(response, file)
    _, err = io.Copy(response, file)
    if err != nil {
        log.Fatal(err.Error())
    }
}

func BenchmarkFileWithoutContentLength(b *testing.B) {
    doNothing := func(response http.ResponseWriter, file *os.File) {}
    handler := HandlerFile{
        callback: doNothing,
    }
    runHandlerBenchmark(b, &handler)
}

func BenchmarkFileWithContentLength(b *testing.B) {
    addContentLength := func(response http.ResponseWriter, file *os.File) {
        stat, err := file.Stat()
        if err != nil {
            log.Fatal(err.Error())
        }
        response.Header().Set("Content-Length", strconv.FormatInt(stat.Size(), 10))
    }
    handler := HandlerFile{
        callback: addContentLength,
    }
    runHandlerBenchmark(b, &handler)
}

What did you expect to see?

Both benchmarks to take the same amount of time

What did you see instead?

goos: linux
goarch: amd64
BenchmarkFileWithoutContentLength-8        12226             95274 ns/op
BenchmarkFileWithContentLength-8              20          52161420 ns/op
PASS
ianlancetaylor commented 3 years ago

Interesting test case. I'm not sure what is going on here. I can see that BenchmarkFileWithContentLength uses the sendfile system call while BenchmarkFileWithoutContentLength does not, but I don't know why it would lead to such a significant difference in benchmark time.

If I use a much larger file, rather than the main_test.go file, then BenchmarkFileWithContentLength is faster. In that case BenchmarkFileWithoutContentLength is going to be using more system calls.

Perhaps sendfile is not a great choice for short files, but I don't know why the difference would be so dramatic.

cuonglm commented 3 years ago

The problem is in chunkWriter.writeHeader: when a Content-Length header is detected, chunkWriter.chunking is not set. That causes response.ReadFrom to use net.Conn.ReadFrom instead of wrapping the response in writerOnly.

A possible fix would be to enable chunking even when a Content-Length header is detected, but I'm not sure about it.

Any ideas, @bradfitz @odeke-em?

darkLord19 commented 3 years ago

On macOS I am getting both benchmarks close to each other:

goos: darwin
goarch: amd64
BenchmarkFileWithoutContentLength-8        11030        106178 ns/op
BenchmarkFileWithContentLength-8           10000        106606 ns/op
PASS
ianlancetaylor commented 3 years ago

@darkLord19 That's because macOS doesn't have sendfile.

@cuonglm Thanks for spotting where the behavior changes. I don't think it would be right to always enable chunking because as noted above using sendfile is actually faster when using a larger file.

I think we need to understand why sendfile is so much slower for short files.

CC @acln0 for any thoughts.

acln0 commented 3 years ago

When I tried to reproduce on my machine, I got this:

: forge:http-sendfile-bench $ go test -bench .
goos: linux
goarch: amd64
pkg: acln.ro/http-sendfile-bench
BenchmarkFileWithoutContentLength-4        16884         70872 ns/op
BenchmarkFileWithContentLength-4           18878         62345 ns/op
PASS
ok      acln.ro/http-sendfile-bench 3.743s

... which looks quite reasonable. I don't know what is going on here.

I have a theory: I notice that the benchmark calls client.Get, but never closes response.Body. It is my understanding that this can cause a file descriptor leak. Perhaps the first round of benchmarks leaves just enough file descriptors hanging around that by the time we get to the second one, the process is reaching a file descriptor limit, and the server enters this portion of its internal Accept loop: https://github.com/golang/go/blob/617f2c3e35cdc8483b950aa3ef18d92965d63197/src/net/http/server.go#L2962-L2976

The server sleeping for extended periods of time may account for the sendfile code path looking really really slow. I cannot imagine sendfile needing ~50ms to send a small file otherwise.

I have not been able to confirm my hypothesis, but I would kindly ask @Ryan-A-B to change the inner loop of the benchmark to this:

    for i := 0; i < b.N; i++ {
        response, err := client.Get(server.URL)
        if err != nil {
            log.Fatal(err.Error())
        }
        _, err = ioutil.ReadAll(response.Body)
        if err != nil {
            log.Fatal(err.Error())
        }
        response.Body.Close()
    }

and try again. Perhaps with this change, the benchmark would have more sensible results.

Ryan-A-B commented 3 years ago

@acln0 Adding response.Body.Close() made no difference; however, pointing at a smaller file did. It seems that files over a certain size experience the slowdown when Content-Length is set, which I think makes sense given @cuonglm's findings.

cuonglm commented 3 years ago

@darkLord19 That's because macOS doesn't have sendfile.

@cuonglm Thanks for spotting where the behavior changes. I don't think it would be right to always enable chunking because as noted above using sendfile is actually faster when using a larger file.

I think we need to understand why sendfile is so much slower for short files.

CC @acln0 for any thoughts.

I don't mean that we should always enable chunking, but that we should still enable chunking when we detect a Content-Length header. The current code that enables chunking is:

https://github.com/golang/go/blob/23573d0ea225d4b93ccd2b946b1de121c3a6cee5/src/net/http/server.go#L1448

which only runs when hasCL is false. But after that code, there's:

https://github.com/golang/go/blob/23573d0ea225d4b93ccd2b946b1de121c3a6cee5/src/net/http/server.go#L1464

which seems to be dead code, since chunking is only set when no Content-Length header exists. So I propose:

diff --git a/src/net/http/server.go b/src/net/http/server.go
index 25fab288f2..909adeb67f 100644
--- a/src/net/http/server.go
+++ b/src/net/http/server.go
@@ -1429,17 +1429,17 @@ func (cw *chunkWriter) writeHeader(p []byte) {

        if w.req.Method == "HEAD" || !bodyAllowedForStatus(code) {
                // do nothing
-       } else if code == StatusNoContent {
+       } else if code == StatusNoContent || hasCL {
                delHeader("Transfer-Encoding")
-       } else if hasCL {
-               delHeader("Transfer-Encoding")
-       } else if w.req.ProtoAtLeast(1, 1) {
+       }
+
+       if w.req.ProtoAtLeast(1, 1) {
                // HTTP/1.1 or greater: Transfer-Encoding has been set to identity, and no
                // content-length has been provided. The connection must be closed after the
                // reply is written, and no chunking is to be done. This is the setup
                // recommended in the Server-Sent Events candidate recommendation 11,
                // section 8.
-               if hasTE && te == "identity" {
+               if !hasCL && hasTE && te == "identity" {
                        cw.chunking = false
                        w.closeAfterReply = true
                } else {

Note

The slow performance comes from this line: https://github.com/golang/go/blob/23573d0ea225d4b93ccd2b946b1de121c3a6cee5/src/net/http/server.go#L616

ianlancetaylor commented 3 years ago

@Ryan-A-B Just to make sure we are on the same page, I want to clarify that what I see is that larger files are faster in BenchmarkFileWithContentLength. Smaller files are faster in BenchmarkFileWithoutContentLength.

ianlancetaylor commented 3 years ago

Running strace -ftt, it seems to me that it takes an unreasonably long time for epoll_pwait to return after sendfile is used to send data on the socket. Relevant lines from strace -ftt output:

[pid 1686887] 15:12:57.528572 sendfile(8, 9, NULL, 4194304 <unfinished ...>
[pid 1686887] 15:12:57.528647 <... sendfile resumed> ) = 1080
[pid 1686887] 15:12:57.529057 close(9 <unfinished ...>
[pid 1686887] 15:12:57.529131 <... close resumed> ) = 0
[pid 1686894] 15:12:57.529327 epoll_pwait(5,  <unfinished ...>
[pid 1686894] 15:12:57.736463 <... epoll_pwait resumed> [{EPOLLIN|EPOLLOUT, {u32=3680355512, u64=139847815513272}}], 128, -1, NULL, 0) = 1

Note the huge jump from 57.529327 to 57.736463 before epoll_pwait returns. During that interval strace reports nothing of interest. Why does it take epoll_pwait so long to return?

ianlancetaylor commented 3 years ago

My Linux kernel version is 5.7.17.

qingyunha commented 3 years ago

Running strace on the client and server separately shows a 200ms delay between sendfile and recvfrom.

07:22:16.141964 connect(3, {sa_family=AF_INET, sin_port=htons(8081), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
07:22:16.142104 poll([{fd=3, events=POLLOUT|POLLWRNORM}], 1, 0) = 1 ([{fd=3, revents=POLLOUT|POLLWRNORM}])
07:22:16.142381 sendto(3, "GET / HTTP/1.1\r\nHost: 127.0.0.1:"..., 78, MSG_NOSIGNAL, NULL, 0) = 78
07:22:16.142458 poll([{fd=3, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
07:22:16.142511 poll([{fd=3, events=POLLIN}], 1, 199) = 1 ([{fd=3, revents=POLLIN}])
07:22:16.142964 poll([{fd=3, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=3, revents=POLLIN|POLLRDNORM}])
07:22:16.143005 recvfrom(3, "HTTP/1.1 200 OK\r\nContent-Length:"..., 102400, 0, NULL, NULL) = 631
07:22:16.143215 poll([{fd=3, events=POLLIN}], 1, 198) = 0 (Timeout)
07:22:16.342150 poll([{fd=3, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
07:22:16.342346 poll([{fd=3, events=POLLIN}], 1, 0) = 0 (Timeout)
07:22:16.342487 poll([{fd=3, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
07:22:16.342583 poll([{fd=3, events=POLLIN}], 1, 1000) = 1 ([{fd=3, revents=POLLIN}])
07:22:16.348691 poll([{fd=3, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=3, revents=POLLIN|POLLRDNORM}])
07:22:16.348754 recvfrom(3, "llback func(response http.Respon"..., 959, 0, NULL, NULL) = 959
[pid  7073] 07:22:16.142734 write(4, "HTTP/1.1 200 OK\r\nContent-Length:"..., 631) = 631
[pid  7073] 07:22:16.142825 sendfile(4, 6, NULL, 4194304) = 959
[pid  7073] 07:22:16.142864 sendfile(4, 6, NULL, 4194304) = 0
mattn commented 3 years ago

FYI, I got this on my Windows PC.

goos: windows
goarch: amd64
pkg: github.com/mattn/http_bench
BenchmarkFileWithoutContentLength-4          679           1651679 ns/op
BenchmarkFileWithContentLength-4             727           1602470 ns/op
PASS
Ryan-A-B commented 3 years ago

@ianlancetaylor Up to 512 bytes they're equal; from 513 bytes up, BenchmarkFileWithContentLength becomes the slow version, then crosses over to being the fast version at about 1.2MB:

256 byte file
BenchmarkFileWithoutContentLength-8        20000            104532 ns/op
BenchmarkFileWithContentLength-8           20000             90245 ns/op
512 byte file
BenchmarkFileWithoutContentLength-8        20000             94685 ns/op
BenchmarkFileWithContentLength-8           20000             86804 ns/op
513 byte file
BenchmarkFileWithoutContentLength-8        20000             96399 ns/op
BenchmarkFileWithContentLength-8              20          52140028 ns/op
1024 byte file
BenchmarkFileWithoutContentLength-8        20000             97876 ns/op
BenchmarkFileWithContentLength-8              20          52145628 ns/op
1.2MB file
BenchmarkFileWithoutContentLength-8         1000           1242620 ns/op
BenchmarkFileWithContentLength-8            2000           1389453 ns/op
3.6MB file
BenchmarkFileWithoutContentLength-8          500           2606823 ns/op
BenchmarkFileWithContentLength-8            1000           1653781 ns/op
qingyunha commented 3 years ago

@ianlancetaylor Up to 512 bytes they're equal; from 513 bytes up, BenchmarkFileWithContentLength becomes the slow version, then crosses over to being the fast version at about 1.2MB.

Content-Type detection sniffs the first 512 bytes, so files of 512 bytes or less never invoke sendfile.

imxyb commented 3 years ago

My macOS benchmark:

goos: darwin
goarch: amd64
pkg: test/ut
BenchmarkFileWithoutContentLength-12           13285         88998 ns/op
BenchmarkFileWithContentLength-12              13695         89779 ns/op
PASS
gopherbot commented 3 years ago

Change https://golang.org/cl/305229 mentions this issue: net: use the size be sent as the count param of sendfile

harshavardhana commented 2 years ago

@darkLord19 That's because macOS doesn't have sendfile. @cuonglm Thanks for spotting where the behavior changes. I don't think it would be right to always enable chunking because as noted above using sendfile is actually faster when using a larger file. I think we need to understand why sendfile is so much slower for short files. CC @acln0 for any thoughts.

I don't mean that we should always enable chunking, but that we should still enable chunking when we detect a Content-Length header. The current code that enables chunking is:

https://github.com/golang/go/blob/23573d0ea225d4b93ccd2b946b1de121c3a6cee5/src/net/http/server.go#L1448

which only runs when hasCL is false. But after that code, there's:

https://github.com/golang/go/blob/23573d0ea225d4b93ccd2b946b1de121c3a6cee5/src/net/http/server.go#L1464

which seems to be dead code, since chunking is only set when no Content-Length header exists. So I propose:

diff --git a/src/net/http/server.go b/src/net/http/server.go
index 25fab288f2..909adeb67f 100644
--- a/src/net/http/server.go
+++ b/src/net/http/server.go
@@ -1429,17 +1429,17 @@ func (cw *chunkWriter) writeHeader(p []byte) {

        if w.req.Method == "HEAD" || !bodyAllowedForStatus(code) {
                // do nothing
-       } else if code == StatusNoContent {
+       } else if code == StatusNoContent || hasCL {
                delHeader("Transfer-Encoding")
-       } else if hasCL {
-               delHeader("Transfer-Encoding")
-       } else if w.req.ProtoAtLeast(1, 1) {
+       }
+
+       if w.req.ProtoAtLeast(1, 1) {
                // HTTP/1.1 or greater: Transfer-Encoding has been set to identity, and no
                // content-length has been provided. The connection must be closed after the
                // reply is written, and no chunking is to be done. This is the setup
                // recommended in the Server-Sent Events candidate recommendation 11,
                // section 8.
-               if hasTE && te == "identity" {
+               if !hasCL && hasTE && te == "identity" {
                        cw.chunking = false
                        w.closeAfterReply = true
                } else {

Note

The slow performance comes from this line:

https://github.com/golang/go/blob/23573d0ea225d4b93ccd2b946b1de121c3a6cee5/src/net/http/server.go#L616

I tried this; it breaks the HTTP server for range GETs etc. I have seen similar behavior with epoll_wait under a combination of HEAD/GET requests, where random 200ms delays spike up. All I can see from some extensive logging is that

pconn.Read()

regularly gets blocked, which seems to block and delay subsequent calls as well. This increases overall response times, with unexpected latencies.

My OS is CentOS Linux release 8.3.2011

networkimprov commented 2 years ago

cc @neild

spade69 commented 1 year ago

Hey guys, what is going on here? Has this problem been solved?

AlexanderYastrebov commented 1 year ago

Rediscovered as https://github.com/golang/go/issues/61530