golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.01k stars 17.54k forks source link

net/http: broken https connections reuse on arm64 #39114

Open DarthRamone opened 4 years ago

DarthRamone commented 4 years ago

What version of Go are you using (go version)?

$ go version
go1.14.3 linux/arm64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

Ubuntu 18.04 LTS ARM64

go env Output
$ go env
GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/home/ubuntu/.cache/go-build"
GOENV="/home/ubuntu/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/ubuntu/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/snap/go/5770"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/snap/go/5770/pkg/tool/linux_arm64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build987108063=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I wrote telegram bot with one of go-based telegram framework, and tested it on my MacBook. Everything works just fine, until I deployed it to my Raspberry Pi 4. On rpi it works fine, until I do several calls to telegram API, after third call bot stops responding, because of it can't do request and gets EOF.

I've spent a week trying to debug and find an issue. I thought it was related with Russian telegram ban, but I've routed all the telegram traffic through VPN, I've secured my DNS as much as possible, and issue still reproduces. Same network, same codebase but different processor arch - works good, move to arm64 - everything is broken.

I do not know much how secure connections works at "low levels", and started to debug it with tcpdump and netstat, and noticed, that bot in idle state opens just one connection to API server for polling, but if I do any request - a second connect is appearing and stays open. I thought it was a problem, but then I googled a little, and figured out that it is totally fine and it's common practice to reuse opened connections. It was confirmed with another tests on MacBook - same second connection and everything works fine.

Okay, what if I'll try somehow not to reuse these connections? I've set response.Close flag to true - nothing happens. Then I tried to call CloseIdleConnections() after every request to API, and it works! No more EOFs. Then I've found another option - set DisableKeepAlives to false to http client transport, and now it works fine(without manual CloseIdleConnections and resp.Close true flag). But it is not good I think, it opens and closes connections on every request and doesn't reuse them.

What did you expect to see?

Correct reusing opened connections.

What did you see instead?

EOF

odeke-em commented 4 years ago

Thank you for filing this issue @DarthRamone and welcome to the Go project!

Could you perhaps please try to isolate a code snippet that illustrates the problem? Also could you check if the problem exists for Go1.13? You can get different Go versions using the tool dl

I shall also alert some experts about this issue, as we await a repro, cc @ianlancetaylor @cherrymui

DarthRamone commented 4 years ago

Well, I rebuilt container with golang:1.13.11 image and played a little, and it worked well, but after some steps I've got EOF anyway. Plus I realised, that I forgot to comment DisableKeepAlives=true, so this flag is not even workaround, I think. Only difference - it works much longer and require slightly more intense requests to get EOF.

So issue exists at 1.13 as well, and DisableKeepAlives=true just helps to stay alive little longer, but doesn't fix anything. So maybe it's not about reusing connections at all, but the truth is somewhere near.

Providing code snippet maybe difficult, because I'm not sure how to exactly reproduce it. Even in worst case, when DisableKeepAlives set to false by default, I'm getting EOF not on every third call, but on every third certain call to certain endpoint. It may looks like API problem, but for some reasons it reproduces only on my RPi. I'm not sure I can figure out all conditions that causes this behaviour.

odeke-em commented 4 years ago

Gotcha, thanks @DarthRamone! To seed a reproduction, please take a look at this short program which creates an HTTPS server serving HTTP/1.1 traffic at https://play.golang.org/p/TaaJx_MJ2ad or inlined below

package main

import (
    "crypto/tls"
    "net/http"
    "net/http/httptest"
    "net/http/httputil"
    "time"
)

func main() {
    cst := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        conn, _, err := w.(http.Hijacker).Hijack()
        if err != nil {
            panic(err)
        }
        tcpConn := conn.(*tls.Conn)
        if true {
            <-time.After(100 * time.Millisecond)
        }
        tcpConn.Write([]byte("HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nab\r\n\r\n"))
        tcpConn.Close()
    }))
    cst.StartTLS()
    defer cst.Close()

    println("Server address:", cst.URL)

    client := cst.Client()
    tr := client.Transport.(*http.Transport)
    tr.IdleConnTimeout = 50 * time.Millisecond
    for i := 0; i < 5; i++ {
        println("Request:", i, "*****")
        req, _ := http.NewRequest("GET", cst.URL, nil)
        res, err := client.Do(req)
        if err != nil {
            panic(err)
        }
        blob, err := httputil.DumpResponse(res, true)
        res.Body.Close()
        if err != nil {
            panic(err)
        }
        println(string(blob), "\n******\n\n")
    }
}

I intentionally have idleTimeouts and then invoke tcpConn.Close(), to kill/make some idle connections. Perhaps you can tweak this code to try to behave as your server is behaving? If you need to just get back an EOF, please comment the line that sends back the HTTP response in the server and you'll be served back a straight up EOF.

One other tip, is that if you know the Telegram server is serving HTTP/2 traffic, you can use this environment flag GODEBUG=http2debug=2 which will print out for you logs e.g.

GODEBUG=http2debug=2 go run main.go

Kindly /cc-ing to your radar @bradfitz @fraenkel

DarthRamone commented 4 years ago

I've changed your snipped a little to reproduce all what happens in bot library: https://play.golang.org/p/tk0YnYy8gBv But couldn't get EOF yet, only weird: "Unsolicited response received on idle HTTP channel starting with "\r\n\r\n"; err=nil" What's this btw?

I'll continue play with this snippet, maybe I will reproduce it somehow

UPD: Fast google says that this weird error(weird for me) is somehow related to idle connections, https and all current subjects. There are even couple issues in golang repo about it

bleggett commented 4 years ago

I believe this is related to https://github.com/golang/go/issues/5312 - which was never fixed, just closed.

I've seen similar EOF behavior in my own code, which makes parallel requests using goroutines - it appears there's a race condition where Go will incorrectly reuse a terminated connection before that connection can be properly marked as "terminated" by the runtime.

Even though you're not using goroutines, I suspect you're hitting some aspect of that race condition.

It seems to be something that can crop up more frequently in resource-constrained environments - see this comment for a hypothesis as to what's happening: https://github.com/golang/go/issues/5312#issuecomment-66078257