golang / go


net: EPROTOTYPE surfaced from write() on macOS due to kernel bug #51538

Open tmm1 opened 2 years ago

tmm1 commented 2 years ago

What version of Go are you using (go version)?

$ go version
go version go1.17.6 darwin/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="auto"
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/tmm1/Library/Caches/go-build"
GOENV="/Users/tmm1/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/tmm1/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/tmm1/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.17.6/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.17.6/libexec/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.17.6"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/tmm1/fancybits/channels-server/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/_2/hljyy_zj3912lv9qqpy70t5w0000gn/T/go-build1772962119=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I have an HTTP server running on macOS that uses io.Copy() to stream data out to the HTTP response writer.

What did you expect to see?

When the client closes the socket, I expect to see ECONNRESET or EPIPE.

What did you see instead?

I sometimes see EPROTOTYPE instead:

2022/03/04 06:37:26.032031 write tcp 10.20.1.13:8089->10.20.1.17:53525: write: protocol wrong type for socket

2022/03/05 10:25:59.675548 write tcp 10.20.1.13:8089->10.20.1.17:49209: write: protocol wrong type for socket

2022/02/28 07:01:09.358647 write tcp 10.20.1.13:8089->10.20.1.17:50689: write: protocol wrong type for socket

This appears to be due to a well-documented macOS kernel bug: http://erickt.github.io/blog/2014/11/19/adventures-in-debugging-a-potential-osx-kernel-bug/

Many other runtimes have experienced this issue, and the suggested workaround is to retry the write operation.

See https://bugs.python.org/issue44229 and the similar links in https://github.com/tokio-rs/mio/issues/1364 (for workarounds in libuv, dotnet, and gevent).

mengzhuo commented 2 years ago

cc @neild

ianlancetaylor commented 2 years ago

Do you have a way that we can reliably reproduce the problem? Even if we have to run the test many times? That would help a great deal in knowing whether we have fixed it. Thanks.

tmm1 commented 2 years ago

I have observed this logged on customer installations and don't have a repro myself yet.

This particular customer is running macOS 10.13.6.

ianlancetaylor commented 2 years ago

Rolling forward to 1.20. Please comment if you disagree. Thanks.