golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.82k stars 17.65k forks source link

net: Conn hangs with SetReadDeadline #34385

Open harshavardhana opened 5 years ago

harshavardhana commented 5 years ago

What version of Go are you using (go version)?

$ go version
1.13

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/harsha/.cache/go-build"
GOENV="/home/harsha/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/harsha/mygo"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/harsha/.gimme/versions/go1.13.linux.amd64"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/harsha/.gimme/versions/go1.13.linux.amd64/pkg/tool/linux_amd64"
GCCGO="/usr/bin/gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/harsha/mygo/src/github.com/minio/minio/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build110074343=/tmp/go-build -gno-record-gcc-switches"

What did you do?

~https://play.golang.org/p/Xpw5O5EsXQv~ https://play.golang.org/p/C-Jdtg6Wr9M

$ NO_TIMEOUT=0 go run /tmp/http.go
<?php
$homepage = file_get_contents('http://localhost:1234/');
echo $homepage;
?>
~ for i in $(seq 1 5); do /usr/bin/time -f "%E" php7.2 http.php 1>/dev/null; done
0:05.04
0:00.06
0:05.02
0:05.05
0:05.04

What did you expect to see?

No hangs with following code

$ NO_TIMEOUT=1 go run /tmp/http.go
<?php
$homepage = file_get_contents('http://localhost:1234/');
echo $homepage;
?>
~ for i in $(seq 1 5); do /usr/bin/time -f "%E" php7.2 http.php 1>/dev/null; done
0:00.03
0:00.01
0:00.01
0:00.01
0:00.01

What did you see instead?

Hangs up to the SetReadDeadline timeout

$ NO_TIMEOUT=0 go run /tmp/http.go
<?php
$homepage = file_get_contents('http://localhost:1234/');
echo $homepage;
?>
~ for i in $(seq 1 5); do /usr/bin/time -f "%E" php7.2 http.php 1>/dev/null; done
0:05.04
0:00.06
0:05.02
0:05.05
0:05.04

This is another variant of the same issue https://github.com/golang/go/issues/21133 and the work-around presented works still.

import (
    "net"
    "sync/atomic"
    "time"
)

// QuirkConn - similar to golang net.Conn struct, but contains a workaround of the
// following the go bug reported here https://github.com/golang/go/issues/21133.
// Once the bug will be fixed, we can remove this structure and replaces it with
// the standard net.Conn
type QuirkConn struct {
    net.Conn
    hadReadDeadlineInPast int32 // atomic
}

// SetReadDeadline - implements a workaround of SetReadDeadline go bug
func (q *QuirkConn) SetReadDeadline(t time.Time) error {
    inPast := int32(0)
    if t.Before(time.Now()) {
        inPast = 1
    }
    atomic.StoreInt32(&q.hadReadDeadlineInPast, inPast)
    return q.Conn.SetReadDeadline(t)
}

// canSetReadDeadline - returns if it is safe to set a new
// read deadline without triggering golang/go#21133 issue.
func (q *QuirkConn) canSetReadDeadline() bool {
    return atomic.LoadInt32(&q.hadReadDeadlineInPast) != 1
}

The question I have is why this workaround is needed at all. Is it not expected to SetReadDeadline() right before the read operation? but why does it work some times? It also looks like there may be some performance implications on doing something like this.

ianlancetaylor commented 5 years ago

The program you showed us doesn't look at NO_TIMEOUT at all. Are you showing us the exact code that you are testing?

harshavardhana commented 5 years ago

@ianlancetaylor my bad here is the right example

~https://play.golang.org/p/Xpw5O5EsXQv~ https://play.golang.org/p/C-Jdtg6Wr9M

harshavardhana commented 5 years ago

Had to update the example, updated the right link.

av86743 commented 5 years ago

If c.SetReadDeadline(time.Now().Add(c.readTimeout)) in your example is commented out, deadlock still occurs. Consider resolving the problem with your customization of net.Listener and/or net.Conn first.

harshavardhana commented 5 years ago

If c.SetReadDeadline(time.Now().Add(c.readTimeout)) in your example is commented out, deadlock still occurs. Consider resolving the problem with your customization of net.Listener and/or net.Conn first.

@av86743 I don't think so it does here is how I tested the shared code.

package main

import (
    "log"
    "net"
    "net/http"
    "os"
    "time"
)

// timeoutableConn
type timeoutableConn struct {
    readTimeout time.Duration
    net.Conn
}

func (c *timeoutableConn) Read(b []byte) (n int, err error) {
    if c.readTimeout > 0 {
        //c.SetReadDeadline(time.Now().Add(c.readTimeout))
    }
    n, err = c.Conn.Read(b)
    return n, err
}

// customListener returns timeoutable net.Conn
type customListener struct {
    net.Listener
}

func (l *customListener) Accept() (conn net.Conn, err error) {
    c, e := l.Listener.Accept()
    if e != nil {
        return nil, e
    }
    return &timeoutableConn{Conn: c, readTimeout: 5 * time.Second}, nil
}

func main() {
    var l net.Listener
    l, err := net.Listen("tcp", ":1234")
    if err != nil {
        log.Fatal(err)
    }

    if os.Getenv("NO_TIMEOUT") != "1" {
        l = &customListener{l}
    }

    _ = http.Serve(l, http.HandlerFunc(
        func(w http.ResponseWriter, req *http.Request) {
            buf := []byte(`127.0.0.1    localhost
127.0.1.1   backspace

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

172.31.59.24    ceph-1
172.31.57.119    ceph-2
172.31.57.137    ceph-3
172.31.63.24    ceph-4
172.31.63.204    ceph-5
172.31.51.188    ceph-6
172.31.54.140    ceph-7
172.31.49.145    ceph-8

`)
            w.Write(buf)
            w.(http.Flusher).Flush()
        }))
}
~ go run /tmp/t.go
~ for i in $(seq 1 10); do /usr/bin/time -f "%E" php7.2 http.php 1>/dev/null; done
0:00.03
0:00.01
0:00.01
0:00.01
0:00.01
0:00.01
0:00.01
0:00.01
0:00.01
0:00.01

So no hangs if that code is commented out. Add the code back.

~ for i in $(seq 1 10); do /usr/bin/time -f "%E" php7.2 http.php 1>/dev/null; done
0:05.02
0:00.04
0:00.01
0:05.01
0:05.04
0:05.01
0:00.01
0:00.01
0:05.01
0:05.01

So there is no issue in the code itself, I am not sure how are you testing it.

av86743 commented 5 years ago

... I am not sure how are you testing it.

Run on playground.

harshavardhana commented 5 years ago

Run on playground.

Playground doesn't run HTTP service @av86743 its a nacl container :-) any long-running go-routines such as HTTP service are rejected.

That is not how you would reproduce this issue IMHO, run it locally on your laptop.

av86743 commented 5 years ago

Playground doesn't run HTTP service etc.

First: Why put your example in playground if it is not runnable from there?

Second: An original example from #21133 with your setting of port :1234 is runnable from playground all right.

Third: #21133 was closed without any intention to fix anything in net/http. Why do you want to present it differently as:

// QuirkConn - similar to golang net.Conn struct, but contains a workaround of the
// following the go bug reported here https://github.com/golang/go/issues/21133.
// Once the bug will be fixed, we can remove this structure and replaces it with
// the standard net.Conn
harshavardhana commented 5 years ago

@av86743 because original was never addressed, what I wanted to shed light on is that the workaround provided was also not correct because.

It basically disabled SetReadDeadline() altogether and that exactly what shouldn't happen. Setting deadlines repeatedly shouldn't cause hangs in the first place and the workaround is also wrong because it basically just disabled the SetReadDeadline() code.

First: Why put your example in playground if it is not runnable from there?

This is an odd argument, the intention is to show code how to reproduce it. This issue also talks about how to reproduce it using the command line using PHP code. I don't know what made you think that you can run in the playground and also run PHP code to reproduce this.

Second: An original example from #21133 with your setting of port :1234 is runnable from playground all right.

The original example added client code, which I didn't want to add because you need to use PHP to reproduce this issue not Go HTTP client. Please follow what I pasted above on how to reproduce it.

av86743 commented 5 years ago

Second: An original example from #21133 with your setting of port :1234 is runnable from playground all right.

This is not a question and as such, it does not require your response. This is statement of a fact contrary to what you have earlier stated (I quote):

Playground doesn't run HTTP service @av86743 its a nacl container :-) any long-running go-routines such as HTTP service are rejected.


... This issue also talks about how to reproduce it using the command line using PHP code. ... you need to use PHP to reproduce this issue not Go HTTP client. ...

If you are not able to reproduce whatever it is without involving PHP, why do you claim that problem is on the go side and not on the PHP side?

networkimprov commented 5 years ago

@harshavardhana It's documented in #21133 why your code doesn't work, and what the workaround is. If the workaround has problems, you should open a new issue with code that demonstrates them.

Note curl is a better test client; not that many folks have PHP installed.

P.S. you don't have to respond to every comment you get on an issue :-)

harshavardhana commented 5 years ago

If you are not able to reproduce whatever it is without involving PHP, why do you claim that problem is on the go side and not on the PHP side?

Right now its just PHP - I haven't explored enough to show you if there are other clients out there which reproduce this issue - the reason is that without deadlines the code works fine. That makes me wonder that the issue is in Go - either deadlines don't work the way we want or we are using it wrong. Some explanation that would help in the documentation.

@harshavardhana It's documented in #21133 why your code doesn't work, and what the workaround is. If the workaround has problems, you should open a new issue with code that demonstrates them.

Fundamental reasoning is why does one need a workaround in the first place, the issue is closed as if the bug is in our custom net.Conn implementation which doesn't seem to be the case unless our usage of deadlines are dead wrong - if that is the case I am happy to change the implementation instead of applying a workaround. So any guidance on that area would be much appreciated.

networkimprov commented 5 years ago

EDIT: Please re-read https://github.com/golang/go/issues/21133#issuecomment-317265922.

av86743 commented 5 years ago

... without deadlines the code works fine. ... either deadlines don't work the way we want or we are using it wrong. ...

When read deadline is not set in customizedConn.Read, current implementation of net/http uses very short implicit deadline and time-outs almost instantly. What you call "code works fine" is actually "problems are not given sufficient time to manifest".

harshavardhana commented 5 years ago

When read deadline is not set in customizedConn.Read, current implementation of net/http uses very short implicit deadline and time-outs almost instantly. What you call "code works fine" is actually "problems are not given sufficient time to manifest".

Okay, understood @av86743 - with that analogy it would certainly also mean that writing custom wrappers for net.Conn with deadlines is generally not advisable and can lead to issues - also come in the way.

Would it be possible to document such a thing? Because we shot ourselves in the foot without realizing this would come in the way as per net.Conn documentation.

harshavardhana commented 2 years ago

Is there an intent to document this on why writing custom net.Conn's with deadlines won't work in reality with net/http?