Thank you for reporting this issue @tehmoon and for the analysis, and welcome to the Go project!
To follow along with your findings, here is a repro that, if given a SIGQUIT after say 3 seconds, produces a stack trace very similar to yours. It does so even if I wait until the headers have been sent by the client:
package main

import (
	"context"
	"io"
	"io/ioutil"
	"log"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
	"sync"
	"time"
)

func main() {
	log.SetFlags(0)
	cst := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		blob, _ := ioutil.ReadAll(io.LimitReader(r.Body, 100))
		log.Printf("Server: %s", blob)
		w.Write([]byte("Hello, world!"))
		w.(http.Flusher).Flush()
		r.Body.Close()
	}))
	defer cst.Close()

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	prc, pwc := io.Pipe()
	var triggerSendHeadersOnce sync.Once
	sendHeaders := make(chan bool)
	go func() {
		defer pwc.Close()
		<-sendHeaders
		for i := 0; i < 1000; i++ {
			select {
			case <-time.After(450 * time.Millisecond):
				pwc.Write([]byte("Hello from client"))
			case <-ctx.Done():
				return
			}
		}
	}()

	trace := &httptrace.ClientTrace{
		WroteHeaders: func() {
			triggerSendHeadersOnce.Do(func() {
				log.Println("Closing sendHeaders")
				close(sendHeaders)
			})
		},
		WroteHeaderField: func(key string, value []string) {
			log.Printf("Header %q: %#v\n", key, value)
		},
		WroteRequest: func(ri httptrace.WroteRequestInfo) {
			log.Printf("WroteRequestInfo: %#v", ri)
		},
	}

	req, _ := http.NewRequest("POST", cst.URL, prc)
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
	res, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatalf("Failed to make request: %v", err)
	}
	blob, _ := ioutil.ReadAll(res.Body)
	_ = res.Body.Close()
	log.Printf("Client: %s", blob)
}
Once you see that it has sent the headers and some of the body:
Header "Host": []string{"127.0.0.1:64605"}
Header "User-Agent": []string{"Go-http-client/1.1"}
Header "Transfer-Encoding": []string{"chunked"}
Header "Accept-Encoding": []string{"gzip"}
Closing sendHeaders
Server: Hello from clientHello from clientHello from clientHello from clientHello from clientHello from clie
which seems to me like, indeed, something funky is going on with our transport on slow reads.
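For anyone trying to capture the same goroutine dump from the repro above: either send it kill -QUIT from another terminal, or have the process signal itself, as the second repro further down does. A minimal sketch of that helper (selfQuitAfter is my name for it, not part of the repro), assuming a Unix-like system:

package main

import (
	"os"
	"syscall"
	"time"
)

// selfQuitAfter arranges for the process to receive SIGQUIT after d, so the
// Go runtime prints every goroutine's stack while the interesting ones are
// still blocked. Calling this at the top of main() in the repro above is one
// way to capture the dump without an external kill -QUIT.
func selfQuitAfter(d time.Duration) {
	go func() {
		time.Sleep(d)
		p, err := os.FindProcess(os.Getpid())
		if err != nil {
			return
		}
		p.Signal(syscall.SIGQUIT)
	}()
}

func main() {
	selfQuitAfter(3 * time.Second)
	select {} // stand-in for the blocked request in the real repro
}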
Back in May 2019 (2 months ago), I encountered a similar report in https://github.com/googleapis/google-cloud-go/issues/1380. From that report I came up with a hypothesis that perhaps the server is overloaded and doesn't send back any response for a long time, and I wrote a repro that produces very similar stack traces: https://gist.github.com/odeke-em/b62737b89b91e71ffbf0545581976cbc or https://play.golang.org/p/5b_d6RjikGN, inlined below:
package main

import (
	"bytes"
	"context"
	"crypto/tls"
	"fmt"
	"io"
	"io/ioutil"
	"log"
	"mime/multipart"
	"net/http"
	"net/http/httptest"
	"net/textproto"
	"net/url"
	"os"
	"syscall"
	"time"

	"golang.org/x/net/http2"
)

func main() {
	cst := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Stalled server that takes forever to read or could be overloaded.
		// It already established a connection though.
		<-time.After(10 * time.Minute)
		slurp, _ := ioutil.ReadAll(r.Body)
		log.Printf("Request payload: %s\n", slurp)
		w.Write(bytes.Repeat([]byte("a"), 3000))
	}))
	if err := http2.ConfigureServer(cst.Config, nil); err != nil {
		log.Fatalf("http2.ConfigureServer: %v", err)
	}
	cst.StartTLS()
	defer cst.Close()

	tlsConfig := &tls.Config{InsecureSkipVerify: true}
	u, _ := url.Parse(cst.URL)
	tlsConn, err := tls.Dial("tcp", u.Host, tlsConfig)
	if err != nil {
		log.Fatalf("Failed to create a tls connection: %v", err)
	}

	prc, pwc := io.Pipe()
	go func() {
		h := make(textproto.MIMEHeader)
		mpw := multipart.NewWriter(pwc)
		w, err := mpw.CreatePart(h)
		if err != nil {
			mpw.Close()
			pwc.CloseWithError(fmt.Errorf("CreatePart failed: %v", err))
			return
		}
		n, _ := pwc.Write(bytes.Repeat([]byte("a"), 39<<20))
		println("wrote ", n)
		r := bytes.NewReader([]byte(`{"id": "1380", "type": "issue"}`))
		if _, err := io.Copy(w, r); err != nil {
			mpw.Close()
			pwc.CloseWithError(fmt.Errorf("Copy failed: %v", err))
			return
		}
		println("done in goroutine")
		mpw.Close()
		pwc.Close()
	}()

	tr := &http2.Transport{TLSClientConfig: tlsConfig}
	cc, err := tr.NewClientConn(tlsConn)
	if err != nil {
		log.Fatalf("(*http2.Transport).NewClientConn: %v", err)
	}

	// Find our own process and in the background send ourselves SIGQUIT.
	selfProcess, err := os.FindProcess(os.Getpid())
	if err != nil {
		log.Fatalf("Failed to find own process: %v", err)
	}
	go func() {
		<-time.After(6 * time.Second)
		if err := selfProcess.Signal(syscall.SIGQUIT); err != nil {
			log.Fatalf("Failed to send self SIGQUIT: %v", err)
		}
	}()

	// Send that ping frame and ensure we have an established connection
	// and that the server is stalled and body reads are stalled.
	if err := cc.Ping(context.Background()); err != nil {
		log.Fatalf("(*http2.ClientConn).Ping: %v", err)
	}

	req, _ := http.NewRequest("GET", cst.URL, prc)
	res, err := cc.RoundTrip(req)
	if err != nil {
		log.Fatalf("http.Transport.Roundtrip error: %v", err)
	}
	defer res.Body.Close()
	blob, _ := ioutil.ReadAll(res.Body)
	log.Printf("%s\n", blob)
}
which produced this curious stack trace:
goroutine 35 [select]:
io.(*pipe).Write(0xc00015cd20, 0xc0001d0090, 0x42, 0x82, 0x0, 0x0, 0x0)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/io/pipe.go:87 +0x1dc
io.(*PipeWriter).Write(0xc00013a028, 0xc0001d0090, 0x42, 0x82, 0x12f8720, 0x13465a0, 0x10cf201)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/io/pipe.go:153 +0x4c
bytes.(*Buffer).WriteTo(0xc0000ba240, 0x13bd460, 0xc00013a028, 0x26000d0, 0xc0000ba240, 0x1)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/bytes/buffer.go:242 +0xb5
io.copyBuffer(0x13bd460, 0xc00013a028, 0x13bd220, 0xc0000ba240, 0x0, 0x0, 0x0, 0x2, 0x0, 0x0)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/io/io.go:384 +0x33f
io.Copy(...)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/io/io.go:364
mime/multipart.(*Writer).CreatePart(0xc0000ba210, 0xc00005fe48, 0x0, 0x0, 0x0, 0x0)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/mime/multipart/writer.go:121 +0x3fa
main.main.func2(0xc00013a028)
/Users/emmanuelodeke/Desktop/openSrc/bugs/google-cloud-go/1380/main.go:48 +0x11f
created by main.main
/Users/emmanuelodeke/Desktop/openSrc/bugs/google-cloud-go/1380/main.go:45 +0x3cd
I am posting this here to close out the google-cloud-go issue that I encountered, and to continue the investigation here.
Thank you for following up @odeke-em! This looks interesting, because it looks like you have the same logic. I've tried to add some kind of synchronization right before the first write, but what I thought was working was not.
Actually it worked, but I am not sure it is bug free:
I don't think it is bug free, because the WaitGroup is done before the write can happen (otherwise the write blocks until the body is read by the request).
But somehow it works well enough 🤷‍♂️. I tried to maximize the chances of winning the race by doing go wg.Done()
and then the write, hoping the writer is already blocked on the write before that goroutine runs.
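Since the snippet itself didn't make it into the comment above, here is a minimal sketch of the kind of synchronization being described (my reconstruction, not the original code): a WaitGroup released with go wg.Done() right before the first pipe write, so that Do only starts once that write is, hopefully, already pending.

package main

import (
	"io"
	"io/ioutil"
	"log"
	"net/http"
	"net/http/httptest"
	"sync"
)

func main() {
	cst := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		blob, _ := ioutil.ReadAll(r.Body)
		log.Printf("Server got %d bytes", len(blob))
	}))
	defer cst.Close()

	prc, pwc := io.Pipe()

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer pwc.Close()
		// Release the WaitGroup just before the (blocking) pipe write, so the
		// main goroutine is very likely to call Do while this Write is already
		// pending; the Write itself cannot complete until the transport starts
		// reading the body.
		go wg.Done()
		if _, err := pwc.Write([]byte("Hello from client")); err != nil {
			return
		}
	}()

	wg.Wait() // don't start the request until the first write is (about to be) pending

	req, _ := http.NewRequest("POST", cst.URL, prc)
	res, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatalf("Do: %v", err)
	}
	res.Body.Close()
}

As noted above, this is inherently racy: the spawned wg.Done may run before the Write is actually entered, which is why it is "not sure it is bug free".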
goroutine 38 [chan receive]:
net/http.finishAsyncByteRead.Read(0xc000162000, 0xc000170000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/net/http/transfer.go:1030 +0x60
io.(*multiReader).Read(0xc000164020, 0xc000170000, 0x8000, 0x8000, 0x3, 0x0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/io/multi.go:26 +0xaf
net/http.transferBodyReader.Read(0xc000162000, 0xc000170000, 0x8000, 0x8000, 0x1, 0x0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/net/http/transfer.go:62 +0x56
io.copyBuffer(0x2444040, 0xc000166020, 0x12bf9e0, 0xc000162000, 0xc000170000, 0x8000, 0x8000, 0x123aae0, 0x1456a00, 0x2444040)
/usr/local/Cellar/go/1.11.2/libexec/src/io/io.go:402 +0x125
io.Copy(0x2444040, 0xc000166020, 0x12bf9e0, 0xc000162000, 0xc000166020, 0x0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/io/io.go:364 +0x5a
net/http.(*transferWriter).writeBody(0xc000162000, 0x12bed40, 0xc000134040, 0x2, 0x2)
/usr/local/Cellar/go/1.11.2/libexec/src/net/http/transfer.go:355 +0x155
net/http.(*Request).write(0xc0000fe000, 0x12bed40, 0xc000134040, 0x0, 0xc000138030, 0x0, 0x0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/net/http/request.go:645 +0x6e8
net/http.(*persistConn).writeLoop(0xc000110000)
/usr/local/Cellar/go/1.11.2/libexec/src/net/http/transport.go:1888 +0x1b8
created by net/http.(*Transport).dialConn
/usr/local/Cellar/go/1.11.2/libexec/src/net/http/transport.go:1339 +0x966
The blocking reason may be that net/http.finishAsyncByteRead.Read never returns io.EOF, and it is wrapped in an io.MultiReader:
t.Body = io.MultiReader(finishAsyncByteRead{t}, t.Body)
func (fr finishAsyncByteRead) Read(p []byte) (n int, err error) {
	if len(p) == 0 {
		return
	}
	rres := <-fr.tw.ByteReadCh
	n, err = rres.n, rres.err
	if n == 1 {
		p[0] = rres.b
	}
	return
}
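To make the failure mode concrete, here is a small standalone sketch (not the net/http code itself) of the same pattern: io.MultiReader only advances past a sub-reader once that reader returns io.EOF, so a Read that ends up blocked on a second channel receive stalls the whole body read. With the channel closed and n == 0 mapped to io.EOF, which is exactly what the patch below does, the copy completes.

package main

import (
	"fmt"
	"io"
	"io/ioutil"
	"strings"
)

// oneByteReader mimics finishAsyncByteRead: it hands out a single pre-read
// byte received over a channel. If the channel is never closed, a second Read
// blocks on the receive and io.MultiReader never moves on to the next reader;
// with the channel closed, the second Read reports io.EOF instead.
type oneByteReader struct {
	ch chan byte
}

func (r oneByteReader) Read(p []byte) (int, error) {
	if len(p) == 0 {
		return 0, nil
	}
	b, ok := <-r.ch
	if !ok {
		// Channel closed: nothing more to hand out, report EOF so that
		// io.MultiReader advances to the next reader.
		return 0, io.EOF
	}
	p[0] = b
	return 1, nil
}

func main() {
	ch := make(chan byte, 1)
	ch <- 'H'
	close(ch) // comment this line out and the second Read blocks on the channel receive

	body := io.MultiReader(oneByteReader{ch}, strings.NewReader("ello, world"))
	blob, err := ioutil.ReadAll(body)
	fmt.Printf("%q %v\n", blob, err) // "Hello, world" <nil>
}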
And the following change should solve this problem.
diff --git a/src/net/http/transfer.go b/src/net/http/transfer.go
index 2e01a07..8a306ca 100644
--- a/src/net/http/transfer.go
+++ b/src/net/http/transfer.go
@@ -212,6 +212,7 @@ func (t *transferWriter) probeRequestBody() {
 			rres.b = buf[0]
 		}
 		t.ByteReadCh <- rres
+		close(t.ByteReadCh)
 	}(t.Body)
 	timer := time.NewTimer(200 * time.Millisecond)
 	select {
@@ -1076,6 +1077,9 @@ func (fr finishAsyncByteRead) Read(p []byte) (n int, err error) {
 	if n == 1 {
 		p[0] = rres.b
 	}
+	if n == 0 {
+		err = io.EOF
+	}
 	return
 }
@odeke-em Are you working on this particular issue?
I posted a way to reproduce this on a separate issue, but the tl;dr as far as I can see is that it occurs when GetBody is called on the result returned by MediaInfo.UploadRequest. In my testing, this most often happens when the initial POST request fails due to an error (like http2ErrNoCachedConn).
Based on some instrumentation I added during testing, the body returned by GetBody isn't always closed, which is what leaves the request hanging.
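For context on what GetBody does here (a generic net/http illustration, not the MediaInfo.UploadRequest code path): for replayable body types such as *bytes.Reader, http.NewRequest fills in Request.GetBody automatically, and the transport calls it when it needs to retry a request, e.g. after http2's ErrNoCachedConn. Whatever GetBody returns must eventually be closed, which is the step the instrumentation above suggests is sometimes skipped.

package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
)

func main() {
	// With a *bytes.Reader body, net/http records how to recreate the body:
	// req.GetBody returns a fresh io.ReadCloser over the same bytes each time
	// it is called, which is what lets the transport replay the request.
	req, err := http.NewRequest("POST", "https://example.com/upload", bytes.NewReader([]byte("payload")))
	if err != nil {
		log.Fatal(err)
	}

	// Simulate what a retry does: ask for a new copy of the body. Whoever
	// calls GetBody owns the returned body and must close it; leaving it
	// unclosed is the kind of hang described in the comment above.
	body, err := req.GetBody()
	if err != nil {
		log.Fatal(err)
	}
	blob, _ := ioutil.ReadAll(body)
	body.Close()
	fmt.Printf("replayed body: %q\n", blob)
}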
I think this is fixed via https://go.dev/cl/340256
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
I just tried with 1.11.3 (latest brew) and it is still happening.
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
Compiled and ran the following program: https://play.golang.org/p/zWXnswPCqmm. This is an HTTP client, which can be tested against a server:
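The exact server command is not included above; as a stand-in (any listener that dumps its input would do, this one is only an assumption for illustration), the following prints the raw bytes the client sends, which makes it easy to see whether the headers and body ever hit the wire:

package main

import (
	"io"
	"log"
	"net"
	"os"
)

func main() {
	// Dump whatever the client sends, byte for byte, so you can see exactly
	// when (or whether) the request headers and body are written to the wire.
	ln, err := net.Listen("tcp", "127.0.0.1:8080")
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		go func(c net.Conn) {
			defer c.Close()
			io.Copy(os.Stdout, c)
		}(conn)
	}
}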
What did you expect to see?
If I run the program, wait 1-2 seconds, and then start typing lines in my terminal, I expect those same lines to be sent to the server in HTTP transport format.
What did you see instead?
What I saw was that if I do wait a couple of seconds before typing, the HTTP client hangs and nothing is sent to the server. I did some tracing with tcpdump to confirm that no packets were sent to the server. On the other hand, if I start typing before the client sends the HTTP headers (the compile time of go run usually gives you enough time to fill stdin with some junk), the client does not hang and sends everything correctly.
My own workaround is to make sure data is written to the writer half of the pipe (the *io.PipeWriter) before invoking http.Client.Do(req), but I have no idea why this is happening. Perhaps it's a bug, or perhaps it's a feature.
Digging through the stack trace, I stumbled upon a comment:
which brought me to the issue where @bradfitz talks about io.Pipe. However, I did test with req.ContentLength = 1 and it did not do the trick.
It seems to hit this part of the code:
The case <-timer.C does not make sense to me. I don't know the library in depth and I am trying to understand it, but my sense is that this feels like a bug. Perhaps it is not, but I would be really grateful if somebody could explain why this is happening and, more importantly, whether it is the expected behavior or not.
I much appreciate any response, and sorry for the time wasted if this has already been solved. I did spend some amount of time on prior research and a PoC before posting this.
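The playground program is only linked above, not inlined. Assuming it is roughly a stdin-to-pipe POST client like the sketch below (an assumption, not a copy of the original), the workaround described amounts to making sure the first pw.Write is already pending by the time Do runs, i.e. before the roughly 200 ms body-probe timer visible in the probeRequestBody diff earlier can fire.

package main

import (
	"bufio"
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	pr, pw := io.Pipe()

	// Stream stdin, line by line, into the writer half of the pipe. Each
	// pw.Write blocks until the transport reads the corresponding bytes off
	// the reader half, so if nothing has been typed yet when Do runs, the
	// transport's body probe gives up waiting first.
	go func() {
		defer pw.Close()
		sc := bufio.NewScanner(os.Stdin)
		for sc.Scan() {
			if _, err := pw.Write([]byte(sc.Text() + "\n")); err != nil {
				return
			}
		}
	}()

	// The server address is a placeholder; any local listener such as the one
	// sketched further up will do.
	req, err := http.NewRequest("POST", "http://127.0.0.1:8080/", pr)
	if err != nil {
		log.Fatal(err)
	}
	res, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatalf("Do: %v", err)
	}
	defer res.Body.Close()
	io.Copy(os.Stdout, res.Body)
}

Running this against a local dump server lets you compare the two cases the report describes: typing before the request starts versus waiting a couple of seconds first.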