golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.62k stars 17.61k forks source link

net/http: source of errors is unclear #14203

Open owenthereal opened 8 years ago

owenthereal commented 8 years ago

What version of Go are you using (go version)?

1.5.3.

What operating system and processor architecture are you using?

Linux. AMD 64.

What did you do?

I'm using httputil.ReverseProxy to proxy to a host behind ELB, and I'm getting the errors "http: proxy error: net/http: request canceled" & "http: proxy error: net/http: transport closed before response was received" occationally. The line that printed the error is here. And the error came from http.Transport.RoundTrip.

After more digging, the errors could come from a couple places in http.Transport.RoundTrip. I was trying to find a way to print the callstacks of the error from http.Transport.RoundTrip to understand why it was returned but there's currently no way to do so.

Details can be found in this golang-nuts thread.

What did you expect to see?

I would expect I could instrument a method of either application code or Go's standard library and find out the callstacks at runtime.

What did you see instead?

No way to find out the callstack of a method from Go's stdlib. This is a feature request.

davecheney commented 8 years ago

@jingweno would it be easier to return a more descriptive error rather than invent a mechanism to instrument call stacks at runtime ?

randall77 commented 8 years ago

Can you run your code under a debugger? Setting a breakpoint at the offending error generating locations should get you the stack in question.

owenthereal commented 8 years ago

would it be easier to return a more descriptive error rather than invent a mechanism to instrument call stacks at runtime

It may be a yes in this case. But that would require waiting for the next Go version (if it's fixed), or I patch Go and create a customer build for myself. IMO a mechanism to instrument call stacks would be necessary.

Can you run your code under a debugger?

Unfortunately not. It's a critical app in production and the best we could do is to log it and fix it later.

bradfitz commented 8 years ago

http.RoundTripper passes stuff between goroutines. A single stack trace won't yield the answers you seek.

owenthereal commented 8 years ago

http.RoundTripper passes stuff between goroutines. A single stack trace won't yield the answers you seek.

I may not fully understand how goroutines work internally. But it's a matter of traceability, right? Can we not trace the calls from one goroutine to another?

bradfitz commented 8 years ago

@jingweno, the error is sent over a channel. It's unclear which stack(s) at which time you want. There is no automatic answer here short of recording a full trace of the program and inspecting it later, probably.

davecheney commented 8 years ago

I think this is not as complicated as you make out. As I understand it the problem is a generic error with the same text is returned from several places. If the error is made more specific, won't this solve the problem ?

On Wed, 3 Feb 2016, 10:34 Brad Fitzpatrick notifications@github.com wrote:

@jingweno https://github.com/jingweno, the error is sent over a channel. It's unclear which stack(s) at which time you want. There is no automatic answer here short of recording a full trace of the program and inspecting it later, probably.

— Reply to this email directly or view it on GitHub https://github.com/golang/go/issues/14203#issuecomment-178885756.

bradfitz commented 8 years ago

To be clear, I agree with @davecheney. If there's an unstructured or useless error that needs to disambiguated, we can do that.

owenthereal commented 8 years ago

@bradfitz:

the error is sent over a channel. It's unclear which stack(s) at which time you want. There is no automatic answer here short of recording a full trace of the program and inspecting it later, probably.

Hmm...I may be oversimplifying the problem. These kinds of things happen for a distributed system as well: we keep track of callstacks for different microservices by carrying over a request id: https://brandur.org/request-ids. Can we do something similar for Goroutines? A unique "Goroutine ID" is generated before a piece of code is executed (the ID could come from an argument, as long as it's unique); The "Goroutine ID" is carried over for different goroutines; Stacks are correlated and aggregated by the "Goroutine ID". Maybe a deeper thought is needed though.

@davecheney:

I think this is not as complicated as you make out. As I understand it the problem is a generic error with the same text is returned from several places. If the error is made more specific, won't this solve the problem ?

Yup, it would solve the problem. But I couldn't get away of waiting for the next Go release or making a custom build for Go if I want to use it right away as mentioned in https://github.com/golang/go/issues/14203#issuecomment-178881150. I literally want to know what caused the errors now.

bradfitz commented 8 years ago

@jingweno, yes, we do the same thing with request tracing inside Google. See https://godoc.org/golang.org/x/net/trace and https://godoc.org/golang.org/x/net/context . Contexts need to be passed explicitly. Automatic contexts leads to tons of problems of confusingly-annotated traces (as witnessed in C++ and Java code trying to use TLS), which is why goroutines don't have accessible ID numbers. This won't be added to Go as an automatic thing.

I literally want to know what caused the errors now.

Modify the Go code and see. The error can only come from a few places.

bradfitz commented 8 years ago

I'm getting the errors "http: proxy error: net/http: request canceled" & "http: proxy error: net/http: transport closed before response was received"

The first is the when the client hung up on you. (the ReverseProxy CloseNotifier fired, and canceled the outbound request).

The second is when the peer server you're making the request hung up on you.

minux commented 8 years ago

Are those failed request POST request? ReverseProxy was using CloseNotifier wrong. See #13666.

Although I only observe the problem with some Go tip revisions before the fix, I still suggest you to try Go 1.6rc to see if it fixes the problem.

bradfitz commented 8 years ago

@minux, that bug was never shipped. That was only during the Go 1.6 dev cycle.

owenthereal commented 8 years ago

@bradfitz Alright, I'm coming back to this. Would it be possible to add different error messages for https://github.com/golang/go/blob/release-branch.go1.5/src/net/http/transport.go#L1093-L1094? I could help with that but I need to figure out how to contribute first...

Hopefully this could be fixed in time for 1.6.

owenthereal commented 8 years ago

And we're starting to get a couple "EOF" errors. The error type is *errors.errorString. The errors were from /usr/local/go/src/net/http/httputil/reverseproxy.go:194. I printed out the runtime stack, and the EOF may be coming from:

goroutine 7089108 [IO wait]:
net.runtime_pollWait(0x7f4174178f70, 0x72, 0xc82000a240)
/usr/local/go/src/runtime/netpoll.go:157 +0x60
net.(*pollDesc).Wait(0xc8207a2760, 0x72, 0x0, 0x0)
/usr/local/go/src/net/fd_poll_runtime.go:73 +0x3a
net.(*pollDesc).WaitRead(0xc8207a2760, 0x0, 0x0)
/usr/local/go/src/net/fd_poll_runtime.go:78 +0x36
net.(*netFD).Read(0xc8207a2700, 0xc8204cc000, 0x800, 0x800, 0x0, 0x7f41768e8050, 0xc82000a240)
/usr/local/go/src/net/fd_unix.go:232 +0x23a
net.(*conn).Read(0xc82002a3b8, 0xc8204cc000, 0x800, 0x800, 0x0, 0x0, 0x0)
/usr/local/go/src/net/net.go:172 +0xe4
crypto/tls.(*block).readFromUntil(0xc8228e88a0, 0x7f41768f2208, 0xc82002a3b8, 0x5, 0x0, 0x0)
/usr/local/go/src/crypto/tls/conn.go:455 +0xcc
crypto/tls.(*Conn).readRecord(0xc822c54b00, 0x8f4517, 0x0, 0x0)
/usr/local/go/src/crypto/tls/conn.go:540 +0x2d1
crypto/tls.(*Conn).Read(0xc822c54b00, 0xc820c8e000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/crypto/tls/conn.go:901 +0x167
net/http.noteEOFReader.Read(0x7f41768f2118, 0xc822c54b00, 0xc8209581b8, 0xc820c8e000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/net/http/transport.go:1370 +0x67
net/http.(*noteEOFReader).Read(0xc820d1b180, 0xc820c8e000, 0x1000, 0x1000, 0xc82002a688, 0x0, 0x0)
<autogenerated>:126 +0xd0
bufio.(*Reader).fill(0xc8228dac00)
/usr/local/go/src/bufio/bufio.go:97 +0x1e9
bufio.(*Reader).Peek(0xc8228dac00, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0)
/usr/local/go/src/bufio/bufio.go:132 +0xcc
net/http.(*persistConn).readLoop(0xc820958160)
/usr/local/go/src/net/http/transport.go:876 +0xf7
created by net/http.(*Transport).dialConn
/usr/local/go/src/net/http/transport.go:685 +0xc78
owenthereal commented 8 years ago

Also could @bradfitz explain a bit of b) of https://github.com/golang/go/blob/release-branch.go1.5/src/net/http/transport.go#L1194-L1199? We tracked down the downstream server did respond with 2xx but somehow the Proxy is responding with 5xx.

glasser commented 7 years ago

As described in #18272, it would also be nice if you could easily tell if an error returned by http.Transport.RoundTrip came from communicating with the target server or from reading request.Body (or a third place I'm not considering).

bradfitz commented 7 years ago

Related to #9424 / #15935

gopherbot commented 7 years ago

CL https://golang.org/cl/37495 mentions this issue.

bradfitz commented 7 years ago

/cc @tombergan

jerome-laforge commented 5 years ago

Hello all, Following the #23827, I want to know whether the error on http.Get("http://some.where") is produced before or after httpclient sent the request (e.g. during tcp dial, tls or write request body in socket). I have defined this function.

func AfterRequestSent(err error) bool {
    urlErr, ok := err.(*url.Error)
    if !ok {
        return true
    }

    opErr, ok := urlErr.Err.(*net.OpError)
    if !ok {
        return true
    }

    return opErr.Op != "dial" && opErr.Op != "write"
}

According to you, is it correct one?

Thx in adv.