rnapier opened this issue 9 years ago
Are you sure you're on master? The block you mention was mostly removed recently for this reason. I'm just about to make a change that should also help.
I'm back a few commits from HEAD. I see baf750dfc0e85d3d73672dbef6ff735f010daf5a, which should help, too. I've got test cases for my full system (not just the simplified version posted above), so I'll let you know if it fixes it.
Still seeing data races and lockups with 7e7ce93:
==================
WARNING: DATA RACE
Write by goroutine 13:
github.com/SlyMarbo/spdy/spdy3.(*Conn).shutdown()
/Users/rnapier/work/agent/src/github.com/SlyMarbo/spdy/spdy3/shutdown.go:100 +0xa59
github.com/SlyMarbo/spdy/spdy3.(*Conn).(github.com/SlyMarbo/spdy/spdy3.shutdown)-fm()
/Users/rnapier/work/agent/src/github.com/SlyMarbo/spdy/spdy3/shutdown.go:13 +0x2d
sync.(*Once).Do()
/usr/local/Cellar/go/1.5/libexec/src/sync/once.go:44 +0xf6
github.com/SlyMarbo/spdy/spdy3.(*Conn).Close()
/Users/rnapier/work/agent/src/github.com/SlyMarbo/spdy/spdy3/shutdown.go:13 +0x8d
github.com/SlyMarbo/spdy/spdy3.(*Conn).handleReadWriteError()
/Users/rnapier/work/agent/src/github.com/SlyMarbo/spdy/spdy3/error_handling.go:68 +0x303
github.com/SlyMarbo/spdy/spdy3.(*Conn).readFrames()
/Users/rnapier/work/agent/src/github.com/SlyMarbo/spdy/spdy3/io.go:36 +0x1d9
Previous read by goroutine 12:
github.com/SlyMarbo/spdy/spdy3.(*Conn).send()
/Users/rnapier/work/agent/src/github.com/SlyMarbo/spdy/spdy3/io.go:142 +0xac6
Goroutine 13 (running) created at:
github.com/SlyMarbo/spdy/spdy3.(*Conn).Run()
/Users/rnapier/work/agent/src/github.com/SlyMarbo/spdy/spdy3/conn.go:199 +0xe0
Goroutine 12 (running) created at:
github.com/SlyMarbo/spdy/spdy3.(*Conn).Run()
/Users/rnapier/work/agent/src/github.com/SlyMarbo/spdy/spdy3/conn.go:195 +0x73
==================
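The write at shutdown.go:100 and the read at io.go:142 are on the same connection state, so one way to avoid this kind of race is for shutdown() and send() to agree on a lock (or a channel/atomic signal) before either touches that state. A minimal sketch of the pattern, using made-up field and method names rather than the library's actual ones:

```go
package main

import (
	"fmt"
	"sync"
)

// conn is a hypothetical stand-in for spdy3.Conn, only to illustrate the
// pattern: any state that shutdown() mutates and send() reads must be
// guarded by the same lock.
type conn struct {
	mu     sync.Mutex
	closed bool
	out    chan string // stand-in for the outgoing frame queue
}

// shutdown marks the connection closed and tears down shared state while
// holding the lock, so a concurrent send() never observes a half-torn-down
// connection.
func (c *conn) shutdown() {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.closed {
		return
	}
	c.closed = true
	close(c.out)
}

// send checks the closed flag under the same lock before touching the
// shared state, instead of reading it unsynchronized.
func (c *conn) send(frame string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.closed {
		return fmt.Errorf("connection closed")
	}
	c.out <- frame
	return nil
}

func main() {
	c := &conn{out: make(chan string, 1)}
	_ = c.send("SYN_STREAM")
	c.shutdown()
	fmt.Println(c.send("PING")) // connection closed
}
```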
I believe I'm still getting my deadlock after receiving this error:
(spdy) 2015/08/26 13:29:58 response_stream.go:267: Encountered stream error: runtime error: invalid memory address or nil pointer dereference (runtime.errorString)
Another related data race. I believe this is a regression from earlier versions. Since I upgraded to the latest spdy, my Linux 64-bit system is reliably deadlocking with this panic right after this data race (I don't get a deadlock on any of my other systems, but still get the race):
(spdy) 2015/08/27 11:39:17 response_stream.go:267: Encountered stream error: runtime error: invalid memory address or nil pointer dereference (runtime.errorString)
And the race on c.shutdownError:
WARNING: DATA RACE
Write by goroutine 67:
github.com/SlyMarbo/spdy/spdy3.(*Conn).processFrame()
/home/rnapier/work/agent/src/github.com/SlyMarbo/spdy/spdy3/processing.go:111 +0xef6
github.com/SlyMarbo/spdy/spdy3.(*Conn).readFrames()
/home/rnapier/work/agent/src/github.com/SlyMarbo/spdy/spdy3/io.go:50 +0x63e
Previous read by goroutine 68:
github.com/SlyMarbo/spdy/spdy3.(*Conn).RequestResponse()
/home/rnapier/work/agent/src/github.com/SlyMarbo/spdy/spdy3/requests.go:147 +0x172
kace/konea/pkg/koneas.(*AgentConn).RoundTrip()
/home/rnapier/work/agent/src/kace/konea/pkg/koneas/agentconn.go:75 +0x9c
kace/konea/pkg/koneas.(*PushListener).Run()
/home/rnapier/work/agent/src/kace/konea/pkg/koneas/pushlistener.go:72 +0x4f4
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:2232 +0x0
sort.doPivot()
/usr/local/go/src/sort/sort.go:127 +0x1e1
sort.quickSort()
/usr/local/go/src/sort/sort.go:173 +0xbb
sort.quickSort()
/usr/local/go/src/sort/sort.go:177 +0x116
sort.Sort()
/usr/local/go/src/sort/sort.go:200 +0x88
compress/flate.sortByFreq()
/usr/local/go/src/compress/flate/huffman_code.go:317 +0x184
compress/flate.(*huffmanEncoder).generate()
/usr/local/go/src/compress/flate/huffman_code.go:289 +0x50b
compress/flate.(*huffmanBitWriter).writeBlock()
/usr/local/go/src/compress/flate/huffman_bit_writer.go:414 +0x7c1
compress/flate.(*compressor).writeBlock()
/usr/local/go/src/compress/flate/deflate.go:142 +0x1f3
compress/flate.(*compressor).deflate()
/usr/local/go/src/compress/flate/deflate.go:259 +0x7fc
compress/flate.(*compressor).syncFlush()
/usr/local/go/src/compress/flate/deflate.go:387 +0x7c
compress/flate.(*Writer).Flush()
/usr/local/go/src/compress/flate/deflate.go:547 +0x4e
compress/flate.(*Writer).Reset()
/usr/local/go/src/compress/flate/deflate.go:565 +0x185
compress/zlib.(*Writer).Reset()
/usr/local/go/src/compress/zlib/writer.go:80 +0xb9
github.com/SlyMarbo/spdy/common.(*compressor).Compress()
/home/rnapier/work/agent/src/github.com/SlyMarbo/spdy/common/compression.go:206 +0x18fa
github.com/SlyMarbo/spdy/spdy3/frames.(*SYN_STREAM).Compress()
/home/rnapier/work/agent/src/github.com/SlyMarbo/spdy/spdy3/frames/syn_stream.go:32 +0xb0
github.com/SlyMarbo/spdy/spdy3.(*Conn).send()
/home/rnapier/work/agent/src/github.com/SlyMarbo/spdy/spdy3/io.go:129 +0x6ed
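For reference, the usual fix for this kind of race is to funnel every access to the field through one mutex, so the frame-processing goroutine's write and the RequestResponse read can never overlap. A minimal sketch, with hypothetical names standing in for c.shutdownError and its accessors:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// conn sketches one way to make a shared error field safe: every write
// (e.g. from the frame-processing goroutine) and every read (e.g. from
// RequestResponse) goes through the same mutex. The field and method
// names are illustrative, not the library's.
type conn struct {
	mu            sync.Mutex
	shutdownError error
}

func (c *conn) setShutdownError(err error) {
	c.mu.Lock()
	// Keep the first error; later ones are usually consequences of it.
	if c.shutdownError == nil {
		c.shutdownError = err
	}
	c.mu.Unlock()
}

func (c *conn) ShutdownError() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.shutdownError
}

func main() {
	c := &conn{}
	go c.setShutdownError(errors.New("PROTOCOL_ERROR"))
	fmt.Println(c.ShutdownError()) // may be nil or the error, but never a race
}
```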
If there is a network error (corrupted data at a minimum, but I believe I'm also encountering it when servers suddenly become unavailable), there are data races in shutdown(). This can lead to panics or lockups in the client. In some cases, it panics with:
In other cases, the panic is caught by a recover, but this leads to a deadlock because the send goroutine is no longer processing packets. So when the error handler tries to send a RST, the whole connection locks up. This can lead to further symptoms like Ping() never returning (which is how I discovered this issue originally).

This block in shutdown seems the primary cause of data races:
Here are a few of the conflicts that popped up in my tests:
To demonstrate the problem, I use the following programs:
The client and servers generally show data races immediately, and the client will usually crash within a few seconds. If you try to reuse the client for every iteration (which better matches my real use case), the Get call will hang.

I'm currently seeing this very often in the field, and my infrastructure is locking up usually within a few hours of usage. I have one client, and many servers that come and go, so network errors are reasonably common.
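To make the lockup easier to picture (my actual test programs are longer than this), here is a stripped-down sketch of the pattern with made-up names: a single send goroutine drains an unbuffered queue and dies on a recovered panic, after which every later send (the RST from the error handler, a Ping, a Get) blocks forever:

```go
package main

import (
	"fmt"
	"time"
)

// A lone consumer goroutine drains an unbuffered outgoing-frame queue.
// If it dies (simulated here by a recovered panic), nothing reads the
// channel anymore and any later attempt to queue a frame blocks forever.
func main() {
	queue := make(chan string) // unbuffered outgoing-frame queue

	go func() {
		defer func() {
			if r := recover(); r != nil {
				fmt.Println("send goroutine died:", r)
			}
		}()
		for frame := range queue {
			if frame == "corrupt" {
				panic("invalid frame")
			}
			fmt.Println("sent", frame)
		}
	}()

	queue <- "SYN_STREAM" // works: the consumer is alive
	queue <- "corrupt"    // kills the consumer

	// Any later send now hangs; the timeout only makes that visible.
	select {
	case queue <- "RST_STREAM":
		fmt.Println("sent RST_STREAM")
	case <-time.After(time.Second):
		fmt.Println("deadlocked: no one is reading the send queue")
	}
}
```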