klauspost opened 5 years ago
This does not use unsafe and the only assembler would be in the stdlib. Imports here: https://godoc.org/github.com/klauspost/compress/flate?imports
"sync" is only used for a sync.Once and there are no goroutines, so no races should happen either.
Fuzz test imports: https://godoc.org/github.com/klauspost/compress-fuzz/flate?imports
This unfortunately happens completely at random.
Just checking if this is related to #35777. What kernel version are you running? Does the application itself receive a lot of signals?
I don't have access to the servers, so I cannot tell you. It is running on https://fuzzit.dev/ - https://twitter.com/fuzzitdev
As Keith Randall noted on the original issue #20846:
This looks like the stack has been trashed somehow. Not only the return address for gopark. gopark's arguments also look trashed. The gcBgMarkWorker failure looks similar, hard to tell for sure if its args are trashed as it has only one arg.
Not sure what might cause this. Could be misuse of unsafe, could be runtime bug (use after free of stack memory?).
Same issue in go1.14, more info about our case here: https://github.com/prysmaticlabs/prysm/issues/5131
I thought this might be resolved by golang/go#37782 in go 1.14.1, but we saw the issue again last night with libfuzzer on go 1.14.1.
I was able to reproduce locally on kernel 5.3.0-45-generic
I can provide some more context about this issue: we use https://github.com/nhooyr/websocket/releases/tag/v1.8.7 to set up a websocket tunnel with default settings (RFC 7692 permessage-deflate compression on), and we have observed some panics in production. The stack trace is below:
runtime: g 79954: unexpected return pc for github.com/klauspost/compress/flate.(*decompressor).moreBits called from 0x1
stack: frame={sp:0xc002c19a30, fp:0xc002c19a68} stack=[0xc002c18000,0xc002c1a000)
0x000000c002c19930: 0x0000000000000001 0x0000000202c8d680
0x000000c002c19940: 0x0000000000000000 0x0000000000000003
0x000000c002c19950: 0x000000c002c30640 0x0000000000000000
0x000000c002c19960: 0x0000000000000003 0x0000000000000004
0x000000c002c19970: 0x0000000000416c91 <runtime.typedmemclr+0x0000000000000051> 0x000000c002c199b0
0x000000c002c19980: 0x0000000000484e8e <sync.(*Pool).Get+0x000000000000008e> 0x000000c002944d40
0x000000c002c19990: 0x000000c002944d20 0x0000000000000000
0x000000c002c199a0: 0x0000000000c2dbc0 0x000000000144a280
0x000000c002c199b0: 0x0000000000000000 0x0000000000000000
0x000000c002c199c0: 0x0000000000000000 0x0000000000000000
0x000000c002c199d0: 0x000000c002c19a20 0x0000000000453976 <runtime.sigpanic+0x00000000000002f6>
0x000000c002c199e0: 0x0000000000c2dbc0 0x000000000144a280
0x000000c002c199f0: 0x0000000002c19a00 0x000000c002c30640
0x000000c002c19a00: 0x000000c002c19a38 0x0000000000b5c41f <nhooyr.io/websocket.(*msgReader).resetFlate+0x00000000000000bf>
0x000000c002c19a10: 0x000000c002c19a38 0x0000000000407c65 <runtime.selectnbrecv+0x0000000000000025>
0x000000c002c19a20: 0x000000c002c19a40 0x0000000000b4cdc0 <github.com/klauspost/compress/flate.(*decompressor).moreBits+0x0000000000000060>
0x000000c002c19a30: <0x0000000000b4b771 <github.com/klauspost/compress/flate.(*decompressor).nextBlock+0x0000000000000031> 0x000000c002c19ae0
0x000000c002c19a40: 0x000000c002c19a78 0x0000000000b4b9dc <github.com/klauspost/compress/flate.(*decompressor).Read+0x000000000000007c>
0x000000c002c19a50: 0x000000c002c30640 0x0000000000cfebf9
0x000000c002c19a60: !0x0000000000000001 >0x0000000000000000
0x000000c002c19a70: 0x000000c002c30718 0x000000c002c19ae0
0x000000c002c19a80: 0x0000000000b5f151 <nhooyr.io/websocket.(*limitReader).Read+0x0000000000000111> 0x000000c002c30640
0x000000c002c19a90: 0x000000c002def000 0x0000000000001000
0x000000c002c19aa0: 0x00000000000000f2 0x0000000002df2000
0x000000c002c19ab0: 0x000000c002df21e0 0x000000c002c19a7c
0x000000c002c19ac0: 0x000000c002df2360 0x0000000000000000
0x000000c002c19ad0: 0x000000c002df2180 0x0000000000000000
0x000000c002c19ae0: 0x000000c002c19b80 0x0000000000b5e6a5 <nhooyr.io/websocket.(*msgReader).Read+0x0000000000000165>
0x000000c002c19af0: 0x000000c000e274a0 0x000000c002def000
0x000000c002c19b00: 0x000000c002cd6d00 0x00000000000000f2
0x000000c002c19b10: 0x0000000002df2000 0x010000c002c19b00
0x000000c002c19b20: 0x0000000000f6e6a0 0x0000000000000000
0x000000c002c19b30: 0x0000000000000000 0x0000000000000000
0x000000c002c19b40: 0x0000000000b5e320 <nhooyr.io/websocket.(*Conn).reader.func2+0x0000000000000000> 0x0000000000000000
0x000000c002c19b50: 0x0000000000000000 0x0000000000b5eb00 <nhooyr.io/websocket.(*msgReader).Read.func1+0x0000000000000000>
0x000000c002c19b60: 0x000000c000130440
fatal error: unknown caller pc
runtime stack:
runtime.throw({0xcfc8fe?, 0x1412580?})
/usr/local/lib/go/src/runtime/panic.go:1047 +0x5d fp=0xc000443be0 sp=0xc000443bb0 pc=0x43d19d
runtime.gentraceback(0xc002944d20?, 0x10000c000443f00?, 0xc000038500?, 0xc000443fb8?, 0x0, 0x0, 0x7fffffff, 0xc000443fa0, 0x440566?, 0x0)
/usr/local/lib/go/src/runtime/traceback.go:258 +0x1cf7 fp=0xc000443f50 sp=0xc000443be0 pc=0x462737
runtime.addOneOpenDeferFrame.func1()
/usr/local/lib/go/src/runtime/panic.go:645 +0x6b fp=0xc000443fc8 sp=0xc000443f50 pc=0x43c32b
runtime.systemstack()
/usr/local/lib/go/src/runtime/asm_amd64.s:492 +0x49 fp=0xc000443fd0 sp=0xc000443fc8 pc=0x46cb09
goroutine 79954 [running]:
runtime.systemstack_switch()
/usr/local/lib/go/src/runtime/asm_amd64.s:459 fp=0xc002c198e0 sp=0xc002c198d8 pc=0x46caa0
runtime.addOneOpenDeferFrame(0xc002c19a82?, 0x3?, 0x1?)
/usr/local/lib/go/src/runtime/panic.go:644 +0x69 fp=0xc002c19920 sp=0xc002c198e0 pc=0x43c269
panic({0xc2dbc0, 0x144a280})
/usr/local/lib/go/src/runtime/panic.go:844 +0x112 fp=0xc002c199e0 sp=0xc002c19920 pc=0x43cab2
runtime.panicmem(...)
/usr/local/lib/go/src/runtime/panic.go:260
runtime.sigpanic()
/usr/local/lib/go/src/runtime/signal_unix.go:843 +0x2f6 fp=0xc002c19a30 sp=0xc002c199e0 pc=0x453976
The source of the problem seems to be this method:
Go version: 1.19.4
OS: Debian 11
Kernel: 4.14.81.bm.30-amd64 #1 SMP Debian 4.14.81.bm.30 Thu May 6 03:23:40 UTC 2021 x86_64 GNU/Linux
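For reference, a minimal sketch of the kind of server setup described above (using nhooyr.io/websocket's public Accept API; the echo handler and the explicit options are illustrative, not our production code):

```go
package main

import (
	"context"
	"net/http"
	"time"

	"nhooyr.io/websocket"
)

func handler(w http.ResponseWriter, r *http.Request) {
	// Accept with permessage-deflate enabled; in v1.8.x the
	// zero-value CompressionMode already negotiates RFC 7692
	// compression, this just makes it explicit.
	c, err := websocket.Accept(w, r, &websocket.AcceptOptions{
		CompressionMode: websocket.CompressionNoContextTakeover,
	})
	if err != nil {
		return
	}
	defer c.Close(websocket.StatusInternalError, "tunnel closed")

	ctx, cancel := context.WithTimeout(r.Context(), time.Minute)
	defer cancel()

	// Reads inflate incoming frames through compress/flate,
	// the decompressor that shows up in the trashed stack above.
	for {
		typ, msg, err := c.Read(ctx)
		if err != nil {
			return
		}
		if err := c.Write(ctx, typ, msg); err != nil {
			return
		}
	}
}

func main() {
	http.ListenAndServe(":8080", http.HandlerFunc(handler))
}
```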
@futurist I am not sure if this is related. My best bet would be that there is a problem with the sync.Pool reuse mechanics causing a race. Unfortunately, the project seems pretty dead.
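For illustration, the kind of pattern I mean (a hypothetical sketch, not code from the websocket package):

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
	"sync"
)

// Hypothetical reader pool, similar in spirit to what a websocket
// library might use to avoid reallocating decompressors per message.
var readerPool = sync.Pool{
	New: func() interface{} { return flate.NewReader(nil) },
}

func decompress(frame []byte) ([]byte, error) {
	fr := readerPool.Get().(io.ReadCloser)
	fr.(flate.Resetter).Reset(bytes.NewReader(frame), nil)

	out, err := io.ReadAll(fr)

	// BUG (illustrative): if fr goes back into the pool while an
	// earlier caller still holds a reference and keeps calling
	// Read, two users mutate the decompressor's internal state
	// concurrently. That unsynchronized sharing is exactly the
	// kind of race that corrupts memory in hard-to-reproduce ways.
	readerPool.Put(fr)
	return out, err
}

func main() {
	// Compress a payload, then inflate it via the pooled reader.
	var buf bytes.Buffer
	w, _ := flate.NewWriter(&buf, flate.DefaultCompression)
	w.Write([]byte("hello"))
	w.Close()
	out, _ := decompress(buf.Bytes())
	fmt.Println(string(out))
}
```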
This issue should probably be closed, since there is now built-in fuzzing.
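For example, a native round-trip fuzzer for the same package could look like this (a minimal sketch runnable with `go test -fuzz=FuzzFlate`; it is not the actual compress-fuzz harness):

```go
package flate_test

import (
	"bytes"
	"io"
	"testing"

	"github.com/klauspost/compress/flate"
)

func FuzzFlate(f *testing.F) {
	f.Add([]byte("hello gopher"))
	f.Fuzz(func(t *testing.T, data []byte) {
		// Compress the fuzz input, then decompress and verify
		// the round trip returns the original bytes.
		var buf bytes.Buffer
		w, err := flate.NewWriter(&buf, flate.DefaultCompression)
		if err != nil {
			t.Fatal(err)
		}
		if _, err := w.Write(data); err != nil {
			t.Fatal(err)
		}
		if err := w.Close(); err != nil {
			t.Fatal(err)
		}

		got, err := io.ReadAll(flate.NewReader(&buf))
		if err != nil {
			t.Fatal(err)
		}
		if !bytes.Equal(got, data) {
			t.Fatalf("round trip mismatch: %d bytes in, %d bytes out", len(data), len(got))
		}
	})
}
```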
@klauspost In that case, should I repost the error to https://github.com/klauspost/compress as a new issue, if it's really not related to the Go runtime? Or maybe as a new Go issue?
@futurist It is not a problem in the compression package. The stack looks messed up.
It looks like either a problem in the package you are using or a runtime problem. I can't tell you which.
What version of Go are you using (go version)?

Compiled through go-fuzz/libfuzz.

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

What did you do?
I have 3 crashes from "fuzzit.dev", where I am running continuous fuzz testing of my compression packages. Go 1.12.10 was used for 2 builds, Go 1.13.3 for one.
There is no assembly or "unsafe" involved, so there shouldn't be any reasonable way for memory corruption to occur. The fuzzing also runs strictly in a single goroutine, so races seem unlikely as well.
That said, I have no idea about the hardware stability of the servers running the tests.
Also, a lot of new code has just been added here, so there is a chance of something bad, though I don't know how I would be able to trigger this error.
Crash logs: https://gist.github.com/klauspost/d4ec7bd6ecefa1bec56dd8ca4ac8ec39
Go 1.12.10 on top and bottom. Go 1.13.3 in the middle.
The pre-empted functions are completely different (flate.(*fastGen).matchlenLong vs. flate.(*decompressor).Read), so it is not tied to one piece of code. All crashes were in mgcmark.go:711. The final crash happened while executing bytes.(*Buffer).grow.
The crashes have not reproduced locally, so this could be a libfuzz-specific problem. The build script is here: https://github.com/klauspost/compress/blob/master/fuzzit.sh#L17 - all crashes have been in the same fuzzer (flate), so it seems something in there is triggering it.
What did you expect to see?
No crash, or more information.
What did you see instead?
Crash.