golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
122.71k stars 17.5k forks source link

runtime: morestack_noctxt missing SPWRITE, causes "traceback stuck" assert #54332

Closed lizthegrey closed 2 years ago

lizthegrey commented 2 years ago

What version of Go are you using (go version)?

$ go version
go version go1.18.5 linux/arm64

Does this issue reproduce with the latest release?

yes, repros on go1.19.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/home/lizf/.cache/go-build"
GOENV="/home/lizf/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/lizf/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/lizf/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_arm64"
GOVCS=""
GOVERSION="go1.18.5"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3053713255=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Run continuous profiling via scraping /debug/pprof endpoints including mutex and blocking profiles, while performing a workload consuming from Kafka using Sarama and Klauspost Zstd libraries.

  runtime.SetBlockProfileRate(100000)
  runtime.SetMutexProfileFraction(100)

What did you expect to see?

No crashes.

What did you see instead?

The following crashes implicating unexpected return pc for runtime.sigtramp and traceback stuck (assert added in https://go-review.googlesource.com/c/go/+/400575/ by @cherrymui / #52116 which was backported to go1.18.5):

runtime: traceback stuck. pc=0x12eb5 sp=0x4019365840
stack: frame={sp:0x4019365840, fp:0x4019365840} stack=[0x4019364000,0x4019366000)
0x0000004019365740:  0x0000000000ba9494 <github.com/Shopify/sarama.releaseCrc32Field+0x0000000000000034>  0x00000040193657b8 
0x0000004019365750:  0x0000000000bcd8e0 <github.com/Shopify/sarama.(*realDecoder).pop+0x0000000000000060>  0x00000040193657c8 
0x0000004019365760:  0x0000000000000000  0x0000000000000000 
0x0000004019365770:  0x0000000000000700  0x0000000001068260 
0x0000004019365780:  0x00000000010ab360  0x0000000000005701 
0x0000004019365790:  0x0000000000000000  0x0000000000000000 
0x00000040193657a0:  0x0000000000000000  0x0000004000601c00 
0x00000040193657b0:  0x000000401c3c7100  0x00000040193657f8 
0x00000040193657c0:  0x0000000000bd070c <github.com/Shopify/sarama.(*RecordBatch).decode+0x000000000000037c>  0x000000400ff4e0f0 
0x00000040193657d0:  0x0000000000bd064c <github.com/Shopify/sarama.(*RecordBatch).decode+0x00000000000002bc>  0x0000004026c7805f 
0x00000040193657e0:  0x0000000000107de8 <strings.HasSuffix+0x0000000000000058>  0x0000000000000000 
0x00000040193657f0:  0x0000000000000000  0x00000040193658c8 
0x0000004019365800:  0x0000000000bd0eac <github.com/Shopify/sarama.(*Records).decode+0x000000000000009c>  0x0000004017c948c0 
0x0000004019365810:  0x0000004026d6cf92  0x0000000000012eb5 <internal/bytealg.IndexByteString+0x0000000000000005> 
0x0000004019365820:  0x0000000000012eb5 <internal/bytealg.IndexByteString+0x0000000000000005>  0x0000004019365898 
0x0000004019365830:  0x00012ee60001ce60  0x0000000000012eb5 <internal/bytealg.IndexByteString+0x0000000000000005> 
0x0000004019365840: >0x0000000000012eb5 <internal/bytealg.IndexByteString+0x0000000000000005>  0x00000000000000d7 
0x0000004019365850:  0x0000000000000000  0x0000004019365800 
0x0000004019365860:  0x0000000000bd0474 <github.com/Shopify/sarama.(*RecordBatch).decode+0x00000000000000e4>  0x00000040193658b8 
0x0000004019365870:  0x0000000000000000  0x0000000000000000 
0x0000004019365880:  0x0000000000000000  0x0000000000000000 
0x0000004019365890:  0x0000000000000000  0x0000004026d6cf92 
0x00000040193658a0:  0x000000400ff4e0f0  0x0000000000000000 
0x00000040193658b0:  0x0000000000000000  0x0000000000bd09c0 <github.com/Shopify/sarama.(*RecordBatch).decode.func1+0x0000000000000000> 
0x00000040193658c0:  0x000000400ff4e0f0  0x0000004019365918 
0x00000040193658d0:  0x0000000000bb6f1c <github.com/Shopify/sarama.(*FetchResponseBlock).decode+0x000000000000033c>  0x0000004012612c80 
0x00000040193658e0:  0x000000000154bfa8  0x0000004017c948c0 
0x00000040193658f0:  0x0000000000bb6f0c <github.com/Shopify/sarama.(*FetchResponseBlock).decode+0x000000000000032c>  0x0000000000000018 
0x0000004019365900:  0x00000000010d87c0  0x0000004000048301 
0x0000004019365910:  0x0000004017c94a40  0x00000040193659c8 
0x0000004019365920:  0x0000000000bb78e8 <github.com/Shopify/sarama.(*FetchResponse).decode+0x00000000000002a8>  0x0000004009b9fc38 
0x0000004019365930:  0x000000000154bfa8  0x0000004017c948c0 
fatal error: traceback stuck

goroutine 0 [idle]:
runtime: unexpected return pc for runtime.sigtramp called from 0xffff985dc5c0
stack: frame={sp:0x400eacacd0, fp:0x400eacada0} stack=[0x400eac4000,0x400eacc000)
0x000000400eacabd0:  0x000000400eacac28  0x02c595e603246a9e 
0x000000400eacabe0:  0x0000000002c595e5  0x000000400eaba000 
0x000000400eacabf0:  0x0000000000000000  0x000000400eac9fe0 
0x000000400eacac00:  0x0000004013a80000  0x000000400eacada0 
0x000000400eacac10:  0x000000400eacae20  0x000000400eacac98 
0x000000400eacac20:  0x000000000007d58c <runtime.sigtrampgo+0x000000000000001c>  0x000000400000001b 
0x000000400eacac30:  0x0000004000601c00  0x000000400eacac58 
0x000000400eacac40:  0x0000004000603040  0x02bcec7502c540fb 
0x000000400eacac50:  0x00246aa103246aa0  0x0000000000000000 
0x000000400eacac60:  0x0000000000000000  0x0000000000000000 
0x000000400eacac70:  0x0000000000000000  0x0000000000000000 
0x000000400eacac80:  0x0000004000603040  0x000000400eacada0 
0x000000400eacac90:  0x000000400eacae20  0x000000400eacacc8 
0x000000400eacaca0:  0x000000000007cea4 <runtime.sigtramp+0x0000000000000084>  0x000000000000001b 
0x000000400eacacb0:  0x000000400eacada0  0x000000400eacae20 
0x000000400eacacc0:  0x02c5956e03246a9e  0x000000400eacbff0 
0x000000400eacacd0: <0x0000ffff985dc5c0  0x000000400000001b 
0x000000400eacace0:  0x000000400eacada0  0x000000400eacae20 
0x000000400eacacf0:  0x000000000005abba <runtime.selparkcommit+0x000000000000000a>  0x1265726f74537070 
0x000000400eacad00:  0x021210eed9a50809  0x623824220118000a 
0x000000400eacad10:  0x302d333733663636  0x0000000000000015 
0x000000400eacad20:  0x0000000000000071  0x000000400b753e98 
0x000000400eacad30:  0x000000000153a0c0  0x0000004000603040 
0x000000400eacad40:  0x000000400eacacc8  0x3e66376972bea4d0 
0x000000400eacad50:  0xbebbbd41c5d26bf1  0x3f11566aaf25de2c 
0x000000400eacad60:  0xbf66c16c16bebd93  0x3fc5555555555555 
0x000000400eacad70:  0x3fcde3175e2d734a  0x3ffc439d143a5197 
0x000000400eacad80:  0xbfd1a21a8e7d2cb5  0x0000000000000000 
0x000000400eacad90:  0x0404040202020303  0x0000000000000000 
0x000000400eacada0: >0x000000000000001b  0x00000000fffffffe 
0x000000400eacadb0:  0x0000000000000466  0x0000000000000000 
0x000000400eacadc0:  0x0000000000000000  0x0000000000000000 
0x000000400eacadd0:  0x0000000000000000  0x0000000000000000 
0x000000400eacade0:  0x0000000000000000  0x0000000000000000 
0x000000400eacadf0:  0x0000000000000000  0x0000000000000000 
0x000000400eacae00:  0x0000000000000000  0x0000000000000000 
0x000000400eacae10:  0x0000000000000000  0x0000000000000000 
0x000000400eacae20:  0x0000000000000000  0x0000000000000000 
0x000000400eacae30:  0x000000400eac4000  0x0000004000000000 
0x000000400eacae40:  0x0000000000008000  0x0000000000000000 
0x000000400eacae50:  0x0000000000000000  0x0000000000000000 
0x000000400eacae60:  0x0000000000000000  0x0000000000000000 
0x000000400eacae70:  0x00002a1c49103b0c  0x02bcec6f02b9baa6 
0x000000400eacae80:  0x02b9baaf03246a9f  0x02c5956e02bcec75 
0x000000400eacae90:  0x0000000000000000  0x0000004013a805d0 
runtime.throw({0x1262952?, 0x1fae6c0?})
    /usr/local/go/src/runtime/panic.go:992 +0x50
runtime.gentraceback(0x203ad80?, 0x400eac9a90?, 0x400eaca968?, 0x4000603040, 0x0, 0x400eaca968, 0x40, 0x0, 0x0?, 0x6)
    /usr/local/go/src/runtime/traceback.go:488 +0xd98
runtime.sigprof(0x79650, 0x400eacac08?, 0x4013a80010?, 0x4000603040, 0x4000601c00)
    /usr/local/go/src/runtime/proc.go:4507 +0xf0
runtime.sighandler(0x1b?, 0x4000601c00?, 0x400eacac58?, 0x4000603040?)
    /usr/local/go/src/runtime/signal_unix.go:613 +0x514
runtime.sigtrampgo(0x1b, 0x400eacada0, 0x400eacae20)
    /usr/local/go/src/runtime/signal_unix.go:477 +0x164
runtime.sigtrampgo(0x1b, 0x400eacada0, 0x400eacae20)
    <autogenerated>:1 +0x1c
runtime: unexpected return pc for runtime.sigtramp called from 0xffff985dc5c0
stack: frame={sp:0x400eacacd0, fp:0x400eacada0} stack=[0x400eac4000,0x400eacc000)
0x000000400eacabd0:  0x000000400eacac28  0x02c595e603246a9e 
0x000000400eacabe0:  0x0000000002c595e5  0x000000400eaba000 
0x000000400eacabf0:  0x0000000000000000  0x000000400eac9fe0 
0x000000400eacac00:  0x0000004013a80000  0x000000400eacada0 
0x000000400eacac10:  0x000000400eacae20  0x000000400eacac98 
0x000000400eacac20:  0x000000000007d58c <runtime.sigtrampgo+0x000000000000001c>  0x000000400000001b 
0x000000400eacac30:  0x0000004000601c00  0x000000400eacac58 
0x000000400eacac40:  0x0000004000603040  0x02bcec7502c540fb 
0x000000400eacac50:  0x00246aa103246aa0  0x0000000000000000 
0x000000400eacac60:  0x0000000000000000  0x0000000000000000 
0x000000400eacac70:  0x0000000000000000  0x0000000000000000 
0x000000400eacac80:  0x0000004000603040  0x000000400eacada0 
0x000000400eacac90:  0x000000400eacae20  0x000000400eacacc8 
0x000000400eacaca0:  0x000000000007cea4 <runtime.sigtramp+0x0000000000000084>  0x000000000000001b 
0x000000400eacacb0:  0x000000400eacada0  0x000000400eacae20 
0x000000400eacacc0:  0x02c5956e03246a9e  0x000000400eacbff0 
0x000000400eacacd0: <0x0000ffff985dc5c0  0x000000400000001b 
0x000000400eacace0:  0x000000400eacada0  0x000000400eacae20 
0x000000400eacacf0:  0x000000000005abba <runtime.selparkcommit+0x000000000000000a>  0x1265726f74537070 
0x000000400eacad00:  0x021210eed9a50809  0x623824220118000a 
0x000000400eacad10:  0x302d333733663636  0x0000000000000015 
0x000000400eacad20:  0x0000000000000071  0x000000400b753e98 
0x000000400eacad30:  0x000000000153a0c0  0x0000004000603040 
0x000000400eacad40:  0x000000400eacacc8  0x3e66376972bea4d0 
0x000000400eacad50:  0xbebbbd41c5d26bf1  0x3f11566aaf25de2c 
0x000000400eacad60:  0xbf66c16c16bebd93  0x3fc5555555555555 
0x000000400eacad70:  0x3fcde3175e2d734a  0x3ffc439d143a5197 
0x000000400eacad80:  0xbfd1a21a8e7d2cb5  0x0000000000000000 
0x000000400eacad90:  0x0404040202020303  0x0000000000000000 
0x000000400eacada0: >0x000000000000001b  0x00000000fffffffe 
0x000000400eacadb0:  0x0000000000000466  0x0000000000000000 
0x000000400eacadc0:  0x0000000000000000  0x0000000000000000 
0x000000400eacadd0:  0x0000000000000000  0x0000000000000000 
0x000000400eacade0:  0x0000000000000000  0x0000000000000000 
0x000000400eacadf0:  0x0000000000000000  0x0000000000000000 
0x000000400eacae00:  0x0000000000000000  0x0000000000000000 
0x000000400eacae10:  0x0000000000000000  0x0000000000000000 
0x000000400eacae20:  0x0000000000000000  0x0000000000000000 
0x000000400eacae30:  0x000000400eac4000  0x0000004000000000 
0x000000400eacae40:  0x0000000000008000  0x0000000000000000 
0x000000400eacae50:  0x0000000000000000  0x0000000000000000 
0x000000400eacae60:  0x0000000000000000  0x0000000000000000 
0x000000400eacae70:  0x00002a1c49103b0c  0x02bcec6f02b9baa6 
0x000000400eacae80:  0x02b9baaf03246a9f  0x02c5956e02bcec75 
0x000000400eacae90:  0x0000000000000000  0x0000004013a805d0 
runtime.sigtramp()
    /usr/local/go/src/runtime/sys_linux_arm64.s:489 +0x84

goroutine 89 [running]:
runtime: traceback stuck. pc=0x3ff93 sp=0x40048952c0
stack: frame={sp:0x40048952c0, fp:0x40048952c0} stack=[0x4004894000,0x4004896000)
0x00000040048951c0:  0x0000000000000040  0x00000040048951e8 
0x00000040048951d0:  0x00000000005019a4 <github.com/klauspost/compress/zstd.(*sequenceDec).init+0x0000000000000054>  0x0000004004895310 
0x00000040048951e0:  0x00000040008c6000  0x0000004004895228 
0x00000040048951f0:  0x0000000000501b9c <github.com/klauspost/compress/zstd.(*sequenceDecs).initialize+0x000000000000016c>  0x0000004004895338 
0x0000004004895200:  0x00000000004e2b84 <github.com/klauspost/compress/zstd.(*blockDec).decodeCompressed+0x0000000000000164>  0x0000004000530058 
0x0000004004895210:  0x00000040002e22a0  0x00000000006c0001 <text/template/parse.(*endNode).Copy+0x0000000000000001> 
0x0000004004895220:  0x0000000f00800005  0x000001fd00280903 
0x0000004004895230:  0x00000000004e2ebc <github.com/klauspost/compress/zstd.(*blockDec).prepareSequences+0x000000000000025c>  0x000000000003ff93 <runtime.goroutineProfileWithLabels.func4.1+0x0000000000000003> 
0x0000004004895240:  0x00000000000ee1a4 <os.UserHomeDir+0x00000000000000c4>  0x0000000000000811 
0x0000004004895250:  0x0000000000000015  0x0000000000020000 <runtime.advanceEvacuationMark+0x0000000000000000> 
0x0000004004895260:  0x00000000006c0001 <text/template/parse.(*endNode).Copy+0x0000000000000001>  0x0000000000000000 
0x0000004004895270:  0x0000000000000006  0x0000000f00800005 
0x0000004004895280:  0x000001fd00280903  0x0000000000000000 
0x0000004004895290:  0x000000400720ea00  0x0000004004895338 
0x00000040048952a0:  0x00000000004e2a78 <github.com/klauspost/compress/zstd.(*blockDec).decodeCompressed+0x0000000000000058>  0x0000004000530058 
0x00000040048952b0:  0x00000040002e22a0  0x00000000000ee1a4 <os.UserHomeDir+0x00000000000000c4> 
0x00000040048952c0: >0x000000000003ff93 <runtime.goroutineProfileWithLabels.func4.1+0x0000000000000003>  0x0000000000020000 <runtime.advanceEvacuationMark+0x0000000000000000> 
0x00000040048952d0:  0x0000004009f42000  0x00000040002e22a0 
0x00000040048952e0:  0x0000000000001a81  0x0000004009f42000 
0x00000040048952f0:  0x0000004005b17500  0x000000400720ea00 
0x0000004004895300:  0x0000004004a28a00  0x00000040002e22a0 
0x0000004004895310:  0x0000004003bcf000  0x000000400885150c 
0x0000004004895320:  0x000000000000467b  0x000000000002ba01 <runtime.gcStart+0x0000000000000701> 
0x0000004004895330:  0x000000000000002e  0x0000004004895398 
0x0000004004895340:  0x00000000004e1974 <github.com/klauspost/compress/zstd.(*blockDec).decodeBuf+0x0000000000000284>  0x0000004000530058 
0x0000004004895350:  0x0000000000000000  0x0000000000000000 
0x0000004004895360:  0x0000000000000000  0x0000004000530050 
0x0000004004895370:  0x0000000000000000  0x0000000000000000 
0x0000004004895380:  0x0000000000000000  0x0000000000020000 <runtime.advanceEvacuationMark+0x0000000000000000> 
0x0000004004895390:  0x0000000000000000  0x0000004004895458 
0x00000040048953a0:  0x00000000004fd2cc <github.com/klauspost/compress/zstd.(*frameDec).runDecoder+0x000000000000016c>  0x000000400058c000 
0x00000040048953b0:  0x0000004000530050  0x0000000000000000 
fatal error: traceback stuck
goroutine 0 [idle]:
runtime: unexpected return pc for runtime.sigtramp called from 0xffffabfd25c0
stack: frame={sp:0x400000acd0, fp:0x400000ada0} stack=[0x4000004000,0x400000c000)
0x000000400000abd0:  0x000000400000ac28  0x0000000000000000 
0x000000400000abe0:  0x0000000000000000  0x0000004000002000 
0x000000400000abf0:  0x0000000000000000  0x0000000000000000 
0x000000400000ac00:  0x0000000000000000  0x000000400000ada0 
0x000000400000ac10:  0x000000400000ae20  0x000000400000ac98 
0x000000400000ac20:  0x000000000007d58c <runtime.sigtrampgo+0x000000000000001c>  0x000000400000001b 
0x000000400000ac30:  0x0000000002046680  0x000000400000ac58 
0x000000400000ac40:  0x0000004000521d40  0x0000000000000000 
0x000000400000ac50:  0x0000000000000000  0x0000000000000000 
0x000000400000ac60:  0x0000000000000000  0x0000000000000000 
0x000000400000ac70:  0x0000000000000000  0x0000000000000000 
0x000000400000ac80:  0x0000004000521d40  0x000000400000ada0 
0x000000400000ac90:  0x000000400000ae20  0x000000400000acc8 
0x000000400000aca0:  0x000000000007cea4 <runtime.sigtramp+0x0000000000000084>  0x000000000000001b 
0x000000400000acb0:  0x000000400000ada0  0x000000400000ae20 
0x000000400000acc0:  0x0000000000000000  0x000000400000bff0 
0x000000400000acd0: <0x0000ffffabfd25c0  0x000000000000001b 
0x000000400000ace0:  0x000000400000ada0  0x000000400000ae20 
0x000000400000acf0:  0x0000000000000009  0x0000000000000015 
0x000000400000ad00:  0x0000000000000000  0x00000040008c9fab 
0x000000400000ad10:  0x000000000001ff93 <runtime.evacuate+0x00000000000004b3>  0x0000000000000009 
0x000000400000ad20:  0x000000000000032d  0x00000040048953d0 
0x000000400000ad30:  0x0000000002078100  0x0000004000521d40 
0x000000400000ad40:  0x000000400000acc8  0x3e66376972bea4d0 
0x000000400000ad50:  0xbebbbd41c5d26bf1  0x3f11566aaf25de2c 
0x000000400000ad60:  0xbf66c16c16bebd93  0x3fc5555555555555 
0x000000400000ad70:  0x3fb8fcd2e18d7b62  0x3ffe7032d1e7284a 
0x000000400000ad80:  0xbfbab6204490a1e0  0x0000000000000000 
0x000000400000ad90:  0x0000000000000000  0x0000000000000000 
0x000000400000ada0: >0x000000000000001b  0x00000000fffffffe 
0x000000400000adb0:  0x0000000000004d16  0x0000000000000000 
0x000000400000adc0:  0x0000000000000000  0x0000000000000000 
0x000000400000add0:  0x0000000000000000  0x0000000000000000 
0x000000400000ade0:  0x0000000000000000  0x0000000000000000 
0x000000400000adf0:  0x0000000000000000  0x0000000000000000 
0x000000400000ae00:  0x0000000000000000  0x0000000000000000 
0x000000400000ae10:  0x0000000000000000  0x0000000000000000 
0x000000400000ae20:  0x0000000000000000  0x0000000000000000 
0x000000400000ae30:  0x0000004000004000  0x0000000000000000 
0x000000400000ae40:  0x0000000000008000  0x0000000000000000 
0x000000400000ae50:  0x0000000000000000  0x0000000000000000 
0x000000400000ae60:  0x0000000000000000  0x0000000000000000 
0x000000400000ae70:  0x0000000000000000  0x0000000000000000 
0x000000400000ae80:  0x0000000000000000  0x0000000000000000 
0x000000400000ae90:  0x0000000000000000  0x0000000000000000 
runtime.throw({0x1275238?, 0x1fbe760?})
    /usr/local/go/src/runtime/panic.go:992 +0x50
runtime.gentraceback(0x204afc0?, 0x0?, 0x400000a968?, 0x4000521d40, 0x0, 0x400000a968, 0x40, 0x0, 0x0?, 0x6)
    /usr/local/go/src/runtime/traceback.go:488 +0xd98
runtime.sigprof(0x79650, 0x400000ac08?, 0x0?, 0x4000521d40, 0x2046680)
    /usr/local/go/src/runtime/proc.go:4507 +0xf0
runtime.sighandler(0x1b?, 0x2046680?, 0x400000ac58?, 0x4000521d40?)
    /usr/local/go/src/runtime/signal_unix.go:613 +0x514
runtime.sigtrampgo(0x1b, 0x400000ada0, 0x400000ae20)
    /usr/local/go/src/runtime/signal_unix.go:477 +0x164
runtime.sigtrampgo(0x1b, 0x400000ada0, 0x400000ae20)
    <autogenerated>:1 +0x1c
runtime: unexpected return pc for runtime.sigtramp called from 0xffffabfd25c0
stack: frame={sp:0x400000acd0, fp:0x400000ada0} stack=[0x4000004000,0x400000c000)
0x000000400000abd0:  0x000000400000ac28  0x0000000000000000 
0x000000400000abe0:  0x0000000000000000  0x0000004000002000 
0x000000400000abf0:  0x0000000000000000  0x0000000000000000 
0x000000400000ac00:  0x0000000000000000  0x000000400000ada0 
0x000000400000ac10:  0x000000400000ae20  0x000000400000ac98 
0x000000400000ac20:  0x000000000007d58c <runtime.sigtrampgo+0x000000000000001c>  0x000000400000001b 
0x000000400000ac30:  0x0000000002046680  0x000000400000ac58 
0x000000400000ac40:  0x0000004000521d40  0x0000000000000000 
0x000000400000ac50:  0x0000000000000000  0x0000000000000000 
0x000000400000ac60:  0x0000000000000000  0x0000000000000000 
0x000000400000ac70:  0x0000000000000000  0x0000000000000000 
0x000000400000ac80:  0x0000004000521d40  0x000000400000ada0 
0x000000400000ac90:  0x000000400000ae20  0x000000400000acc8 
0x000000400000aca0:  0x000000000007cea4 <runtime.sigtramp+0x0000000000000084>  0x000000000000001b 
0x000000400000acb0:  0x000000400000ada0  0x000000400000ae20 
0x000000400000acc0:  0x0000000000000000  0x000000400000bff0 
0x000000400000acd0: <0x0000ffffabfd25c0  0x000000000000001b 
0x000000400000ace0:  0x000000400000ada0  0x000000400000ae20 
0x000000400000acf0:  0x0000000000000009  0x0000000000000015 
0x000000400000ad00:  0x0000000000000000  0x00000040008c9fab 
0x000000400000ad10:  0x000000000001ff93 <runtime.evacuate+0x00000000000004b3>  0x0000000000000009 
0x000000400000ad20:  0x000000000000032d  0x00000040048953d0 
0x000000400000ad30:  0x0000000002078100  0x0000004000521d40 
0x000000400000ad40:  0x000000400000acc8  0x3e66376972bea4d0 
0x000000400000ad50:  0xbebbbd41c5d26bf1  0x3f11566aaf25de2c 
0x000000400000ad60:  0xbf66c16c16bebd93  0x3fc5555555555555 
0x000000400000ad70:  0x3fb8fcd2e18d7b62  0x3ffe7032d1e7284a 
0x000000400000ad80:  0xbfbab6204490a1e0  0x0000000000000000 
0x000000400000ad90:  0x0000000000000000  0x0000000000000000 
0x000000400000ada0: >0x000000000000001b  0x00000000fffffffe 
0x000000400000adb0:  0x0000000000004d16  0x0000000000000000 
0x000000400000adc0:  0x0000000000000000  0x0000000000000000 
0x000000400000add0:  0x0000000000000000  0x0000000000000000 
0x000000400000ade0:  0x0000000000000000  0x0000000000000000 
0x000000400000adf0:  0x0000000000000000  0x0000000000000000 
0x000000400000ae00:  0x0000000000000000  0x0000000000000000 
0x000000400000ae10:  0x0000000000000000  0x0000000000000000 
0x000000400000ae20:  0x0000000000000000  0x0000000000000000 
0x000000400000ae30:  0x0000004000004000  0x0000000000000000 
0x000000400000ae40:  0x0000000000008000  0x0000000000000000 
0x000000400000ae50:  0x0000000000000000  0x0000000000000000 
0x000000400000ae60:  0x0000000000000000  0x0000000000000000 
0x000000400000ae70:  0x0000000000000000  0x0000000000000000 
0x000000400000ae80:  0x0000000000000000  0x0000000000000000 
0x000000400000ae90:  0x0000000000000000  0x0000000000000000 
runtime.sigtramp()
    /usr/local/go/src/runtime/sys_linux_arm64.s:489 +0x84
goroutine 197 [running]:
prattmic commented 2 years ago

Given that you commented on #52116, I assume that this application was also suffering from that bug before this (seemingly incomplete) fix?

cherrymui commented 2 years ago

This stack looks weird. 0x12eb5 is not even an aligned PC (ARM64 instructions are always 4 bytes). Why would it appear on the stack? 0x3ff93 from another stack trace is similarly weird.

Maybe they are not PCs? Given that a nearby stack word has content 0x00012ee60001ce60, maybe 0x12eb5, 0x12ee6, 0x1ce60 are not PCs but some (32-bit?) integers on stack?

On both stack traces the SIGPROF lands on PC 0x79650. What is that PC? Is it a Go function? Thanks.

lizthegrey commented 2 years ago

Given that you commented on #52116, I assume that this application was also suffering from that bug before this (seemingly incomplete) fix?

~It was. But we backported the fix and it was working fine. The new change for us is turning on not just CPU/mem profiling, but block profiling too.~

lizthegrey commented 2 years ago

This stack looks weird. 0x12eb5 is not even an aligned PC (ARM64 instructions are always 4 bytes). Why would it appear on the stack? 0x3ff93 from another stack trace is similarly weird.

Maybe they are not PCs? Given that a nearby stack word has content 0x00012ee60001ce60, maybe 0x12eb5, 0x12ee6, 0x1ce60 are not PCs but some (32-bit?) integers on stack?

On both stack traces the SIGPROF lands on PC 0x79650. What is that PC? Is it a Go function? Thanks.

$ go tool addr2line retriever
0x79650
runtime.morestack_noctxt
/usr/local/go/src/runtime/asm_arm64.s:321
lizthegrey commented 2 years ago

Given that you commented on #52116, I assume that this application was also suffering from that bug before this (seemingly incomplete) fix?

It was. But we backported the fix and it was working fine. The new change for us is turning on not just CPU/mem profiling, but block profiling too.

Addendum: I just trawled errors, and indeed, we do have significant numbers of "traceback stuck" dating back multiple months, just in applications that were able to restart after crashing without coming to my attention.

lizthegrey commented 2 years ago

Here are some additional crashes from other applications:

$ go tool addr2line barbet
0x79410
0x79414
runtime.morestack_noctxt
/usr/local/go/src/runtime/asm_arm64.s:321
runtime.morestack_noctxt
/usr/local/go/src/runtime/asm_arm64.s:322
$ go tool addr2line beagle
0x79410
runtime.morestack_noctxt
/usr/local/go/src/runtime/asm_arm64.s:321
$ go tool addr2line shepherd
0x790a0
0x791d0
0x791d4
runtime.systemstack
/usr/local/go/src/runtime/asm_arm64.s:204
runtime.morestack_noctxt
/usr/local/go/src/runtime/asm_arm64.s:321
runtime.morestack_noctxt
/usr/local/go/src/runtime/asm_arm64.s:322
lizthegrey commented 2 years ago

https://github.com/golang/go/blob/go1.18.5/src/runtime/asm_arm64.s#L321 -- culprit lines here.

similarity to https://github.com/golang/go/issues/52829, does this need to not have NOFRAME ie https://go-review.googlesource.com/c/go/+/405482/?

lizthegrey commented 2 years ago

Also repros with go1.19.0

runtime: traceback stuck. pc=0x58e9e0 sp=0x4035cfada0
stack: frame={sp:0x4035cfada0, fp:0x4035cfada0} stack=[0x4035cf8000,0x4035cfc000)
0x0000004035cfaca0:  0x0000004047098bd0  0x000000010001aae8 
0x0000004035cfacb0:  0xc90000000001bfb0  0x0000ffff94c3f878 
0x0000004035cfacc0:  0xc9c4a878b22454ee  0x0000000000000006 
0x0000004035cfacd0:  0x000000000058db7c <github.com/honeycombio/libhoney-go.(*fieldHolder).AddField+0x000000000000004c>  0x000000403cd55520 
0x0000004035cface0:  0x000000000058dbf0 <github.com/honeycombio/libhoney-go.(*fieldHolder).AddField+0x00000000000000c0>  0x0000004035cfad38 
0x0000004035cfacf0:  0x00000040470acb20  0x0000004035cfad58 
0x0000004035cfad00:  0x000000000058e980 <github.com/honeycombio/libhoney-go.(*Event).AddField+0x0000000000000110>  0x0000000000fcae00 
0x0000004035cfad10:  0x0000004047098bd0  0x00000040469ff6c8 
0x0000004035cfad20:  0x0000000000000014  0x0100000000000014 
0x0000004035cfad30:  0x00000040470b2248  0x0000000000ebb3a0 
0x0000004035cfad40:  0x000000000058dc50 <github.com/honeycombio/libhoney-go.(*fieldHolder).AddField.func1+0x0000000000000000>  0x00000040470b2248 
0x0000004035cfad50:  0x0000004035cfad40  0x0000004035cfadb8 
0x0000004035cfad60:  0x00000000005a5898 <github.com/honeycombio/beeline-go/trace.(*Span).AddField+0x0000000000000158>  0x00000040470b2240 
0x0000004035cfad70:  0x00000040469ff6c8  0x0000000000000014 
0x0000004035cfad80:  0x0000000000ebb3a0  0x0000000001ed2ac8 
0x0000004035cfad90:  0x01000000005a5840  0x0000000000f9a880 
0x0000004035cfada0: >0x000000000058e9e0 <github.com/honeycombio/libhoney-go.(*Event).AddField.func1+0x0000000000000000>  0x00000040470b226c 
0x0000004035cfadb0:  0x0000004035cfada0  0x0000004035cfae28 
0x0000004035cfadc0:  0x00000000005a7a4c <github.com/honeycombio/beeline-go.AddField+0x00000000000000cc>  0x0000ffff94c3f878 
0x0000004035cfadd0:  0x00000040469ff6c8  0x0000000001ed2ac8 
0x0000004035cfade0:  0x0000000000ebb3a0  0x0000000001ed2ac8 
0x0000004035cfadf0:  0x0000004035cfae08  0x0000004035cfae28 
0x0000004035cfae00:  0x01000000005a7a00  0x0000000000f9a880 
0x0000004035cfae10:  0x00000000005a5900 <github.com/honeycombio/beeline-go/trace.(*Span).AddField.func1+0x0000000000000000>  0x0000004043dc5688 
0x0000004035cfae20:  0x0000004035cfae10  0x0000004035cfae88 
0x0000004035cfae30:  0x0000000000d79884 <github.com/honeycombio/hound/cmd/shepherd/app.(*App).getSchema+0x00000000000000f4>  0x0000004043dc5600 
0x0000004035cfae40:  0x00000040469ff6c8  0x0000000000000014 
0x0000004035cfae50:  0x0000000000ebb3a0  0x0000000001ed2ac8 
0x0000004035cfae60:  0x000000400015aae0  0x0000000000000014 
0x0000004035cfae70:  0x0000000000ebb3a0  0x0000004043dc5600 
0x0000004035cfae80:  0x00000040469ff6c8  0x0000004035cfaf48 
0x0000004035cfae90:  0x0000000000d755bc <github.com/honeycombio/hound/cmd/shepherd/app.(*App).batchProcess.func1+0x000000000000043c>  0x0000000001491d40 
fatal error: traceback stuck

goroutine 0 [idle]:
runtime.throw({0x11b3553?, 0x1ebf180?})
    /usr/local/go/src/runtime/panic.go:1047 +0x40 fp=0x40009ac610 sp=0x40009ac5e0 pc=0x490f0
runtime.gentraceback(0x1f3c7e0?, 0x0?, 0x40009ac9a8?, 0x403cd55520, 0x0, 0x40009ac9a8, 0x40, 0x0, 0x0?, 0x6)
    /usr/local/go/src/runtime/traceback.go:488 +0xde4 fp=0x40009ac940 sp=0x40009ac610 pc=0x703f4
runtime.sigprof(0x7abe0, 0x40009acc48?, 0x0?, 0x403cd55520, 0x40009a4000)
    /usr/local/go/src/runtime/proc.go:4567 +0xe8 fp=0x40009acbb0 sp=0x40009ac940 pc=0x55618
runtime.sighandler(0x1b?, 0x40009a4000?, 0x40009acc98?, 0x40005e4000?)
    /usr/local/go/src/runtime/signal_unix.go:629 +0x574 fp=0x40009acc60 sp=0x40009acbb0 pc=0x60334
runtime.sigtrampgo(0x1b, 0x40009acda0, 0x40009ace20)
    /usr/local/go/src/runtime/signal_unix.go:479 +0x164 fp=0x40009acce0 sp=0x40009acc60 pc=0x5f9c4
runtime.sigtramp()
    /usr/local/go/src/runtime/sys_linux_arm64.s:471 +0x58 fp=0x40009acda0 sp=0x40009acce0 pc=0x7e228

goroutine 1503875 [running]:
runtime: traceback stuck. pc=0x58e9e0 sp=0x400ffaeda0
stack: frame={sp:0x400ffaeda0, fp:0x400ffaeda0} stack=[0x400ffac000,0x400ffb0000)
0x000000400ffaeca0:  0x0000004046ce2240  0x000000010001f0b8 
0x000000400ffaecb0:  0x7300000001243730  0x0000ffff9d953948 
0x000000400ffaecc0:  0xc8f36a7a4ff431e7  0x0000000000000007 
0x000000400ffaecd0:  0x000000000058db7c <github.com/honeycombio/libhoney-go.(*fieldHolder).AddField+0x000000000000004c>  0x000000400cfa5520 
0x000000400ffaece0:  0x000000000058dc00 <github.com/honeycombio/libhoney-go.(*fieldHolder).AddField+0x00000000000000d0>  0x000000400ffaed38 
0x000000400ffaecf0:  0x0000000000065708 <runtime.rawstringtmp+0x0000000000000048>  0x000000400ffaed58 
0x000000400ffaed00:  0x000000000058e980 <github.com/honeycombio/libhoney-go.(*Event).AddField+0x0000000000000110>  0x0000000000fcae00 
0x000000400ffaed10:  0x0000004046ce2240  0x0000004046110d50 
0x000000400ffaed20:  0x0000000000000016  0x0100000000000016 
0x000000400ffaed30:  0x00000040453eaf68  0x0000000000eccea0 
0x000000400ffaed40:  0x000000000058dc50 <github.com/honeycombio/libhoney-go.(*fieldHolder).AddField.func1+0x0000000000000000>  0x00000040453eaf68 
0x000000400ffaed50:  0x000000400ffaed40  0x000000400ffaedb8 
0x000000400ffaed60:  0x00000000005a5898 <github.com/honeycombio/beeline-go/trace.(*Span).AddField+0x0000000000000158>  0x00000040453eaf60 
0x000000400ffaed70:  0x0000004046110d50  0x0000000000000016 
0x000000400ffaed80:  0x0000000000eccea0  0x000000000147d320 
0x000000400ffaed90:  0x01000000005a5840  0x0000000000f9a880 
0x000000400ffaeda0: >0x000000000058e9e0 <github.com/honeycombio/libhoney-go.(*Event).AddField.func1+0x0000000000000000>  0x00000040453eaf8c 
0x000000400ffaedb0:  0x000000400ffaeda0  0x000000400ffaee28 
0x000000400ffaedc0:  0x00000000005a7a4c <github.com/honeycombio/beeline-go.AddField+0x00000000000000cc>  0x0000ffff9d953948 
0x000000400ffaedd0:  0x0000004046110d50  0x000000000147d320 
0x000000400ffaede0:  0x0000000000eccea0  0x000000000147d320 
0x000000400ffaedf0:  0x000000400ffaee08  0x000000400ffaee28 
0x000000400ffaee00:  0x01000000005a7a00  0x0000000000f9a880 
0x000000400ffaee10:  0x00000000005a5900 <github.com/honeycombio/beeline-go/trace.(*Span).AddField.func1+0x0000000000000000>  0x00000040453c2b88 
0x000000400ffaee20:  0x000000400ffaee10  0x000000400ffaee88 
0x000000400ffaee30:  0x0000000000d79a6c <github.com/honeycombio/hound/cmd/shepherd/app.(*App).getSchema+0x00000000000002dc>  0x00000040453c2b00 
0x000000400ffaee40:  0x0000004046110d50  0x0000000000000016 
0x000000400ffaee50:  0x0000000000eccea0  0x000000000147d320 
0x000000400ffaee60:  0x00000040006af0b0  0x0000000000000016 
0x000000400ffaee70:  0x0000000000eccea0  0x00000040453c2b00 
0x000000400ffaee80:  0x0000004046110d50  0x000000400ffaef48 
0x000000400ffaee90:  0x0000000000d755bc <github.com/honeycombio/hound/cmd/shepherd/app.(*App).batchProcess.func1+0x000000000000043c>  0x0000000001491d40 
fatal error: traceback stuck

goroutine 0 [idle]:
runtime.throw({0x11b3553?, 0x1ebf180?})
    /usr/local/go/src/runtime/panic.go:1047 +0x40 fp=0x400078a610 sp=0x400078a5e0 pc=0x490f0
runtime.gentraceback(0x1f3c7e0?, 0x0?, 0x400078a9a8?, 0x400cfa5520, 0x0, 0x400078a9a8, 0x40, 0x0, 0x0?, 0x6)
    /usr/local/go/src/runtime/traceback.go:488 +0xde4 fp=0x400078a940 sp=0x400078a610 pc=0x703f4
runtime.sigprof(0x7abe4, 0x400078ac48?, 0x0?, 0x400cfa5520, 0x4000780000)
    /usr/local/go/src/runtime/proc.go:4567 +0xe8 fp=0x400078abb0 sp=0x400078a940 pc=0x55618
runtime.sighandler(0x1b?, 0x4000780000?, 0x400078ac98?, 0x40007821a0?)
    /usr/local/go/src/runtime/signal_unix.go:629 +0x574 fp=0x400078ac60 sp=0x400078abb0 pc=0x60334
runtime.sigtrampgo(0x1b, 0x400078ada0, 0x400078ae20)
    /usr/local/go/src/runtime/signal_unix.go:479 +0x164 fp=0x400078ace0 sp=0x400078ac60 pc=0x5f9c4
runtime.sigtramp()
    /usr/local/go/src/runtime/sys_linux_arm64.s:471 +0x58 fp=0x400078ada0 sp=0x400078ace0 pc=0x7e228

goroutine 16981043 [running]:
go tool addr2line shepherd
0x7abe4
0x7abe0
runtime.morestack_noctxt
/usr/local/go/src/runtime/asm_arm64.s:324
runtime.morestack_noctxt
/usr/local/go/src/runtime/asm_arm64.s:323

https://github.com/golang/go/blob/go1.19/src/runtime/asm_arm64.s#L323-L324

cherrymui commented 2 years ago

Thanks for the details @lizthegrey !

similarity to https://github.com/golang/go/issues/52829, does this need to not have NOFRAME ie https://go-review.googlesource.com/c/go/+/405482/?

I don't think this is necessary. The ARM32 function contains a call in some configurations, whereas the ARM64 function should not contain a call.

So the signal lands at the entry of morestack_noctxt or systemstack. At this point the stack switch hasn't occurred so it should be able to traceback normally. I don't immediately see what is wrong. I'll keep looking.

If you need a workaround you can try comment out the "traceback stuck" throw. The profile sample's stacks may be inaccurate (which probably was the case before that CL) but it probably won't crash.

lizthegrey commented 2 years ago

Need any additional debugging info? Any way that I can help?

prattmic commented 2 years ago

We have special cases for morestack and systemstack here, but I don't believe they are relevant because (a) funcID_morestack only applies to morestack, not morestack_noctxt, and (b) we should only hits these if we already switched stacks, which is not the case here.

@lizthegrey is this readily reproducible (ideally OSS, but I'm also curious if you could manually reproduce)? Or only occurs in prod?

lizthegrey commented 2 years ago

It reproduces multiple times per day in all of our environments. Sadly I'm not sure I can reduce to a minimal OSS test case.

One thing I'd be happy to do is to run a version of this that is a branch build that patches go runtime to hang rather than panic when this occurs, then I'd be happy to ssh in and dlv the binary.

gopherbot commented 2 years ago

Change https://go.dev/cl/424196 mentions this issue: DO NOT SUBMIT: runtime: add dlog to gentraceback

gopherbot commented 2 years ago

Change https://go.dev/cl/424195 mentions this issue: DO NOT SUBMIT: runtime: dump entire stack

prattmic commented 2 years ago

If you are willing to patch the runtime, there are a few changes that would be helpful.

I think the latter is a bit simpler than attaching with a debugger, since it can track iteration state (previous frames) that will be gone once we decide to crash. Plus you don't need to watch for hung tasks and manually inspect them, but that is an option as well if the patch is concerning.

[1] If adding tags to your build is difficult for some reason, you can also swap the build tag lines at the top of debuglog_off.go and debuglog_on.go.

lizthegrey commented 2 years ago

There's some slew since 1.18.5:

/usr/local/go/src/runtime/traceback.go:47:9: cannot use gp.goid (variable of type int64) as type uint64 in argument to d.u64

I'll fix the patch vs our base go version and try this overnight.

lizthegrey commented 2 years ago

Fresh crash for you with the additional telemetry.

Details ``` 2022/08/17 15:22:56 maxprocs: Updating GOMAXPROCS=4: determined from CPU quota time="2022-08-17T15:22:56Z" level=info msg="missing POD_NAME and POD_NAMESPACE in env, ok if not retriever" time="2022-08-17T15:22:56Z" level=info msg="Debug service listening on :6060" [LaunchDarkly] 2022/08/17 15:22:56 INFO: Starting LaunchDarkly client 5.0.2 [LaunchDarkly] 2022/08/17 15:22:56 INFO: Starting LaunchDarkly streaming connection [LaunchDarkly] 2022/08/17 15:22:56 INFO: Waiting up to 30000 milliseconds for LaunchDarkly client to start... [LaunchDarkly] 2022/08/17 15:22:56 INFO: Connecting to LaunchDarkly stream [LaunchDarkly] 2022/08/17 15:22:56 INFO: LaunchDarkly streaming is active [LaunchDarkly] 2022/08/17 15:22:56 INFO: Initialized LaunchDarkly client runtime: traceback stuck. pc=0x12894 sp=0x40009d7840 stack: frame={sp:0x40009d7840, fp:0x40009d7840} stack=[0x40009d6000,0x40009d8000) 0x00000040009d7740: 0x000000000001cadc 0x00000040009d77b8 0x00000040009d7750: 0x0000000000474210 0x00000040009d77c8 0x00000040009d7760: 0x0000000000000000 0x0000000000000000 0x00000040009d7770: 0x0000000000000700 0x0000000000000004 0x00000040009d7780: 0x00000040033b40c0 0x0000000000007701 0x00000040009d7790: 0x0000000000000000 0x0000000000000000 0x00000040009d77a0: 0x0000000000000000 0x0000004000600000 0x00000040009d77b0: 0x0000004002b77c00 0x00000040009d77f8 0x00000040009d77c0: 0x000000000047703c 0x0000004008ec5980 0x00000040009d77d0: 0x0000000000476f7c 0x00000040099ce05f 0x00000040009d77e0: 0x000000000010a2ee 0x0000000000000000 0x00000040009d77f0: 0x0000000000000000 0x00000040009d78c8 0x00000040009d7800: 0x00000000004777dc 0x00000040033b40c0 0x00000040009d7810: 0x0000004009ac5ab9 0x0000000000012894 0x00000040009d7820: 0x0000000000012894 0x00000040009d7898 0x00000040009d7830: 0x000128c50001ce80 0x0000000000012894 0x00000040009d7840: >0x0000000000012894 0x00000000000000dc 0x00000040009d7850: 0x0000000000000000 0x00000040009d7800 0x00000040009d7860: 0x0000000000476da4 0x00000040009d78b8 0x00000040009d7870: 0x0000000000000000 0x0000000000000000 0x00000040009d7880: 0x0000000000000000 0x0000000000000000 0x00000040009d7890: 0x0000000000000000 0x0000004009ac5ab9 0x00000040009d78a0: 0x0000004008ec5980 0x0000000000000000 0x00000040009d78b0: 0x0000000000000000 0x00000000004772f0 0x00000040009d78c0: 0x0000004008ec5980 0x00000040009d7918 0x00000040009d78d0: 0x000000000045a32c 0x0000004002cc3540 0x00000040009d78e0: 0x0000000001c75e50 0x00000040033b40c0 0x00000040009d78f0: 0x000000000045a31c 0x0000000000000018 0x00000040009d7900: 0x00000000016a8080 0x00000040005cf601 0x00000040009d7910: 0x00000040027f6300 0x00000040009d79c8 0x00000040009d7920: 0x000000000045acf8 0x000000400710dad0 0x00000040009d7930: 0x0000000001c75e50 0x00000040033b40c0 fatal error: traceback stuck goroutine 0 [idle]: runtime: unexpected return pc for runtime.sigtramp called from 0xffffaac795c0 stack: frame={sp:0x40006cacd0, fp:0x40006cada0} stack=[0x40006c4000,0x40006cc000) 0x00000040006cabd0: 0x00000040006cac28 0x0000000000000000 0x00000040006cabe0: 0x0000000000000000 0x00000040006c2000 0x00000040006cabf0: 0x0000000000000000 0x0000000000000000 0x00000040006cac00: 0x0000000000000000 0x00000040006cada0 0x00000040006cac10: 0x00000040006cae20 0x00000040006cac98 0x00000040006cac20: 0x000000000007d35c 0x000000400000001b 0x00000040006cac30: 0x00000040006c0000 0x00000040006cac58 0x00000040006cac40: 0x0000004007a6bd40 0x0000000000000000 0x00000040006cac50: 0x0000000000000000 0x0000000000000000 0x00000040006cac60: 0x0000000000000000 0x0000000000000000 0x00000040006cac70: 0x0000000000000000 0x0000000000000000 0x00000040006cac80: 0x0000004007a6bd40 0x00000040006cada0 0x00000040006cac90: 0x00000040006cae20 0x00000040006cacc8 0x00000040006caca0: 0x000000000007cc74 0x000000000000001b 0x00000040006cacb0: 0x00000040006cada0 0x00000040006cae20 0x00000040006cacc0: 0x0000000000000000 0x00000040006cbff0 0x00000040006cacd0: <0x0000ffffaac795c0 0x000000000000001b 0x00000040006cace0: 0x00000040006cada0 0x00000040006cae20 0x00000040006cacf0: 0x000000000004a4fd 0x070a091214d6eaae 0x00000040006cad00: 0x226e776f6e6b6e75 0x3464623239303924 0x00000040006cad10: 0x342d356131632d61 0x000000000000000d 0x00000040006cad20: 0x00000000000001dd 0x0000000000000000 0x00000040006cad30: 0x0000000001c5b300 0x0000004007a6bd40 0x00000040006cad40: 0x00000040006cacc8 0x3e66376972bea4d0 0x00000040006cad50: 0xbebbbd41c5d26bf1 0x3f11566aaf25de2c 0x00000040006cad60: 0xbf66c16c16bebd93 0x3fc5555555555555 0x00000040006cad70: 0x3fc7642f0ac54c41 0x3ffd137a1ea75678 0x00000040006cad80: 0xbfca94327912c722 0x0000000000000000 0x00000040006cad90: 0x0000000000000000 0x0000000000000000 0x00000040006cada0: >0x000000000000001b 0x00000000fffffffe 0x00000040006cadb0: 0x0000000000000136 0x0000000000000000 0x00000040006cadc0: 0x0000000000000000 0x0000000000000000 0x00000040006cadd0: 0x0000000000000000 0x0000000000000000 0x00000040006cade0: 0x0000000000000000 0x0000000000000000 0x00000040006cadf0: 0x0000000000000000 0x0000000000000000 0x00000040006cae00: 0x0000000000000000 0x0000000000000000 0x00000040006cae10: 0x0000000000000000 0x0000000000000000 0x00000040006cae20: 0x0000000000000000 0x0000000000000000 0x00000040006cae30: 0x00000040006c4000 0x0000000000000000 0x00000040006cae40: 0x0000000000008000 0x0000000000000000 0x00000040006cae50: 0x0000000000000000 0x0000000000000000 0x00000040006cae60: 0x0000000000000000 0x0000000000000000 0x00000040006cae70: 0x0000000000000000 0x0000000000000000 0x00000040006cae80: 0x0000000000000000 0x0000000000000000 0x00000040006cae90: 0x0000000000000000 0x0000000000000000 runtime.throw({0x1894577?, 0x2a981a0?}) /usr/local/go/src/runtime/panic.go:992 +0x50 runtime.gentraceback(0x2b3d3a0?, 0x0?, 0x40006ca968?, 0x4007a6bd40, 0x0, 0x40006ca968, 0x40, 0x0, 0x0?, 0x6) /usr/local/go/src/runtime/traceback.go:488 +0xd98 runtime.sigprof(0x79410, 0x40006cac08?, 0x0?, 0x4007a6bd40, 0x40006c0000) /usr/local/go/src/runtime/proc.go:4507 +0xf0 runtime.sighandler(0x1b?, 0x40006c0000?, 0x40006cac58?, 0x4007a6bd40?) /usr/local/go/src/runtime/signal_unix.go:613 +0x514 runtime.sigtrampgo(0x1b, 0x40006cada0, 0x40006cae20) /usr/local/go/src/runtime/signal_unix.go:477 +0x164 runtime.sigtrampgo(0x1b, 0x40006cada0, 0x40006cae20) :1 +0x1c runtime: unexpected return pc for runtime.sigtramp called from 0xffffaac795c0 stack: frame={sp:0x40006cacd0, fp:0x40006cada0} stack=[0x40006c4000,0x40006cc000) 0x00000040006cabd0: 0x00000040006cac28 0x0000000000000000 0x00000040006cabe0: 0x0000000000000000 0x00000040006c2000 0x00000040006cabf0: 0x0000000000000000 0x0000000000000000 0x00000040006cac00: 0x0000000000000000 0x00000040006cada0 0x00000040006cac10: 0x00000040006cae20 0x00000040006cac98 0x00000040006cac20: 0x000000000007d35c 0x000000400000001b 0x00000040006cac30: 0x00000040006c0000 0x00000040006cac58 0x00000040006cac40: 0x0000004007a6bd40 0x0000000000000000 0x00000040006cac50: 0x0000000000000000 0x0000000000000000 0x00000040006cac60: 0x0000000000000000 0x0000000000000000 0x00000040006cac70: 0x0000000000000000 0x0000000000000000 0x00000040006cac80: 0x0000004007a6bd40 0x00000040006cada0 0x00000040006cac90: 0x00000040006cae20 0x00000040006cacc8 0x00000040006caca0: 0x000000000007cc74 0x000000000000001b 0x00000040006cacb0: 0x00000040006cada0 0x00000040006cae20 0x00000040006cacc0: 0x0000000000000000 0x00000040006cbff0 0x00000040006cacd0: <0x0000ffffaac795c0 0x000000000000001b 0x00000040006cace0: 0x00000040006cada0 0x00000040006cae20 0x00000040006cacf0: 0x000000000004a4fd 0x070a091214d6eaae 0x00000040006cad00: 0x226e776f6e6b6e75 0x3464623239303924 0x00000040006cad10: 0x342d356131632d61 0x000000000000000d 0x00000040006cad20: 0x00000000000001dd 0x0000000000000000 0x00000040006cad30: 0x0000000001c5b300 0x0000004007a6bd40 0x00000040006cad40: 0x00000040006cacc8 0x3e66376972bea4d0 0x00000040006cad50: 0xbebbbd41c5d26bf1 0x3f11566aaf25de2c 0x00000040006cad60: 0xbf66c16c16bebd93 0x3fc5555555555555 0x00000040006cad70: 0x3fc7642f0ac54c41 0x3ffd137a1ea75678 0x00000040006cad80: 0xbfca94327912c722 0x0000000000000000 0x00000040006cad90: 0x0000000000000000 0x0000000000000000 0x00000040006cada0: >0x000000000000001b 0x00000000fffffffe 0x00000040006cadb0: 0x0000000000000136 0x0000000000000000 0x00000040006cadc0: 0x0000000000000000 0x0000000000000000 0x00000040006cadd0: 0x0000000000000000 0x0000000000000000 0x00000040006cade0: 0x0000000000000000 0x0000000000000000 0x00000040006cadf0: 0x0000000000000000 0x0000000000000000 0x00000040006cae00: 0x0000000000000000 0x0000000000000000 0x00000040006cae10: 0x0000000000000000 0x0000000000000000 0x00000040006cae20: 0x0000000000000000 0x0000000000000000 0x00000040006cae30: 0x00000040006c4000 0x0000000000000000 0x00000040006cae40: 0x0000000000008000 0x0000000000000000 0x00000040006cae50: 0x0000000000000000 0x0000000000000000 0x00000040006cae60: 0x0000000000000000 0x0000000000000000 0x00000040006cae70: 0x0000000000000000 0x0000000000000000 0x00000040006cae80: 0x0000000000000000 0x0000000000000000 0x00000040006cae90: 0x0000000000000000 0x0000000000000000 runtime.sigtramp() /usr/local/go/src/runtime/sys_linux_arm64.s:489 +0x84 goroutine 2446 [running]: ```
prattmic commented 2 years ago

There's some slew since 1.18.5:

Oops, apologies, I made this CL based on tip rather than 1.18. Did you get this fixed? If not I can provide another on 1.18.

Fresh crash for you with the additional telemetry.

Did this get built with -tags debuglog? The is missing, though perhaps just not included in your paste. The log will be output after all of the crash stack traces just before exit.

lizthegrey commented 2 years ago

Did this get built with -tags debuglog? The is missing, though perhaps just not included in your paste. The log will be output after all of the crash stack traces just before exit.

I thought it did, but apparently it's missing them. Does go install accept tags, or only go build?

                    go install -tags debuglog -ldflags "${LD_FLAGS} \
                      -X github.com/honeycombio/hound/env.buildID=$BUILD_ID \
                      -X github.com/honeycombio/hound/env.commitHash=$CIRCLE_SHA1" \
                      ./...
prattmic commented 2 years ago

Yes, go install accepts tags.

I also see that the stack dump doesn't include the entire stack either (https://go.dev/cl/424195). I'd guess that your build isn't picking up the patches to the runtime at all (e.g., wrong GOROOT), except that it clearly did when you got a build error...

You could double check the symbols in the binary. If debuglog is enabled, it should have various dlogger symbols like runtime.(*dlogger).end, if not those are missing.

lizthegrey commented 2 years ago

oh, whoops, good catch.

    Roles pinned:
        beagle: 603443
        barbet: 603443

that binary is pinned to an older build. the build instructions were correct, prod just didn't have the correct one. egg on my face, sorry, I'll re-try and let you know when we have a crash with the correct build.

prattmic commented 2 years ago

Thanks. By the way, do you happen to be using musl libc (/Alpine Linux)? (Is this even a cgo binary?)

I'm investigating #54306 in parallel, presumably an issue with musl. It is probably unrelated, but it would help to know if you are using musl.

lizthegrey commented 2 years ago

Thanks. By the way, do you happen to be using musl libc (/Alpine Linux)? (Is this even a cgo binary?)

I'm investigating #54306 in parallel, presumably an issue with musl. It is probably unrelated, but it would help to know if you are using musl.

No, we run on Ubuntu 22.04 containers with standard libc 2.35-0ubuntu3.1. No use of cgo

lizthegrey commented 2 years ago

https://gist.github.com/lizthegrey/043da22367603cd24acf82c13742d830

prattmic commented 2 years ago

Thanks, I'll take a look! Does this one have debuglog output at the end? (Lots of lines mostly containing gentrace: ...)

lizthegrey commented 2 years ago

https://gist.github.com/lizthegrey/f5827b5577abf9b0e3c883eeaf748361

lizthegrey commented 2 years ago

Oh, no, there is nothing with "gentrace" at the end.

prattmic commented 2 years ago

Hrm, that is quite frustrating.

What I can tell from the stack dump is that the SP/FP we've landed at is certainly not a valid frame pointer. It is hard to tell, but the stack doesn't look obviously corrupt to me. Perhaps the stack is OK, but we computed the wrong next frame pointer.

I'll see if I can put more information in another patch, even if debuglog fails.

prattmic commented 2 years ago

New patches, now based on the 1.18 release branch, so they should apply cleanly:

gopherbot commented 2 years ago

Change https://go.dev/cl/424077 mentions this issue: DO NOT SUBMIT: runtime: add dlog to gentraceback

gopherbot commented 2 years ago

Change https://go.dev/cl/424076 mentions this issue: DO NOT SUBMIT: runtime: dump entire stack

lizthegrey commented 2 years ago

Go ahead and base on 1.19 going forward, I just upgraded our prod environments...

This patchset worked fine though with offset:

patching file src/runtime/traceback.go
patching file src/runtime/debuglog_off.go
patching file src/runtime/debuglog_on.go
patching file src/runtime/proc.go
Hunk #1 succeeded at 4511 (offset 60 lines).
patching file src/runtime/traceback.go
lizthegrey commented 2 years ago

Still not seeing debuglog in output. Are we terminating before debuglog can be printed?

https://gist.github.com/lizthegrey/58174372484922b47c69c147a4c9f1f8

prattmic commented 2 years ago

It's possible, though I'm surprised there is no further indication of crash if true. Do you see "stack trace unavailable" at the end? Curiously, I've been using the same patch to investigate #54306 and it is working fine.

On another note, I've noticed that most of the stacks so far seem to be in code around github.com/Shopify/sarama.(*RecordBatch).decode (and various other sarama decode functions) which calls into github.com/klauspost/compress/zstd functions. Do you know (from profiles or intuition) if this code path is very hot in this application (so crashing here is likely just coincidence), or fairly rare (so perhaps not a coincidence)? There was one crash above in github.com/honeycombio/libhoney-go, so I lean towards coincidence.

cherrymui commented 2 years ago

@lizthegrey @prattmic thanks for the information! The most recent one doesn't have morestack or systemstack, so it may be not limited to that.

I'm still puzzled by the unaligned-PC-like values, e.g. 0x11b2b, which is <runtime/internal/atomic.(*Uint8).And+0x000000000000003b> (it is not only unaligned but also past the end of the function, so the execution should never get there).

It may be helpful to print the PCs it has unwound through, i.e. the content of pcbuf, so we can see how it gets there. (SPs would also be good, but they are not stored anywhere so it's hard to print. PCs plus the content of the stack is probably enough.)

@lizthegrey do you have any run without profiling turned on, and if so, is there any crash? It looks to me that the content of the stack is weird so even a non-profiling run might crash. Thanks.

prattmic commented 2 years ago

I'm still puzzled by the unaligned-PC-like values, e.g. 0x11b2b

I was confused by those as well. There are some really odd ones like runtime.debugCallWrap.func1. My guess was that since the values are fairly small, they are probably just random arguments to these functions and not PCs at all.

lizthegrey commented 2 years ago

@lizthegrey do you have any run without profiling turned on, and if so, is there any crash? It looks to me that the content of the stack is weird so even a non-profiling run might crash. Thanks.

No profiling = no crash. We've had copies of some of these binaries running without profiling for months, that have not crashed, when we enabled profiling they immediately started crashing about once per day out of 140 processes.

lizthegrey commented 2 years ago

It's possible, though I'm surprised there is no further indication of crash if true. Do you see "stack trace unavailable" at the end? Curiously, I've been using the same patch to investigate #54306 and it is working fine.

Oddly, no. What's also odd is that the kubectl log output stops after a small number of goroutines, rather than printing all the way through. I'll see if our fluentbit exporter got the entire thing and report back.

On another note, I've noticed that most of the stacks so far seem to be in code around github.com/Shopify/sarama.(*RecordBatch).decode (and various other sarama decode functions) which calls into github.com/klauspost/compress/zstd functions. Do you know (from profiles or intuition) if this code path is very hot in this application (so crashing here is likely just coincidence), or fairly rare (so perhaps not a coincidence)? There was one crash above in github.com/honeycombio/libhoney-go, so I lean towards coincidence.

It's a coincidence, sarama/zstd is about 30-40% of the workload of this app, and libhoney-go is 10%.

prattmic commented 2 years ago

What's also odd is that the kubectl log output stops after a small number of goroutines

Huh. I'd noticed that your logs stop at the second goroutine header and assumed you were just manually truncating. Is that actually where the output stops?

I've updated https://go.dev/cl/424196 to:

Could you give it a try?

lizthegrey commented 2 years ago

Huh. I'd noticed that your logs stop at the second goroutine header and assumed you were just manually truncating. Is that actually where the output stops?

It is indeed. on a recent crash, even pulling the logs from fluentbit, they're truncated directly after the second goroutine:

    /usr/local/go/src/runtime/sys_linux_arm64.s:471 +0x58 fp=0x400081cda0 sp=0x400081cce0 pc=0x80258

goroutine 469267 [running]:

and then it stops.

I've updated go.dev/cl/424196 to:

  • Print debuglog before the panic tracebacks. Maybe this will get it out?
  • Remember and dump the first 32 frames we walk (which is primarily what I want out of debuglog).
  • Dump the contents of pcbuf.

Could you give it a try?

Perfect, will give it a try.

lizthegrey commented 2 years ago

Finally got a good dump including debug logs for you. https://gist.github.com/lizthegrey/0ed6b93167f618fa0d09becf975888c4

prattmic commented 2 years ago

Awesome, thanks!

Here's what I see:

Caveat: The debuglog is still a bit corrupted, I guess because I printed early while other threads were still writing to it. That's OK though, the frame history has most of what I want.

This trace goes through libhoney.(*fieldHolder).AddField. See the source here.

previous frame 0 {pc:0x7cdc0, lr:0x24cb8, sp:0x406e1b24c0, fp:0x406e1b24c0}
previous frame 1 {pc:0x24cb8, lr:0x241a4, sp:0x406e1b24c0, fp:0x406e1b24c0}

I'm not certain what these functions are, as they don't appear in the stack dump.

Based on my own build of 1.19, mapassign_faststr+0x114 is the return site from runtime.growWork_faststr. The offset from there is 0x24cb8 yields the return site from runtime.morestack_noctxt.abi0 in runtime.evacuate_faststr. So 0x7cdc0 is probably in morestack?

previous frame 2 {pc:0x241a4, lr:0x241a4, sp:0x406e1b24c0, fp:0x406e1b2530}
previous frame 3 {pc:0x241a4, lr:0x59244c, sp:0x406e1b2530, fp:0x406e1b25a0}

0x241a4 is <runtime.mapassign_faststr+0x0000000000000114>. At the moment, I am not sure why there is an extra frame with pc == lr. It might be because 0x241a4 is actually an instruction from runtime.tophash inlined into runtime.mapassign_faststr.

previous frame 4 {pc:0x59244c, lr:0x592520, sp:0x406e1b25a0, fp:0x406e1b2600}

0x59244c is <github.com/honeycombio/libhoney-go.(*fieldHolder).AddField+0x000000000000004c>. Based on my build of libhoney, AddField+0x4c is the return site from sync.(*RWMutex).Lock, not runtime.mapassign_faststr (there is also a call to that later in the function at +0x90). This seems wrong...

previous frame 5 {pc:0x592520, lr:0x5932b0, sp:0x406e1b2600, fp:0x406e1b2600}

0x0592520 is <github.com/honeycombio/libhoney-go.(*fieldHolder).AddField.func1+0x0000000000000000>. AddField.func1 is the wrapper for defer f.lock.Unlock(). This doesn't make sense, because the wrapper does not call the main function, plus AddField.func1+0x0 is the beginning of the function and definitely not a CALL.

Presumably we get stuck here because the SP delta is 0, as SP hasn't been adjusted at function entry.

(I will note that the instruction just before this is JMP github.com/honeycombio/libhoney-go.(*fieldHolder).AddField(SB) at the end of AddField, for restarting after morestack. Still not a CALL, but a typical pc-4 would put us in a different function).

I see lots of oddities here, though I'm not sure yet exactly where the problem is.

@lizthegrey could you help double-check if my compilation of libhoney matches yours by posting the disassembly of github.com/honeycombio/libhoney-go.(*fieldHolder).AddField and github.com/honeycombio/libhoney-go.(*fieldHolder).AddField.func1 from the actual binary that crashed? (go tool objdump <binary>, look for TEXT github.com/honeycombio/libhoney-go.(*fieldHolder).AddField).

Also please send over additional crashes if you get them.

cherrymui commented 2 years ago

Thanks!

previous frame 0 {pc:0x7cdc0, lr:0x24cb8, sp:0x406e1b24c0, fp:0x406e1b24c0}
previous frame 1 {pc:0x24cb8, lr:0x241a4, sp:0x406e1b24c0, fp:0x406e1b24c0}
previous frame 2 {pc:0x241a4, lr:0x241a4, sp:0x406e1b24c0, fp:0x406e1b2530}

This is weird. It shouldn't be possible to have three frames to have the same SP. And frame 2's PC = LR, which also makes no sense. Do you know what functions are for PC 0x7cdc0 and 0x24cb8? Thanks.

It is also interesting that debuglog doesn't seem to include these frames...

lizthegrey commented 2 years ago

Additional crashes all appended to the same secret gist (see traceback{2,3,4,...})

code for Sarama and Libhoney should be available in OSS -- see https://github.com/honeycombio/libhoney-go and https://github.com/shopify/sarama

I'll post the disassembly as seen in our compiled code.

lizthegrey commented 2 years ago

in the binary (shepherd) that generated tracebacks 1-3:

TEXT github.com/honeycombio/libhoney-go.(*Client).AddField(SB) /mnt/ramdisk/go/pkg/mod/github.com/honeycombio/libhoney-go@v1.15.8/client.go
  client.go:193         0x5913e0                f9400b90                MOVD 16(R28), R16                                                               
  client.go:193         0x5913e4                eb3063ff                CMP R16, RSP                                                                    
  client.go:193         0x5913e8                54000309                BLS 24(PC)                                                                      
  client.go:193         0x5913ec                f81c0ffe                MOVD.W R30, -64(RSP)                                                            
  client.go:193         0x5913f0                f81f83fd                MOVD R29, -8(RSP)                                                               
  client.go:193         0x5913f4                d10023fd                SUB $8, RSP, R29                                                                
  client.go:193         0x5913f8                f9002be1                MOVD R1, 80(RSP)                                                                
  client.go:193         0x5913fc                f90027e0                MOVD R0, 72(RSP)                                                                
  client.go:193         0x591400                f9002fe2                MOVD R2, 88(RSP)                                                                
  client.go:193         0x591404                f90033e3                MOVD R3, 96(RSP)                                                                
  client.go:193         0x591408                f90037e4                MOVD R4, 104(RSP)                                                               
  client.go:194         0x59140c                97fffe7d                CALL github.com/honeycombio/libhoney-go.(*Client).ensureTransmission(SB)        
  client.go:195         0x591410                f94027e0                MOVD 72(RSP), R0                                                                
  client.go:195         0x591414                97fffefb                CALL github.com/honeycombio/libhoney-go.(*Client).ensureBuilder(SB)             
  client.go:196         0x591418                f94027e1                MOVD 72(RSP), R1                                                                
  client.go:196         0x59141c                f9401021                MOVD 32(R1), R1                                                                 
  client.go:196         0x591420                3980003b                MOVB (R1), R27                                                                  
  client.go:196         0x591424                9100e020                ADD $56, R1, R0                                                                 
  client.go:196         0x591428                f9402be1                MOVD 80(RSP), R1                                                                
  client.go:196         0x59142c                f9402fe2                MOVD 88(RSP), R2                                                                
  client.go:196         0x591430                f94033e3                MOVD 96(RSP), R3                                                                
  client.go:196         0x591434                f94037e4                MOVD 104(RSP), R4                                                               
  client.go:196         0x591438                940003f2                CALL github.com/honeycombio/libhoney-go.(*fieldHolder).AddField(SB)             
  client.go:197         0x59143c                a97ffbfd                LDP -8(RSP), (R29, R30)                                                         
  client.go:197         0x591440                910103ff                ADD $64, RSP, RSP                                                               
  client.go:197         0x591444                d65f03c0                RET                                                                             
  client.go:193         0x591448                f90007e0                MOVD R0, 8(RSP)                                                                 
  client.go:193         0x59144c                f9000be1                MOVD R1, 16(RSP)                                                                
  client.go:193         0x591450                f9000fe2                MOVD R2, 24(RSP)                                                                
  client.go:193         0x591454                f90013e3                MOVD R3, 32(RSP)                                                                
  client.go:193         0x591458                f90017e4                MOVD R4, 40(RSP)                                                                
  client.go:193         0x59145c                aa1e03e3                MOVD R30, R3                                                                    
  client.go:193         0x591460                97ebae58                CALL runtime.morestack_noctxt.abi0(SB)                                          
  client.go:193         0x591464                f94007e0                MOVD 8(RSP), R0                                                                 
  client.go:193         0x591468                f9400be1                MOVD 16(RSP), R1                                                                
  client.go:193         0x59146c                f9400fe2                MOVD 24(RSP), R2                                                                
  client.go:193         0x591470                f94013e3                MOVD 32(RSP), R3                                                                
  client.go:193         0x591474                f94017e4                MOVD 40(RSP), R4                                                                
  client.go:193         0x591478                17ffffda                JMP github.com/honeycombio/libhoney-go.(*Client).AddField(SB)                   
  client.go:193         0x59147c                00000000                ?                                                                               
TEXT github.com/honeycombio/libhoney-go.(*fieldHolder).AddField.func1(SB) /mnt/ramdisk/go/pkg/mod/github.com/honeycombio/libhoney-go@v1.15.8/libhoney.go
  libhoney.go:567       0x592520                f9400b90                MOVD 16(R28), R16                                                               
  libhoney.go:567       0x592524                eb3063ff                CMP R16, RSP                                                                    
  libhoney.go:567       0x592528                54000169                BLS 11(PC)                                                                      
  libhoney.go:567       0x59252c                f81e0ffe                MOVD.W R30, -32(RSP)                                                            
  libhoney.go:567       0x592530                f81f83fd                MOVD R29, -8(RSP)                                                               
  libhoney.go:567       0x592534                d10023fd                SUB $8, RSP, R29                                                                
  libhoney.go:567       0x592538                f9401390                MOVD 32(R28), R16                                                               
  libhoney.go:567       0x59253c                b5000130                CBNZ R16, 9(PC)                                                                 
  libhoney.go:567       0x592540                f9400740                MOVD 8(R26), R0                                                                 
  libhoney.go:567       0x592544                97ebdf5b                CALL sync.(*RWMutex).Unlock(SB)                                                 
  libhoney.go:567       0x592548                a97ffbfd                LDP -8(RSP), (R29, R30)                                                         
  libhoney.go:567       0x59254c                910083ff                ADD $32, RSP, RSP                                                               
  libhoney.go:567       0x592550                d65f03c0                RET                                                                             
  libhoney.go:567       0x592554                aa1e03e3                MOVD R30, R3                                                                    
  libhoney.go:567       0x592558                97eba9fa                CALL runtime.morestack.abi0(SB)                                                 
  libhoney.go:567       0x59255c                17fffff1                JMP github.com/honeycombio/libhoney-go.(*fieldHolder).AddField.func1(SB)        
  libhoney.go:567       0x592560                f9400211                MOVD (R16), R17                                                                 
  libhoney.go:567       0x592564                9100a3f4                ADD $40, RSP, R20                                                               
  libhoney.go:567       0x592568                eb11029f                CMP R17, R20                                                                    
  libhoney.go:567       0x59256c                54fffea1                BNE -11(PC)                                                                     
  libhoney.go:567       0x592570                910023f4                ADD $8, RSP, R20                                                                
  libhoney.go:567       0x592574                f9000214                MOVD R20, (R16)                                                                 
  libhoney.go:567       0x592578                17fffff2                JMP -14(PC)                                                                     
  libhoney.go:567       0x59257c                00000000                ?                                                                               

traceback 4 is from a different binary, let me know what disasm you need from that.

lizthegrey commented 2 years ago

0x7cdc0

TEXT github.com/gogo/protobuf/proto.makeTimePtrSliceMarshaler.func2(SB) /mnt/ramdisk/go/pkg/mod/github.com/gogo/protobuf@v1.3.2/proto/table_marshal_gogo.go
[...]
value.go:1616                 0x7cdbf8                f1005cff                CMP $23, R7                                                             
value.go:1616                 0x7cdbfc                54000061                BNE 3(PC)                                                               
value.go:1617                 0x7cdc00                f9400427                MOVD 8(R1), R7                                                          
table_marshal_gogo.go:257     0x7cdc04                1400000a                JMP 10(PC)                                                              
value.go:1619                 0x7cdc08                97e41bae                CALL reflect.Value.lenNonSlice(SB)                                      
table_marshal_gogo.go:258     0x7cdc0c                f9404be1                MOVD 144(RSP), R1        

0x24cb8

TEXT hash/crc32.MakeTable(SB) /usr/local/go/src/hash/crc32/crc32.go
[...]
crc32_generic.go:29   0x24cb80                14000005                JMP 5(PC)                               
crc32.go:137          0x24cb84                a97ffbfd                LDP -8(RSP), (R29, R30)                 
crc32.go:137          0x24cb88                9100c3ff                ADD $48, RSP, RSP                       
crc32.go:137          0x24cb8c                d65f03c0                RET                   
lizthegrey commented 2 years ago

libhoney-go

I've identified why your build output for libhoney-go may be different from mine. this repo is lagging the mainline a bit.

go 1.18

require (
[...]
        github.com/honeycombio/libhoney-go v1.15.8
        github.com/Shopify/sarama v1.31.0
prattmic commented 2 years ago
TEXT github.com/honeycombio/libhoney-go.(*Client).AddField(SB) 

This is Client.AddField. Could you post fieldHolder.AddField?

Also for 0x7cdc0, it looks like you posted 0x7cdc00 instead (off by an order of magnitude). Same with 0x24cb8.

cherrymui commented 2 years ago

So 0x7cdc0 is probably in morestack

If this is in morestack, I think that makes sense. morestack itself doesn't have a frame, so that's "frame 0" above. Its caller (evacuate_faststr) hasn't created a frame yet, so it also doesn't have a frame, which is "frame 1" above.

Frame 1 should have its return address in growWork_faststr, but somehow that is missing, and directly skip to mapassign_faststr?

My guess is that a profiling signal lands in morestack but before the stack switch. The jumpstack logic doesn't kick in as it hasn't switched. At this point PC=morestack, LR=caller of morestack, i.e. evacuate_faststr. And it unwinds a frame normally.

At the second frame, nothing sets the LR specially, so at this point https://cs.opensource.google/go/go/+/master:src/runtime/traceback.go;l=228 it just loads the LR from the stack. But evacuate_faststr hasn't actually saved LR on stack, so it loads the LR of its caller, which is the LR of growWork_faststr, which ismapassgn_faststr`. I think that explains the missing frame. I think the logic of the code assumes only the last frame can be frameless (the n==0 check), all following frames have LR saved on stack, so it assumes it is safe to load. But the morestack-but-havevn't-switched case breaks that logic...

cherrymui commented 2 years ago

morestack writes SP, so it should have SPWRITE bit set. If the jump stack logic doesn't kick in, it should stop here https://cs.opensource.google/go/go/+/master:src/runtime/traceback.go;l=201 . Unfortunate for profiling, but doesn't crash.

However, morestack_noctxt doesn't write SP, so if a signal lands there, it tries to unwind normally... Maybe a stop-gap fix is to mark morestack_noctxt SPWRITE...