What operating system and processor architecture are you using (go env)?
Reported on linux/amd64 systems. Reported on both Go 1.20.x and 1.21.x builds.
We have limited access to customer systems.
What did you do?
At MinIO, we have been experiencing runtime crashes since the release of Go 1.20.
The issues only appear to occur for a very small number of our customers, and supplying them with a Go 1.19.x compiled binary always solves the issue.
The issues appear slightly different each time, but all indicate some sort of corruption. Since none of them had a "smoking gun", I held off on submitting an issue.
Here are some samples of customer crashes:
```
May 8 04:11:36 framin107 minio[1082653]: fatal error: unexpected signal during runtime execution
May 8 04:11:36 framin107 minio[1082653]: panic during panic
May 8 04:11:36 framin107 minio[1082653]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x0]
May 8 04:11:36 framin107 minio[1082653]: runtime stack:
May 8 04:11:36 framin107 minio[1082653]: runtime.throw({0x2b40d78?, 0x0?})
May 8 04:11:36 framin107 minio[1082653]: #011runtime/panic.go:1047 +0x5d fp=0xc0102f1ed8 sp=0xc0102f1ea8 pc=0x43933d
May 8 04:11:36 framin107 minio[1082653]: runtime: g 0: unexpected return pc for runtime.sigpanic called from 0x0
```
Same hardware:
```
May 10 10:17:18 framin107 minio[1097631]: fatal error: bulkBarrierPreWrite: unaligned arguments
May 10 10:17:18 framin107 minio[1097631]: unexpected fault address 0x0
May 10 10:17:18 framin107 minio[1097631]: fatal error: fault
May 10 10:17:18 framin107 minio[1097631]: goroutine 50957332 [running]:
May 10 10:17:18 framin107 minio[1097631]: runtime: g 50957332: unexpected return pc for runtime.throw called from 0x0
May 10 10:17:18 framin107 minio[1097631]: stack: frame={sp:0xc01034b1b8, fp:0xc01034b1e8} stack=[0xc01034a000,0xc01034c000)
```
```
May 10 03:53:32 framin107 minio[1090219]: fatal error: index out of range
May 10 03:53:32 framin107 minio[1090219]: fatal error: index out of range
May 10 03:53:32 framin107 minio[1090219]: fatal error: index out of range
May 10 03:53:32 framin107 minio[1090219]: fatal: bad g in signal handler
May 10 03:53:33 framin107 systemd[1]: minio.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
```
Second customer:
```
Nov 27 02:36:00 minio minio[240609]: fatal error: unexpected signal during runtime execution
Nov 27 02:36:00 minio minio[240609]: unexpected fault address 0x0
Nov 27 02:36:00 minio minio[240609]: fatal error: fault
Nov 27 02:36:00 minio minio[240609]: fatal error: unexpected signal during runtime execution
Nov 27 02:36:00 minio minio[240609]: fatal error: bulkBarrierPreWrite: unaligned arguments
Nov 27 02:36:00 minio minio[240609]: runtime: pointer 0xc034ec65b0 to unallocated span span.base()=0xc034ec6000 span.limit=0xc034ec7e40 span.state=0
Nov 27 02:36:00 minio minio[240609]: runtime: found in object at *(0xc03939c580+0x28)
Nov 27 02:36:00 minio minio[240609]: object=0xc03939c580 s.base()=0xc03939c000 s.limit=0xc03939de40 s.spanclass=58 s.elemsize=704 s.state=mSpanInUse
Nov 27 02:36:00 minio minio[240609]: *(object+0) = 0x4baf5a0
Nov 27 02:36:00 minio minio[240609]: *(object+8) = 0xc01ec44a80
Nov 27 02:36:00 minio minio[240609]: *(object+16) = 0x4baf5a0
Nov 27 02:36:00 minio minio[240609]: *(object+24) = 0xc027728000
Nov 27 02:36:00 minio minio[240609]: *(object+32) = 0x732280
Nov 27 02:36:00 minio minio[240609]: *(object+40) = 0xc034ec65b0 <==
Nov 27 02:36:00 minio minio[240609]: *(object+48) = 0xc03e43b7a0
Nov 27 02:36:00 minio minio[240609]: *(object+56) = 0xc022b526c0
Nov 27 02:36:00 minio minio[240609]: *(object+64) = 0x4b9e4f8
Nov 27 02:36:00 minio minio[240609]: *(object+72) = 0xc02f21a2c0
Nov 27 02:36:00 minio minio[240609]: *(object+80) = 0x0
Nov 27 02:36:00 minio minio[240609]: *(object+88) = 0x0
Nov 27 02:36:00 minio minio[240609]: *(object+96) = 0x23eafe0
Nov 27 02:36:00 minio minio[240609]: *(object+104) = 0xc035cb8c14
Nov 27 02:36:00 minio minio[240609]: *(object+112) = 0xc00147c000
Nov 27 02:36:00 minio minio[240609]: *(object+120) = 0xc02f21a340
Nov 27 02:36:00 minio minio[240609]: *(object+128) = 0x4b9df08
Nov 27 02:36:00 minio minio[240609]: *(object+136) = 0x62aa480
Nov 27 02:36:00 minio minio[240609]: *(object+144) = 0x1
Nov 27 02:36:00 minio minio[240609]: *(object+152) = 0x0
Nov 27 02:36:00 minio minio[240609]: *(object+160) = 0xc035c885b8
Nov 27 02:36:00 minio minio[240609]: *(object+168) = 0x4
Nov 27 02:36:00 minio minio[240609]: *(object+176) = 0xc035c885bd
Nov 27 02:36:00 minio minio[240609]: *(object+184) = 0xd
Nov 27 02:36:00 minio minio[240609]: *(object+192) = 0x4baf5a0
Nov 27 02:36:00 minio minio[240609]: *(object+200) = 0xc01ec44a80
Nov 27 02:36:00 minio minio[240609]: *(object+208) = 0x4baf5a0
Nov 27 02:36:00 minio minio[240609]: *(object+216) = 0xc0064520e0
Nov 27 02:36:00 minio minio[240609]: *(object+224) = 0xc030055712
Nov 27 02:36:00 minio minio[240609]: *(object+232) = 0x16
Nov 27 02:36:00 minio minio[240609]: *(object+240) = 0xc035c885b8
Nov 27 02:36:00 minio minio[240609]: *(object+248) = 0x12
Nov 27 02:36:00 minio minio[240609]: *(object+256) = 0x23eafe0
Nov 27 02:36:00 minio minio[240609]: *(object+264) = 0xc035cb8e40
Nov 27 02:36:00 minio minio[240609]: *(object+272) = 0xc00137bb80
Nov 27 02:36:00 minio minio[240609]: *(object+280) = 0xc02f21a480
Nov 27 02:36:00 minio minio[240609]: *(object+288) = 0xc035c88618
Nov 27 02:36:00 minio minio[240609]: *(object+296) = 0x4
Nov 27 02:36:00 minio minio[240609]: *(object+304) = 0xc035c8861d
Nov 27 02:36:00 minio minio[240609]: *(object+312) = 0xd
Nov 27 02:36:00 minio minio[240609]: *(object+320) = 0x4baf5a0
Nov 27 02:36:00 minio minio[240609]: *(object+328) = 0xc01ec44a80
Nov 27 02:36:00 minio minio[240609]: *(object+336) = 0x4baf5a0
Nov 27 02:36:00 minio minio[240609]: *(object+344) = 0xc0064520e0
Nov 27 02:36:00 minio minio[240609]: *(object+352) = 0xc0300557d3
Nov 27 02:36:00 minio minio[240609]: *(object+360) = 0x15
Nov 27 02:36:00 minio minio[240609]: *(object+368) = 0xc035c88618
Nov 27 02:36:00 minio minio[240609]: *(object+376) = 0x12
Nov 27 02:36:00 minio minio[240609]: *(object+384) = 0x23eafe0
Nov 27 02:36:00 minio minio[240609]: *(object+392) = 0xc035cb8f30
Nov 27 02:36:00 minio minio[240609]: *(object+400) = 0xc001e26780
Nov 27 02:36:00 minio minio[240609]: *(object+408) = 0xc02f21a580
Nov 27 02:36:00 minio minio[240609]: *(object+416) = 0x0
Nov 27 02:36:00 minio minio[240609]: *(object+424) = 0x0
Nov 27 02:36:00 minio minio[240609]: *(object+432) = 0xc035ce9320
Nov 27 02:36:00 minio minio[240609]: *(object+440) = 0xc035ce9330
Nov 27 02:36:00 minio minio[240609]: *(object+448) = 0x0
Nov 27 02:36:00 minio minio[240609]: *(object+456) = 0x0
Nov 27 02:36:00 minio minio[240609]: *(object+464) = 0xc035ce93f0
Nov 27 02:36:00 minio minio[240609]: *(object+472) = 0xc035ce9400
Nov 27 02:36:00 minio minio[240609]: *(object+480) = 0x0
Nov 27 02:36:00 minio minio[240609]: *(object+488) = 0x0
Nov 27 02:36:00 minio minio[240609]: *(object+496) = 0xc035ce94c0
Nov 27 02:36:00 minio minio[240609]: *(object+504) = 0xc035ce94d0
Nov 27 02:36:00 minio minio[240609]: *(object+512) = 0x4b9df08
Nov 27 02:36:00 minio minio[240609]: *(object+520) = 0x62aa480
Nov 27 02:36:00 minio minio[240609]: *(object+528) = 0x0
Nov 27 02:36:00 minio minio[240609]: *(object+536) = 0x0
Nov 27 02:36:00 minio minio[240609]: *(object+544) = 0xc035c887f8
Nov 27 02:36:00 minio minio[240609]: *(object+552) = 0x4
Nov 27 02:36:00 minio minio[240609]: *(object+560) = 0xc035c887fd
Nov 27 02:36:00 minio minio[240609]: *(object+568) = 0xd
Nov 27 02:36:00 minio minio[240609]: *(object+576) = 0xc035cb90f0
Nov 27 02:36:00 minio minio[240609]: *(object+584) = 0xa
Nov 27 02:36:00 minio minio[240609]: *(object+592) = 0xc035cb90fb
Nov 27 02:36:00 minio minio[240609]: *(object+600) = 0x3
Nov 27 02:36:00 minio minio[240609]: *(object+608) = 0x4baf5a0
Nov 27 02:36:00 minio minio[240609]: *(object+616) = 0xc01ec44a80
Nov 27 02:36:00 minio minio[240609]: *(object+624) = 0x4baf5a0
Nov 27 02:36:00 minio minio[240609]: *(object+632) = 0xc027728000
Nov 27 02:36:00 minio minio[240609]: *(object+640) = 0x23ae600
Nov 27 02:36:00 minio minio[240609]: *(object+648) = 0x4ba47f0
Nov 27 02:36:00 minio minio[240609]: *(object+656) = 0xc034ecedc0
Nov 27 02:36:00 minio minio[240609]: *(object+664) = 0xc018072b90
Nov 27 02:36:00 minio minio[240609]: *(object+672) = 0x23eafe0
Nov 27 02:36:00 minio minio[240609]: *(object+680) = 0xc035cb9150
Nov 27 02:36:00 minio minio[240609]: *(object+688) = 0xc001599180
Nov 27 02:36:00 minio minio[240609]: *(object+696) = 0xc02f21a800
Nov 27 02:36:00 minio minio[240609]: fatal error: found bad pointer in Go heap (incorrect use of unsafe or cgo?)
```
Third customer:
```
Aug 10 12:02:06 minio minio[13162]: fatal error: unexpected signal during runtime execution
Aug 10 12:02:06 minio minio[13162]: unexpected fault address 0x0
Aug 10 12:02:06 minio minio[13162]: fatal error: fault
Aug 10 12:02:06 minio minio[13162]: runtime: pointer 0xc0228c4a20 to unused region of span span.base()=0xc0141cc000 span.limit=0xc014
Aug 10 12:02:06 minio minio[13162]: runtime: found in object at *(0xc020089980+0x10)
Aug 10 12:02:06 minio minio[13162]: object=0xc020089980 s.base()=0xc020088000 s.limit=0xc020089fe0 s.spanclass=16 s.elemsize=96 s.sta
Aug 10 12:02:06 minio minio[13162]: *(object+0) = 0x2b3b90b
Aug 10 12:02:06 minio minio[13162]: *(object+8) = 0x4
Aug 10 12:02:06 minio minio[13162]: *(object+16) = 0xc0228c4a20 <==
Aug 10 12:02:06 minio minio[13162]: *(object+24) = 0x8b
Aug 10 12:02:06 minio minio[13162]: *(object+32) = 0x4d3c400
Aug 10 12:02:06 minio minio[13162]: *(object+40) = 0x61953a0
Aug 10 12:02:06 minio minio[13162]: *(object+48) = 0x2309720
Aug 10 12:02:06 minio minio[13162]: *(object+56) = 0xc12d4b44a6b3921a
Aug 10 12:02:06 minio minio[13162]: *(object+64) = 0xafc4c36fd829a
Aug 10 12:02:06 minio minio[13162]: *(object+72) = 0x6301700
Aug 10 12:02:06 minio minio[13162]: *(object+80) = 0x6380620
Aug 10 12:02:06 minio minio[13162]: *(object+88) = 0x5
Aug 10 12:02:06 minio minio[13162]: fatal error: found bad pointer in Go heap (incorrect use of unsafe or cgo?)
```
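To make the two "found bad pointer in Go heap" dumps easier to compare, the sketch below decodes the fields the runtime printed for the object that held the bad pointer. It assumes the runtime's internal span class encoding (size class shifted left by one, low bit set for noscan spans, as in `runtime/mheap.go`); that is an implementation detail and may change between Go versions.

```go
package main

import "fmt"

// dump holds the values printed for the object containing the bad pointer.
type dump struct {
	name      string
	object    uint64 // object=
	base      uint64 // s.base()=
	spanclass uint8  // s.spanclass=
	elemsize  uint64 // s.elemsize=
}

// sizeClass and noscan unpack a span class, assuming the runtime encoding
// spanclass = sizeclass<<1 | noscan.
func (d dump) sizeClass() uint8 { return d.spanclass >> 1 }
func (d dump) noscan() bool     { return d.spanclass&1 == 1 }

// objectIndex returns the object's slot index within its span and whether
// the object address falls exactly on a slot boundary.
func (d dump) objectIndex() (uint64, bool) {
	off := d.object - d.base
	return off / d.elemsize, off%d.elemsize == 0
}

// Values copied from the second and third customer logs above.
var dumps = []dump{
	{"second customer", 0xc03939c580, 0xc03939c000, 58, 704},
	{"third customer", 0xc020089980, 0xc020088000, 16, 96},
}

func main() {
	for _, d := range dumps {
		idx, aligned := d.objectIndex()
		fmt.Printf("%s: sizeclass=%d noscan=%v slot=%d aligned=%v\n",
			d.name, d.sizeClass(), d.noscan(), idx, aligned)
	}
}
```

For both dumps the containing object sits exactly on a slot boundary (slot 2 and slot 68) of a pointer-containing (scan) span, which suggests the container is a legitimate heap object and it is the pointer value it holds that is stale. In the second dump the target span has state 0, i.e. it is no longer allocated, which is consistent with (but does not prove) an object having been freed while still referenced.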
It does, however, seem like the "Thanos" project has supplied what looks to be a smoking gun (the issue contains additional traces). In those traces, the crash occurs in a goroutine that compresses a block.
The only really interesting thing that goes on there is that it calls an assembly function to compress the block. Looking at the assembly caller, it is pretty straightforward. This is the disassembly for the caller:
This is the signature of the assembly function being called:
```
// func encodeBetterBlockAsm(dst []byte, src []byte) int
// Requires: BMI, SSE2
TEXT ·encodeBetterBlockAsm(SB), $589848-56
```
... and the function definition:
```
//go:noescape
func encodeBetterBlockAsm(dst []byte, src []byte) int
```
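The TEXT directive encodes two sizes in the Go assembler's `$framesize-argsize` form. As a quick sanity check, the sketch below recomputes the 56-byte argument size from the Go declaration (two 24-byte slice headers plus an 8-byte `int` result) and shows how large the 589848-byte frame is. It assumes a 64-bit target such as linux/amd64. The power-of-two figure only illustrates that goroutine stacks, which start small and grow by copying into a larger power-of-two allocation, will almost always have to grow before this function can run.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Values copied from the TEXT directive:
//   TEXT ·encodeBetterBlockAsm(SB), $589848-56
// In Go assembler syntax the operand is $framesize-argsize.
const (
	frameSize = 589848 // bytes of local stack frame
	argSize   = 56     // bytes of arguments plus results
)

// expectedArgSize computes the argument area of
// func(dst []byte, src []byte) int: two slice headers and one int result.
func expectedArgSize() uintptr {
	var s []byte
	var n int
	return 2*unsafe.Sizeof(s) + unsafe.Sizeof(n)
}

// nextPow2 returns the smallest power of two >= n, for n > 0.
func nextPow2(n uint64) uint64 {
	p := uint64(1)
	for p < n {
		p <<= 1
	}
	return p
}

func main() {
	fmt.Printf("arg size: declared %d, computed %d\n", argSize, expectedArgSize())
	fmt.Printf("frame: %d bytes = %.2f KiB\n", frameSize, float64(frameSize)/1024)
	fmt.Printf("smallest power-of-two stack holding the frame alone: %d KiB\n",
		nextPow2(frameSize)/1024)
}
```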
The reason I included the stack size check is that the function uses quite a bit of stack (a 589848-byte frame), so there is a real chance that stack growth (morestack) is triggered when it is called.
The stack is used for a dynamic lookup table. I am fairly sure there are no writes outside the stack, and I am also pretty confident there are no writes outside the provided slices (these would likely also give different errors).
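The claim that there are no writes outside the provided slices can be spot-checked from Go by surrounding `dst` with guard bytes. This is a minimal sketch: `copyEnc` and `overrunEnc` are hypothetical stand-ins (a real check would call the assembly-backed encoder), and a canary only catches stray writes that land in the guard bytes, not writes further away, reads past `src`, or anything on the stack.

```go
package main

import (
	"bytes"
	"fmt"
	"unsafe"
)

// encodeFunc has the same shape as encodeBetterBlockAsm(dst, src []byte) int.
type encodeFunc func(dst, src []byte) int

const (
	guard  = 64   // canary bytes on each side of dst
	canary = 0xA5 // sentinel value stored in the guard regions
)

// checkBounds calls enc with a dst slice surrounded by canary bytes and
// reports an error if enc touched either guard region, modified src, or
// returned a length outside [0, dstLen].
func checkBounds(enc encodeFunc, src []byte, dstLen int) error {
	buf := make([]byte, guard+dstLen+guard)
	for i := range buf {
		buf[i] = canary
	}
	// Full slice expression: cap(dst) == dstLen, so the guard bytes are
	// not reachable through dst itself.
	dst := buf[guard : guard+dstLen : guard+dstLen]
	srcCopy := append([]byte(nil), src...)

	n := enc(dst, src)

	if n < 0 || n > dstLen {
		return fmt.Errorf("returned length %d outside [0, %d]", n, dstLen)
	}
	for i := range buf {
		inDst := i >= guard && i < guard+dstLen
		if !inDst && buf[i] != canary {
			return fmt.Errorf("write outside dst at buffer offset %d", i)
		}
	}
	if !bytes.Equal(src, srcCopy) {
		return fmt.Errorf("src was modified")
	}
	return nil
}

// copyEnc is a hypothetical well-behaved encoder.
func copyEnc(dst, src []byte) int { return copy(dst, src) }

// overrunEnc is a hypothetical buggy encoder that stores one byte just past
// cap(dst) via unsafe, simulating an out-of-bounds store from assembly.
func overrunEnc(dst, src []byte) int {
	n := copy(dst, src)
	p := unsafe.Add(unsafe.Pointer(&dst[0]), cap(dst))
	*(*byte)(p) = 0
	return n
}

func main() {
	src := []byte("hello, hello, hello, hello")
	fmt.Println("copyEnc:", checkBounds(copyEnc, src, 64))
	fmt.Println("overrunEnc:", checkBounds(overrunEnc, src, 64))
}
```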
I do not use the BP register, so it is not clobbered, and only SSE2 registers are used, so there is no VZEROUPPER weirdness. The stack is managed by avo, so a bug there is less likely.
So my questions are:
A) Am I doing something obviously wrong?
B) What would be a typical reason for this error to show up?
C) This seems related to GC, so is there a window where the goroutine could be preempted in an unsafe state?
D) Are there any Go 1.20 changes that seem likely to be triggering this?
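One practical way to probe questions C and D is to rerun a reproducer under runtime debug settings that change preemption and GC behaviour, and see whether the crash rate moves. The sketch below assumes a hypothetical reproducer binary passed on the command line; the GODEBUG options it uses are documented in the `runtime` package, but whether any of them affects this particular crash is untested.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// debugVariants are GODEBUG values to try; "" is the baseline run.
var debugVariants = []string{
	"",                  // baseline: default runtime behaviour
	"asyncpreemptoff=1", // disable signal-based asynchronous preemption
	"gcstoptheworld=1",  // make every GC cycle a stop-the-world collection
	"gccheckmark=1",     // re-verify the concurrent mark phase with the world stopped
}

// commandFor builds a command that runs bin with GODEBUG set to setting.
// os/exec uses the last value when Env contains a key more than once, so
// this overrides any GODEBUG inherited from the parent environment.
func commandFor(setting, bin string, args ...string) *exec.Cmd {
	cmd := exec.Command(bin, args...)
	cmd.Env = append(os.Environ(), "GODEBUG="+setting)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: godebug-matrix <reproducer> [args...]")
		return
	}
	for _, setting := range debugVariants {
		err := commandFor(setting, os.Args[1], os.Args[2:]...).Run()
		fmt.Printf("GODEBUG=%q: %v\n", setting, err)
	}
}
```

If, for example, crashes disappear only with `asyncpreemptoff=1`, that would point toward question C; if none of the settings matter, a Go 1.20 change unrelated to preemption (question D) becomes more likely.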
Keep in mind that this doesn't appear to happen on many machines. Talos reported that it seems to happen more often when a lot of memory is allocated.
I will of course assist with any information that may be needed, but I feel at this point I need some pointers from people with a deeper understanding of the runtime to get much further.
Also note that we have extensively tested the CPU and RAM on some of these customer systems, since that seemed like a possibility at first. Also note that the crashes could be completely unrelated, but the coincidence seems too big.
Go version
Go 1.20.x and later
What did you expect to see?
No crash
What did you see instead?
Rare, random runtime crashes
Edit: The assembly is now Linux-compiled.