Closed cockroach-teamcity closed 2 weeks ago
This looks like a segfault in the Go runtime? I quickly looked at the stack dump and was unable to trace the segfault to CRDB code.
Some notes:
unsafe call. According to deps.bazel, we still use cgo, and the unsafe pkg is littered throughout the repo.
I doubt this has to do with backupccl code:
=== RUN TestRestoreDatabaseVersusTable
test_log_scope.go:165: test logs captured to: /artifacts/tmp/_tmp/47447a7ed84475b6aaa4b9399a882ce0/logTestRestoreDatabaseVersusTable2978839770
test_log_scope.go:76: use -show-logs to present logs inline
test_server_shim.go:152: automatically injected a shared process virtual cluster under test; see comment at top of test_server_shim.go for details.
=== RUN TestRestoreDatabaseVersusTable/incomplete-db
test_server_shim.go:152: automatically injected a shared process virtual cluster under test; see comment at top of test_server_shim.go for details.
SIGSEGV: segmentation violation
PC=0x42b81c m=19 sigcode=1 addr=0x20
goroutine 0 gp=0x400c2121c0 m=19 mp=0x400c210008 [idle]:
runtime.(*mspan).typePointersOfUnchecked(0x40168850e0?, 0x4015086c00?)
GOROOT/src/runtime/mbitmap_allocheaders.go:202 +0x3c fp=0xffff4f3fccd0 sp=0xffff4f3fccb0 pc=0x42b81c
runtime.scanobject(0x400c792000, 0x40000dc168)
GOROOT/src/runtime/mgcmark.go:1441 +0x1c4 fp=0xffff4f3fcd60 sp=0xffff4f3fccd0 pc=0x437fd4
runtime.gcDrain(0x40000dc168, 0x2)
GOROOT/src/runtime/mgcmark.go:1242 +0x1d4 fp=0xffff4f3fcdd0 sp=0xffff4f3fcd60 pc=0x437774
runtime.gcDrainMarkWorkerDedicated(...)
GOROOT/src/runtime/mgcmark.go:1124
runtime.gcBgMarkWorker.func2()
GOROOT/src/runtime/mgc.go:1402 +0x154 fp=0xffff4f3fce20 sp=0xffff4f3fcdd0 pc=0x433a34
runtime.systemstack(0x0)
src/runtime/asm_arm64.s:243 +0x6c fp=0xffff4f3fce30 sp=0xffff4f3fce20 pc=0x48c3fc
goroutine 38 gp=0x4000a80a80 m=19 mp=0x400c210008 [GC worker (active)]:
runtime.systemstack_switch()
src/runtime/asm_arm64.s:200 +0x8 fp=0x4000a88730 sp=0x4000a88720 pc=0x48c378
runtime.gcBgMarkWorker()
GOROOT/src/runtime/mgc.go:1370 +0x204 fp=0x4000a887d0 sp=0x4000a88730 pc=0x433614
runtime.goexit({})
src/runtime/asm_arm64.s:1222 +0x4 fp=0x4000a887d0 sp=0x4000a887d0 pc=0x48e8a4
created by runtime.gcBgMarkStartWorkers in goroutine 1
GOROOT/src/runtime/mgc.go:1234 +0x28
What's a good next step here? Should this (retroactively) block the beta?
I don't think so, but I can ask around.
Provisionally assigning to Storage, on the hypothesis that this could be a bug with unsafe memory usage and they would be best equipped to track it down further. Thank you!
I have been trying to repro on an ARM AWS node (same machine type as the failed test) with no luck so far. Whatever this is, it must be extremely rare. I filed https://github.com/cockroachdb/cockroach/issues/134312 to upgrade Go to 1.22.8, which has a fix that may in principle be relevant.
Makes sense to me. Thank you very much, Radu!
Still no luck reproducing. I am removing the release-blocker label, since the only course of action here is probably to upgrade Go (and that issue is marked as a blocker).
Go was upgraded, which will hopefully address this. I was unable to reproduce the crash, so there is not much more we can do here.
ccl/backupccl.TestRestoreDatabaseVersusTable failed with artifacts on release-24.3 @ c077ebf6e98bcd579481b93c83f14184ab94f2e6:
Help
See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)
/cc @cockroachdb/disaster-recovery
Jira issue: CRDB-43874