Closed rytaft closed 1 year ago
cc @postamar @ajwerner: not sure if this actually has anything to do with schema stuff, but it appears that the segfault happened inside table_desc.go.
It almost definitely has something to do with schema stuff.
This bug is really scary, to be honest. I'm unable to repro, but I'll find a way somehow. Just out of curiosity, @rytaft, was this a one-off, or were you able to trigger this segfault multiple times?
I only saw this once -- sorry! Thanks for investigating!
I was able to repro it, but just once! Unfortunately my terminal crashed and I didn't copy/paste the output.
Here's what I did after running cockroach demo --nodes 9 --global:
ALTER DATABASE movr PRIMARY REGION "us-east1";
ALTER DATABASE movr ADD REGION "us-west1";
ALTER DATABASE movr ADD REGION "europe-west1";
ALTER DATABASE movr SURVIVE REGION FAILURE;
ALTER TABLE rides SET LOCALITY REGIONAL BY ROW;
SHOW JOBS;
Edit: it was not quite the same segfault, but it was something else scary related to runtime corruption!
Hi @postamar, please add branch-* labels to identify which branch(es) this release-blocker affects.
It feels plausibly related to https://github.com/golang/go/issues/43873, but that is far from certain.
I created https://github.com/cockroachdb/cockroach/pull/62391 to make repro'ing this easier.
I crashed demo today by running ./cockroach demo movr --nodes 9 followed by the new built-in refresh. Here's a gist: https://gist.github.com/awoods187/e9289c706ddc529034368fd3a2a25323
@awoods187 this confirms this is a memory corruption bug somewhere and has nothing to do with @postamar and schema code. Nevertheless, it's very scary.
@knz one common thread (pun intended) here is the syscall in the readline code, though my sense is that that is to be expected. In general, when we see memory corruption errors, I wonder about cgo things. I don't want to rope you into anything, but I appreciate your expertise. If you've got any pointers (pun less intended) on how we might better understand this situation, I'd be keen to learn.
I'm betting it somewhat overlaps with #57885.
That is, macOS Big Sur confuses libedit at some point, which causes the signal handler not to be restored properly upon a call to readline.
I suspect one of the following approaches will trigger the problem:
Also, if this is confirmed to be specific to macOS, I don't think it's a GA blocker.
I recently upgraded to Big Sur and today tried to test the above by running demo, killing it with Ctrl-C, and running demo again; it is definitely hung. In Activity Monitor, I also see a huge spike in CPU% for cockroach. Closing the shell at this point doesn't end cockroach, and I need to run pkill cockroach in another window.
andrewwoods@Andrew-W-MacBook-Pro:~/go/src/github.com/cockroachdb/cockroach$ ./cockroach demo movr --nodes 9
#
# Welcome to the CockroachDB demo database!
#
# You are connected to a temporary, in-memory CockroachDB cluster of 9 nodes.
#
# This demo session will attempt to enable enterprise features
# by acquiring a temporary license from Cockroach Labs in the background.
# To disable this behavior, set the environment variable
# COCKROACH_SKIP_ENABLING_DIAGNOSTIC_REPORTING=true.
#
# Beginning initialization of the movr dataset, please wait...
Andy, that particular scenario surprises me. Are you sure the CPU spike is not just the movr dataset initializing normally? Does demo remain hung forever?
It was hung for at least ten minutes before I killed it. It doesn't seem deterministic, as I tried again and it didn't hang.
Thanks for checking.
I wonder if we need to disclaim support for macOS entirely. This is a class of issues that is caused by Apple not playing nice with the open source ecosystem.
The alternative is to deliver a simpler client for macOS users, which does not use advanced line editing for SQL commands.
I believe I hit this again today:
demo@127.0.0.1:26257/movr> *
* ERROR: [n1,summaries] a panic has occurred!
* runtime error: invalid memory address or nil pointer dereference
* (1) attached stack trace
* -- stack trace:
* | runtime.gopanic
* | /usr/local/Cellar/go/1.15.4/libexec/src/runtime/panic.go:969
* | runtime.panicmem
* | /usr/local/Cellar/go/1.15.4/libexec/src/runtime/panic.go:212
* | runtime.sigpanic
* | /usr/local/Cellar/go/1.15.4/libexec/src/runtime/signal_unix.go:742
* | github.com/cockroachdb/cockroach/pkg/util/metric.(*Counter).Count
* | <autogenerated>:1
* | github.com/cockroachdb/cockroach/pkg/server/status.extractValue
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/server/status/recorder.go:563
* | github.com/cockroachdb/cockroach/pkg/server/status.eachRecordableValue.func1
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/server/status/recorder.go:595
* | github.com/cockroachdb/cockroach/pkg/util/metric.(*Registry).Each.func1
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/util/metric/registry.go:153
* | github.com/cockroachdb/cockroach/pkg/util/metric.(*Counter).Inspect
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/util/metric/metric.go:356
* | github.com/cockroachdb/cockroach/pkg/util/metric.(*Registry).Each
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/util/metric/registry.go:152
* | github.com/cockroachdb/cockroach/pkg/server/status.eachRecordableValue
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/server/status/recorder.go:574
* | github.com/cockroachdb/cockroach/pkg/server/status.(*MetricsRecorder).GenerateNodeStatus
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/server/status/recorder.go:455
* | github.com/cockroachdb/cockroach/pkg/server.(*Node).writeNodeStatus.func1
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:753
* | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTask
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:313
* | github.com/cockroachdb/cockroach/pkg/server.(*Node).writeNodeStatus
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:752
* | github.com/cockroachdb/cockroach/pkg/server.(*Node).startWriteNodeStatus.func1
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:736
* | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:351
* | runtime.goexit
* | /usr/local/Cellar/go/1.15.4/libexec/src/runtime/asm_amd64.s:1374
* Wraps: (2) runtime error: invalid memory address or nil pointer dereference
* Error types: (1) *withstack.withStack (2) runtime.errorString
*
*
* ERROR: [n1,summaries] a panic has occurred!
* runtime error: invalid memory address or nil pointer dereference
* (1) attached stack trace
* -- stack trace:
* | runtime.gopanic
* | /usr/local/Cellar/go/1.15.4/libexec/src/runtime/panic.go:969
* | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:233
* | runtime.gopanic
* | /usr/local/Cellar/go/1.15.4/libexec/src/runtime/panic.go:969
* | runtime.panicmem
* | /usr/local/Cellar/go/1.15.4/libexec/src/runtime/panic.go:212
* | runtime.sigpanic
* | /usr/local/Cellar/go/1.15.4/libexec/src/runtime/signal_unix.go:742
* | github.com/cockroachdb/cockroach/pkg/util/metric.(*Counter).Count
* | <autogenerated>:1
* | github.com/cockroachdb/cockroach/pkg/server/status.extractValue
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/server/status/recorder.go:563
* | github.com/cockroachdb/cockroach/pkg/server/status.eachRecordableValue.func1
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/server/status/recorder.go:595
* | github.com/cockroachdb/cockroach/pkg/util/metric.(*Registry).Each.func1
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/util/metric/registry.go:153
* | github.com/cockroachdb/cockroach/pkg/util/metric.(*Counter).Inspect
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/util/metric/metric.go:356
* | github.com/cockroachdb/cockroach/pkg/util/metric.(*Registry).Each
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/util/metric/registry.go:152
* | github.com/cockroachdb/cockroach/pkg/server/status.eachRecordableValue
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/server/status/recorder.go:574
* | github.com/cockroachdb/cockroach/pkg/server/status.(*MetricsRecorder).GenerateNodeStatus
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/server/status/recorder.go:455
* | github.com/cockroachdb/cockroach/pkg/server.(*Node).writeNodeStatus.func1
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:753
* | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTask
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:313
* | github.com/cockroachdb/cockroach/pkg/server.(*Node).writeNodeStatus
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:752
* | github.com/cockroachdb/cockroach/pkg/server.(*Node).startWriteNodeStatus.func1
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:736
* | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:351
* | runtime.goexit
* | /usr/local/Cellar/go/1.15.4/libexec/src/runtime/asm_amd64.s:1374
* Wraps: (2) runtime error: invalid memory address or nil pointer dereference
* Error types: (1) *withstack.withStack (2) runtime.errorString
*
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x5101caf]
goroutine 1014 [running]:
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc000d16d00, 0x9528720, 0xc00464d710)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:233 +0x126
panic(0x801ed00, 0xbe88070)
/usr/local/Cellar/go/1.15.4/libexec/src/runtime/panic.go:969 +0x1b9
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc000d16d00, 0x9528720, 0xc00464d710)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:233 +0x126
panic(0x801ed00, 0xbe88070)
/usr/local/Cellar/go/1.15.4/libexec/src/runtime/panic.go:969 +0x1b9
github.com/cockroachdb/cockroach/pkg/util/metric.(*Counter).Count(0xc0000dbe60, 0x8655f80)
<autogenerated>:1 +0x2f
github.com/cockroachdb/cockroach/pkg/server/status.extractValue(0x8655f80, 0xc0000dbe60, 0x0, 0x0, 0xc015b75170)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/server/status/recorder.go:563 +0x19a
github.com/cockroachdb/cockroach/pkg/server/status.eachRecordableValue.func1(0xc0009d2810, 0x26, 0x8655f80, 0xc0000dbe60)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/server/status/recorder.go:595 +0x191
github.com/cockroachdb/cockroach/pkg/util/metric.(*Registry).Each.func1(0x8655f80, 0xc0000dbe60)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/util/metric/registry.go:153 +0x6c
github.com/cockroachdb/cockroach/pkg/util/metric.(*Counter).Inspect(0xc0000dbe60, 0xc04df290a0)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/util/metric/metric.go:356 +0x3c
github.com/cockroachdb/cockroach/pkg/util/metric.(*Registry).Each(0xc0010fc440, 0xc04df068a0)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/util/metric/registry.go:152 +0x125
github.com/cockroachdb/cockroach/pkg/server/status.eachRecordableValue(0xc0010fc440, 0xc04df06890)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/server/status/recorder.go:574 +0x65
github.com/cockroachdb/cockroach/pkg/server/status.(*MetricsRecorder).GenerateNodeStatus(0xc000103500, 0x9528720, 0xc00464d710, 0x0)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/server/status/recorder.go:455 +0x62c
github.com/cockroachdb/cockroach/pkg/server.(*Node).writeNodeStatus.func1(0x9528720, 0xc00464d710)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:753 +0x8b
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTask(0xc000d16d00, 0x9528720, 0xc00464d710, 0x8731684, 0x1a, 0xc015b75e20, 0x0, 0x0)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:313 +0xb2
github.com/cockroachdb/cockroach/pkg/server.(*Node).writeNodeStatus(0xc00165c580, 0x9528720, 0xc00464d710, 0x4a817c800, 0x1, 0x0, 0x0)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:752 +0xbd
github.com/cockroachdb/cockroach/pkg/server.(*Node).startWriteNodeStatus.func1(0x9528720, 0xc00464d710)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:736 +0x16d
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1(0xc000d16d00, 0x9528720, 0xc00464d710, 0x0, 0xc0011bb7c0)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:351 +0xb9
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:346 +0xfc
@awoods187 no, this is a different issue. Can you file it separately? You're looking at a bug in our implementation of metrics.
Given the stack trace, please assign it to Tobias for triage.
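For readers of the trace above: a frame like metric.(*Counter).Count at <autogenerated>:1 is a compiler-generated wrapper, which Go emits for methods promoted from an embedded field. One general way (not confirmed as the cause here) to get a nil pointer dereference inside such a wrapper is an embedded interface that was never set. A minimal sketch, where counter, simpleCounter, wrappedCounter, and safeCount are hypothetical stand-ins rather than CockroachDB's real types:

```go
package main

import "fmt"

// counter is a hypothetical stand-in for a metrics counter interface.
type counter interface {
	Count() int64
}

// simpleCounter is a trivial implementation of counter.
type simpleCounter struct{ n int64 }

func (c *simpleCounter) Count() int64 { return c.n }

// wrappedCounter embeds the interface, so Count is promoted and the
// compiler generates a wrapper method (*wrappedCounter).Count.
type wrappedCounter struct {
	name string
	counter
}

// safeCount calls Count and converts a runtime panic into an error,
// similar in spirit to a recover() in a task runner.
func safeCount(w *wrappedCounter) (n int64, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return w.Count(), nil
}

func main() {
	ok := &wrappedCounter{name: "ok", counter: &simpleCounter{n: 7}}
	fmt.Println(safeCount(ok))

	// The embedded interface was never set, so calling the promoted
	// Count method dereferences nil and panics.
	broken := &wrappedCounter{name: "broken"}
	fmt.Println(safeCount(broken))
}
```

Memory corruption could in principle produce the same symptom, which is why the trace alone does not settle the cause.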
Done, thanks @knz: https://github.com/cockroachdb/cockroach/issues/63218
I removed the "GA-blocker" label since this is macOS-specific.
Also, @erikgrinaker has found that a new change in macOS Big Sur is possibly triggering incompatibilities inside the Go runtime (see #63719). This may be the underlying cause here as well.
I got a panic that might or might not be related when testing this multi-region local simulation tutorial: https://www.cockroachlabs.com/docs/v21.1/demo-low-latency-multi-region-deployment.html
root@127.0.0.1:26257/movr> *
* ERROR: [n9,s9] a panic has occurred!
* runtime error: invalid memory address or nil pointer dereference
* (1) attached stack trace
* -- stack trace:
* | runtime.gopanic
* | /usr/local/go/src/runtime/panic.go:969
* | runtime.panicmem
* | /usr/local/go/src/runtime/panic.go:212
* | runtime.sigpanic
* | /usr/local/go/src/runtime/signal_unix.go:742
* | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).tick
* | /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_raft.go:951
* | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).processTick
* | /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store_raft.go:563
* | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftScheduler).worker
* | /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/scheduler.go:279
* | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1
* | /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:351
* | runtime.goexit
* | /usr/local/go/src/runtime/asm_amd64.s:1374
* Wraps: (2) runtime error: invalid memory address or nil pointer dereference
* Error types: (1) *withstack.withStack (2) runtime.errorString
*
*
* ERROR: [n9,s9] Queued as error 5f5e8ae8ab344fe28313d817ccb9b7f9
*
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x0]
goroutine 2432 [running]:
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc00289c400, 0x9494180, 0xc00563a420)
/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:233 +0x126
panic(0x7fa7880, 0xbd94420)
/usr/local/go/src/runtime/panic.go:969 +0x1b9
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).tick(0xc00590b600, 0x9494180, 0xc008cb0180, 0xc006ed3b30, 0xc008cb0100, 0x0, 0x0)
/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_raft.go:951 +0x31c
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).processTick(0xc00565e000, 0x9494180, 0xc00563a420, 0x1, 0x1)
/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store_raft.go:563 +0x14a
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftScheduler).worker(0xc003aac1e0, 0x9494180, 0xc00563a420)
/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/scheduler.go:279 +0x309
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1(0xc00289c400, 0x9494180, 0xc00563a420, 0x0, 0xc003adca00)
/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:351 +0xb9
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask
/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:346 +0xfc
@jseldess what's the output of the version command for your executable?
~/cockroach-v21.1.0-beta.4.darwin-10.9-amd64$ ./cockroach version
Build Tag: v21.1.0-beta.4
Build Time: 2021/04/19 15:35:45
Distribution: CCL
Platform: darwin amd64 (x86_64-apple-darwin19)
Go Version: go1.15.10
C Compiler: Clang 10.0.0
Build Commit ID: abc4eb5acda343e64415f0267ccd94805d7f2760
Build Type: release
OK, so Jesse, what you observed is a different issue from what Becca observed at the top. Can you also file a separate issue? You can ping me on it; it's going to go to the KV project.
Cheers
Filed #64203 on your behalf.
Thank you, @knz!
I think it's possible that the upgrade to the latest Go version has alleviated this issue. To folks who've experienced this above: have you seen this happen recently with the v21.2 codebase?
This may have happened again (on 22.1.0-alpha.3): https://github.com/cockroachdb/cockroach/issues/77840
We fixed this with the new editor in 23.1.
Here is the output of my terminal, using the latest master:
Jira issue: CRDB-6374