
kvserver: merges don't respect static splits (like liveness range) #96365

Open cockroach-teamcity opened 1 year ago

cockroach-teamcity commented 1 year ago

This issue was autofiled by Sentry. It represents a crash or reported error on a live cluster with telemetry enabled.

Sentry link: https://sentry.io/organizations/cockroach-labs/issues/3910802942/?referrer=webhooks_plugin

Panic message:

    panic.go:884: panic: ×
    (1) attached stack trace -- stack trace:
      runtime.gopanic
        GOROOT/src/runtime/panic.go:884
      github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*rangeIDQueue).SetPriorityID
        github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:127
      github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftScheduler).SetPriorityID
        github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:243
      github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).setDescLockedRaftMuLocked
        github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_init.go:371
      github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).loadRaftMuLockedReplicaMuLocked
        github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_init.go:208
      github.com/cockroachdb/cockroach/pkg/kv/kvserver.prepareRightReplicaForSplit
        github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_split.go:247
      github.com/cockroachdb/cockroach/pkg/kv/kvserver.splitPostApply
        github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_split.go:163
      github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).handleSplitResult
        github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_application_result.go:251
      github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicaStateMachine).handleNonTrivialReplicatedEvalResult
        github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_application_state_machine.go:1357
      github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicaStateMachine).ApplySideEffects
        github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_application_state_machine.go:1232
      github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply.mapCheckedCmdIter
        github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply/cmd.go:206
      github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply.(*Task).applyOneBatch
        github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply/task.go:290
      github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply.(*Task).ApplyCommittedEntries
        github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply/task.go:246
      github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).handleRaftReadyRaftMuLocked
        github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:1045
      github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).handleRaftReady
        github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:664
      github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).processReady
        github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:641
      github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftScheduler).worker
        github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:308
      github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2
        github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:489
      runtime.goexit
        GOROOT/src/runtime/asm_amd64.s:1594
    Wraps: (2) panic: ×

Error types: (1) withstack.withStack (2) errutil.leafError -- report composition: errutil.leafError: panic: × panic.go:884: withstack.withStack (top exception)

GOROOT/src/runtime/panic.go in runtime.gopanic at line 884
pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go in pkg/kv/kvserver.(*rangeIDQueue).SetPriorityID at line 127
pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go in pkg/kv/kvserver.(*raftScheduler).SetPriorityID at line 243
pkg/kv/kvserver/pkg/kv/kvserver/replica_init.go in pkg/kv/kvserver.(*Replica).setDescLockedRaftMuLocked at line 371
pkg/kv/kvserver/pkg/kv/kvserver/replica_init.go in pkg/kv/kvserver.(*Replica).loadRaftMuLockedReplicaMuLocked at line 208
pkg/kv/kvserver/pkg/kv/kvserver/store_split.go in pkg/kv/kvserver.prepareRightReplicaForSplit at line 247
pkg/kv/kvserver/pkg/kv/kvserver/store_split.go in pkg/kv/kvserver.splitPostApply at line 163
pkg/kv/kvserver/pkg/kv/kvserver/replica_application_result.go in pkg/kv/kvserver.(*Replica).handleSplitResult at line 251
pkg/kv/kvserver/pkg/kv/kvserver/replica_application_state_machine.go in pkg/kv/kvserver.(*replicaStateMachine).handleNonTrivialReplicatedEvalResult at line 1357
pkg/kv/kvserver/pkg/kv/kvserver/replica_application_state_machine.go in pkg/kv/kvserver.(*replicaStateMachine).ApplySideEffects at line 1232
pkg/kv/kvserver/apply/cmd.go in pkg/kv/kvserver/apply.mapCheckedCmdIter at line 206
pkg/kv/kvserver/apply/task.go in pkg/kv/kvserver/apply.(*Task).applyOneBatch at line 290
pkg/kv/kvserver/apply/task.go in pkg/kv/kvserver/apply.(*Task).ApplyCommittedEntries at line 246
pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go in pkg/kv/kvserver.(*Replica).handleRaftReadyRaftMuLocked at line 1045
pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go in pkg/kv/kvserver.(*Replica).handleRaftReady at line 664
pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go in pkg/kv/kvserver.(*Store).processReady at line 641
pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go in pkg/kv/kvserver.(*raftScheduler).worker at line 308
pkg/util/stop/stopper.go in pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2 at line 489
GOROOT/src/runtime/asm_amd64.s in runtime.goexit at line 1594
| Tag | Value |
| --- | --- |
| Cockroach Release | v22.2.1 |
| Cockroach SHA | cf1e7e6bc6ef9e55510b9ed13bd068bb3894cd92 |
| Platform | linux amd64 |
| Distribution | CCL |
| Environment | v22.2.1 |
| Command | server |
| Go Version | |
| # of CPUs | |
| # of Goroutines | |

Jira issue: CRDB-24089

blathers-crl[bot] commented 1 year ago

cc @cockroachdb/replication

tbg commented 1 year ago

https://github.com/cockroachdb/cockroach/blob/0695b83ff06561ed575cd4d8579c17dc72e2814c/pkg/kv/kvserver/scheduler.go#L127-L130

Everything got redacted away, naturally, but the stack trace shows that we're hitting this when a replica splits.
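For context, the guard that trips at that line can be sketched as follows. This is a simplified reconstruction for illustration, not the actual cockroach code: the real method panics, while this sketch returns an error so the condition is easy to see.

```go
package main

import "fmt"

// rangeIDQueue is a pared-down sketch of the kvserver scheduler queue.
// It supports exactly one "priority" range; in practice that is the
// node liveness range.
type rangeIDQueue struct {
	priorityID int64
}

// setPriorityID mirrors the guard at scheduler.go:127: registering a
// second, different priority range is treated as a programming error.
func (q *rangeIDQueue) setPriorityID(id int64) error {
	if q.priorityID != 0 && q.priorityID != id {
		return fmt.Errorf("priority range ID already set: old %d, new %d", q.priorityID, id)
	}
	q.priorityID = id
	return nil
}

func main() {
	var q rangeIDQueue
	fmt.Println(q.setPriorityID(2)) // liveness range r2: ok
	fmt.Println(q.setPriorityID(2)) // same ID again: ok, idempotent
	fmt.Println(q.setPriorityID(9)) // a different range ID: rejected
}
```

So the panic implies that some replica other than the one originally registered came to cover the liveness keyspace.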

There is only a single event in the report, from an official release binary, and it's a 32-vCPU machine in a cluster of at least three nodes (the nodeID is 3). This looks like a legitimate deployment. I don't see any other events for the cluster, but I can't even find this very event via the search, so that's not supporting evidence; it just reflects the fact that this report is nearing its three-month anniversary.

A precondition for calling into this code is that the replica's keyspace contains the node liveness prefix:

https://github.com/cockroachdb/cockroach/blob/3d6fe9ebd5f1c32cc6a1e2f547b8a16e04a84d38/pkg/kv/kvserver/replica_init.go#L368-L372
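The containment check can be paraphrased like this (a sketch with hypothetical string keys standing in for roachpb key types). Note that a range enlarged by a merge still covers the liveness prefix, so it too would be promoted to the scheduler's priority slot:

```go
package main

import (
	"bytes"
	"fmt"
)

// coversPrefix reports whether the key span [start, end) contains the
// given prefix key, mimicking a ContainsKey check against the liveness
// prefix.
func coversPrefix(start, end, prefix []byte) bool {
	return bytes.Compare(start, prefix) <= 0 && bytes.Compare(prefix, end) < 0
}

func main() {
	// Hypothetical stand-ins for the real keyspace layout.
	prefix := []byte("/System/NodeLiveness")

	// The normal liveness range covers the prefix...
	fmt.Println(coversPrefix([]byte("/System/NodeLiveness"), []byte("/System/NodeLivenessMax"), prefix)) // true

	// ...but so does a range that merged liveness with its right neighbor.
	fmt.Println(coversPrefix([]byte("/System/NodeLiveness"), []byte("/System/tsd"), prefix)) // true
}
```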

So one possibility would be that we accidentally split the node liveness range. We're not supposed to split it, and there are checks against this:

https://github.com/cockroachdb/cockroach/blob/b86fd4f29d6f7150443755cc769efe40121fb172/pkg/keys/spans.go#L51-L55

used by

https://github.com/cockroachdb/cockroach/blob/985d57e9d5bd8e5d6cd0b71bc757da08e599c13c/pkg/storage/pebble_iterator.go#L776-L780

which is called in

https://github.com/cockroachdb/cockroach/blob/3d6fe9ebd5f1c32cc6a1e2f547b8a16e04a84d38/pkg/kv/kvserver/replica_command.go#L371-L373

and this method is involved in every split, which I verified via this on a test cluster:

    for i, k := range []roachpb.Key{
        keys.NodeLivenessPrefix.Next(),
        keys.NodeLivenessKey(123),
    } {
        _, _, err := tc.SplitRange(k)
        require.Errorf(t, err, "expected error for idx %d at key %s", i, k)
    }

Additionally, the liveness range is fully split out at cluster bootstrap

https://github.com/cockroachdb/cockroach/blob/0900ea9d61052db20c97711fa013b8920d3b6a2e/pkg/config/system.go#L455-L461

However, it looks like merges can affect the liveness range 😬

merge right neighbor:

    client_merge_test.go:161: before: r2:/System/NodeLiveness{-Max} [(n1,s1):1, next=2, gen=0]
    client_merge_test.go:165: adminmerge: <nil>
    client_merge_test.go:168: after: r2:/System/{NodeLiveness-tsd} [(n1,s1):1, next=2, gen=1]

getting merged by left neighbor:

    client_merge_test.go:160: before: r2:/System/NodeLiveness{-Max} [(n1,s1):1, next=2, gen=0]
    client_merge_test.go:164: adminmerge: <nil>
    client_merge_test.go:167: after: r1:/{Min-System/NodeLivenessMax} [(n1,s1):1, next=2, gen=1]
Code:

```go
for _, k := range []roachpb.Key{keys.NodeLivenessSpan.Key /*, keys.NodeLivenessSpan.Key.Prevish(3)*/} {
	desc, err := tc.LookupRange(keys.NodeLivenessSpan.Key)
	t.Logf("before: %v", desc)
	require.NoError(t, err)
	args := adminMergeArgs(k)
	_, pErr := kv.SendWrapped(ctx, store.TestSender(), args)
	t.Logf("adminmerge: %s", pErr)
	desc, err = tc.LookupRange(keys.NodeLivenessSpan.Key)
	require.NoError(t, err)
	t.Logf("after: %v", desc)
}
```

This is invoking AdminMerge directly, but I don't see anything in the merge queue that would prevent this. Enqueuing r2 on a roachprod cluster I had lying around, I see that it won't merge:

    kv/kvserver/merge_queue.go:295 [n3,s3,r2/5:/System/NodeLiveness{-Max}] skipping merge to avoid thrashing: merged range r0:/System/{NodeLiveness-tsd} [, next=0, gen=0] may split (estimated size: 571021)

because the queue realizes that upon the merge, a split would be required right away. If I played the interface golf correctly, the actual implementation starts here:

https://github.com/cockroachdb/cockroach/blob/dc80d3cce7f89896bd81c32ac8caf0be094cb961/pkg/spanconfig/spanconfigstore/store.go#L82-L85

and the meat is here (the caller will fatal on error):

https://github.com/cockroachdb/cockroach/blob/2675c7c0481bb33eb9f43f1c2b52c152fca3ae2d/pkg/spanconfig/spanconfigstore/span_store.go#L105-L217

What's interesting here is that this is populated asynchronously and starts out empty. So it seems conceivable that if the rangefeed that provides the spanconfig updates is delayed, we could somehow get into a state where the merge queue entertains merging the node liveness range.

It seems to be a little worse on master (which presumably matches 23.1 too), because shouldSplitRange returns false on errors as well:

https://github.com/cockroachdb/cockroach/blob/00ec8d330a18d77e3aaa2e6d02e79f2d03a01404/pkg/kv/kvserver/split_queue.go#L142-L146

The assumption in shouldSplitRange is clearly that "doing nothing is safe". But in the merge caller, saying that you don't need to split can lead to "un-splitting" (merging) the liveness range, which is ¡no bueno!
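The hazard boils down to an error-swallowing pattern along these lines (a hypothetical simplification of the call path, not the real signatures; `ready` stands in for whether the spanconfig rangefeed has caught up):

```go
package main

import (
	"errors"
	"fmt"
)

var errNotReady = errors.New("spanconfig subscriber not yet caught up")

// computeSplitKey stands in for the spanconfig store lookup; before the
// rangefeed has caught up it may fail instead of reporting a split point.
func computeSplitKey(ready bool) (string, error) {
	if !ready {
		return "", errNotReady
	}
	return "/System/NodeLivenessMax", nil
}

// shouldSplit mirrors the "return false on error" behavior. For the split
// queue, false means "do nothing", which is safe. For the merge queue's
// needs-split check, false means "merging is fine", which is not.
func shouldSplit(ready bool) bool {
	key, err := computeSplitKey(ready)
	if err != nil {
		return false // safe for the split queue, unsafe for the merge caller
	}
	return key != ""
}

func main() {
	fmt.Println(shouldSplit(true))  // true: the liveness boundary demands a split
	fmt.Println(shouldSplit(false)) // false: error swallowed, merge may proceed
}
```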

It's hard to say exactly what happened in the cluster. Did an ill-advised direct AdminMerge get directed at the liveness range or its left neighbor? Was there some delayed-spanconfig scenario that allowed the merge queue to merge away the liveness range? Or something else entirely? What seems clear is that we should very directly protect static splits from being undone by merges.

andrewbaptist commented 1 year ago

Great analysis! However, I'm not clear how we split the range.

My understanding of what happened is something like:

Original

|----- Liveness ----- | ----- Next Range -----| 

Post-merge (root cause unknown, but not safely prevented); likely due to either a manual AdminMerge or a timing race in which spanConfigStore.computeSplitKey returned an error.

|----- Liveness ----------- Next Range -----| 

shouldSplit() == true for this merged range now.

A split is attempted. Why is this split key chosen?

|----- L1 -|--L2--------- Next Range -----| 

At this point, both ranges will be "priority for raft", which causes the panic.

What I don't understand is how ComputeSplitKey can ever return a key within the liveness range. It looks at the staticSplits first and should split at the end of the liveness range.
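The expected "staticSplits first" behavior can be sketched as follows (a hypothetical simplification with string keys; `staticSplits` mimics the hardcoded system split points):

```go
package main

import (
	"fmt"
	"sort"
)

// computeSplitKey returns the first static split point strictly inside
// (start, end), mirroring the expectation that static splits are
// consulted before anything else.
func computeSplitKey(start, end string, staticSplits []string) (string, bool) {
	sort.Strings(staticSplits)
	for _, k := range staticSplits {
		if k > start && k < end {
			return k, true
		}
	}
	return "", false
}

func main() {
	static := []string{"/System/NodeLiveness", "/System/NodeLivenessMax", "/System/tsd"}
	// The merged range from the scenario above: liveness plus its right neighbor.
	// The split key chosen is the liveness end key, i.e. the original range
	// boundary, never a key inside the liveness span.
	k, ok := computeSplitKey("/System/NodeLiveness", "/System/tsd", static)
	fmt.Println(k, ok) // prints: /System/NodeLivenessMax true
}
```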

The one way I can see this happening is if COCKROACH_DISABLE_SPAN_CONFIGS was set. In that case, it would use the noopKVSubscriber, which returns a dummy key, but that subscriber also always returns false for NeedsSplit.

Regardless, I don't think merging the liveness range with its neighbor should itself be considered a bug; it should merely be discouraged, since the range will re-split immediately. It does, however, seem to be a bug that we are splitting it right afterward.

As a side note, this would not have crashed in 23.2 once #101023 is merged.

erikgrinaker commented 1 year ago

My guess is that liveness was merged into the meta range, and then split back out because of the static split points. At that point, the old liveness range was already registered as a priority range in the scheduler, and adding the new one will panic (we never unregister priority ranges, since normally the liveness range never changes its ID).

tbg commented 1 year ago

I have a prototype to disallow these splits in https://github.com/cockroachdb/cockroach/pull/101727. I think it works, but various tests need to be updated (since they do violate the static splits). This could be done with a testing knob. I'm not working on this any more for now, in order to focus on more urgent tasks before my extended leave.