
kv/kvserver: TestFlowControlAdmissionPostSplitMergeV2 failed #131735

Open cockroach-teamcity opened 1 day ago

cockroach-teamcity commented 1 day ago

kv/kvserver.TestFlowControlAdmissionPostSplitMergeV2 failed on master @ b38c8dd1faa68970c3310c1e82da71d593a67052:

=== RUN   TestFlowControlAdmissionPostSplitMergeV2
    test_log_scope.go:170: test logs captured to: outputs.zip/logTestFlowControlAdmissionPostSplitMergeV24052089275
    test_log_scope.go:81: use -show-logs to present logs inline
    flow_control_integration_test.go:2590: -- test log scope end --
test logs left over in: outputs.zip/logTestFlowControlAdmissionPostSplitMergeV24052089275
--- FAIL: TestFlowControlAdmissionPostSplitMergeV2 (232.22s)
=== RUN   TestFlowControlAdmissionPostSplitMergeV2/v2_enabled_when_leader_level=2
    flow_control_integration_test.go:4620: condition failed to evaluate within 3m45s: handle for 71 not found
    --- FAIL: TestFlowControlAdmissionPostSplitMergeV2/v2_enabled_when_leader_level=2 (228.21s)

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/kv

Jira issue: CRDB-42673

Epic CRDB-37515

kvoli commented 20 hours ago

The handle isn't found post-split, most likely for the RHS; it should be there for any leader replica w/ v2 enabled:

=== RUN   TestFlowControlAdmissionPostSplitMergeV2/v2_enabled_when_leader_level=2
    flow_control_integration_test.go:4620: condition failed to evaluate within 3m45s: handle for 71 not found

https://github.com/cockroachdb/cockroach/blob/86d0b7c8772a0bff195b6009494491ca56efe260/pkg/kv/kvserver/flow_control_integration_test.go#L2528-L2528
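
For context on where that message comes from: `waitForConnectedStreams` polls the per-node inspect handles until the range's handle shows up. Abridged from the helper (the full version is visible in the logging diff in a later comment):

```go
// Abridged sketch of flowControlTestHelper.waitForConnectedStreams.
testutils.SucceedsSoon(h.t, func() error {
	state, found := h.getInspectHandlesForLevel(serverIdx, level).LookupInspect(rangeID)
	if !found {
		// The error the test kept returning for 3m45s.
		return fmt.Errorf("handle for %s not found", rangeID)
	}
	if len(state.ConnectedStreams) != expConnectedStreams {
		return fmt.Errorf("expected %d connected streams, got %d",
			expConnectedStreams, len(state.ConnectedStreams))
	}
	return nil
})
```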

Seems like r71 merged back into r70 before the test expected it to (did the test even expect it?):

I241002 08:06:45.991012 2592095 kv/kvserver/replica_command.go:843 ⋮ [T1,Vsystem,n1,s1,r70/1:‹{/Table/Max-\xfa\x00}›,*kvpb.AdminMergeRequest] 261  initiating a merge of r71:‹{\xfa\x00-/Max}› [(n1,s1):1, (n2,s2):2, (n3,s3):3, next=4, gen=6, sticky=9223372036.854775807,2147483647] into this range (manual)
I241002 08:06:46.004518 2617357 kv/kvserver/store_remove_replica.go:152 ⋮ [T1,Vsystem,n1,s1,r70/1:‹{/Table/Max-\xfa\x00}›,raft] 262  removing replica r71/1
I241002 08:06:46.004737 2874515 kv/kvserver/store_remove_replica.go:152 ⋮ [T1,Vsystem,n3,s3,r70/3:‹{/Table/Max-\xfa\x00}›,raft] 263  removing replica r71/3
I241002 08:06:46.005156 2826445 kv/kvserver/store_remove_replica.go:152 ⋮ [T1,Vsystem,n2,s2,r70/2:‹{/Table/Max-\xfa\x00}›,raft] 264  removing replica r71/2

What's even odder, and perhaps a cause, is that r70 is split off from r69 twice, even after r71 is split from r70 and merged back in. The test issues only a single split request (not counting the scratch range splitting off) here:

https://github.com/kvoli/cockroach/blob/3c3b0f4a3821203f00896a8a924824bd87a51d46/pkg/kv/kvserver/flow_control_integration_test.go#L2527-L2527

Here are the full split/merge logs for r69 (progenitor), r70 (scratch), and r71 (the scratch RHS split off, then merged back):

Details

```
kvservertest-kv-distribution.log:I241002 08:06:45.460935 2592095 13@kv/kvserver/replica_command.go:482 ⋮ [T1,Vsystem,n1,s1,r69/1:‹/{Table/67-Max}›,*kvpb.AdminSplitRequest] 69 initiating a split of this range at key /Table/Max [r70] (manual)
kvservertest-kv-distribution.log:I241002 08:06:45.488035 2592095 13@kv/kvserver/replica_command.go:2494 ⋮ [T1,Vsystem,n1,s1,r70/1:‹/{Table/Max-Max}›,*kvpb.AdminChangeReplicasRequest] 71 change replicas (add [(n2,s2):2LEARNER] remove []): existing descriptor r70:‹/{Table/Max-Max}› [(n1,s1):1, next=2, gen=1, sticky=9223372036.854775807,2147483647]
kvservertest-kv-distribution.log:I241002 08:06:45.495488 2592095 13@kv/kvserver/replica_raft.go:399 ⋮ [T1,Vsystem,n1,s1,r70/1:‹/{Table/Max-Max}›] 72 proposing SIMPLE(l2) [(n2,s2):2LEARNER]: after=[(n1,s1):1 (n2,s2):2LEARNER] next=3
kvservertest-kv-distribution.log:I241002 08:06:45.500432 2592095 13@kv/kvserver/replica_command.go:2494 ⋮ [T1,Vsystem,n1,s1,r70/1:‹/{Table/Max-Max}›,*kvpb.AdminChangeReplicasRequest] 73 change replicas (add [(n3,s3):3LEARNER] remove []): existing descriptor r70:‹/{Table/Max-Max}› [(n1,s1):1, (n2,s2):2LEARNER, next=3, gen=2, sticky=9223372036.854775807,2147483647]
kvservertest-kv-distribution.log:I241002 08:06:45.508355 2592095 13@kv/kvserver/replica_raft.go:399 ⋮ [T1,Vsystem,n1,s1,r70/1:‹/{Table/Max-Max}›] 74 proposing SIMPLE(l3) [(n3,s3):3LEARNER]: after=[(n1,s1):1 (n2,s2):2LEARNER (n3,s3):3LEARNER] next=4
kvservertest-kv-distribution.log:I241002 08:06:45.513723 2929771 13@kv/kvserver/store_snapshot.go:2096 ⋮ [T1,Vsystem,n1,s1,r70/1:‹/{Table/Max-Max}›] 75 streamed snapshot 206cea39 at applied index 15 to (n2,s2):2LEARNER with 754 B in 0.00s @ 1.6 MiB/s: kvs=10 rangeKVs=0 sharedSSTs=0, externalSSTs=0, rate-limit: 32 MiB/s, queued: 0.00s
kvservertest-kv-distribution.log:I241002 08:06:45.514561 2929948 13@kv/kvserver/replica_raftstorage.go:508 ⋮ [T1,Vsystem,n2,s2,r70/2:{-}] 76 applying snapshot 206cea39 from (n1,s1):1 at applied index 15
kvservertest-kv-distribution.log:I241002 08:06:45.521135 2930901 13@kv/kvserver/store_snapshot.go:2096 ⋮ [T1,Vsystem,n1,s1,r70/1:‹/{Table/Max-Max}›] 77 streamed snapshot 84c6ccda at applied index 15 to (n3,s3):3LEARNER with 754 B in 0.00s @ 1.7 MiB/s: kvs=10 rangeKVs=0 sharedSSTs=0, externalSSTs=0, rate-limit: 32 MiB/s, queued: 0.00s
kvservertest-kv-distribution.log:I241002 08:06:45.522103 2930958 13@kv/kvserver/replica_raftstorage.go:508 ⋮ [T1,Vsystem,n3,s3,r70/3:{-}] 78 applying snapshot 84c6ccda from (n1,s1):1 at applied index 15
kvservertest-kv-distribution.log:I241002 08:06:45.527851 2592095 13@kv/kvserver/replica_command.go:2494 ⋮ [T1,Vsystem,n1,s1,r70/1:‹/{Table/Max-Max}›,*kvpb.AdminChangeReplicasRequest] 79 change replicas (add [(n2,s2):2VOTER_INCOMING (n3,s3):3VOTER_INCOMING] remove []): existing descriptor r70:‹/{Table/Max-Max}› [(n1,s1):1, (n2,s2):2LEARNER, (n3,s3):3LEARNER, next=4, gen=3, sticky=9223372036.854775807,2147483647]
kvservertest-kv-distribution.log:I241002 08:06:45.537805 2592095 13@kv/kvserver/replica_raft.go:399 ⋮ [T1,Vsystem,n1,s1,r70/1:‹/{Table/Max-Max}›] 80 proposing ENTER_JOINT(v2 v3) [(n2,s2):2VOTER_INCOMING (n3,s3):3VOTER_INCOMING]: after=[(n1,s1):1 (n2,s2):2VOTER_INCOMING (n3,s3):3VOTER_INCOMING] next=4
kvservertest-kv-distribution.log:I241002 08:06:45.543452 2592095 13@kv/kvserver/replica_command.go:2494 ⋮ [T1,Vsystem,n1,s1,r70/1:‹/{Table/Max-Max}›,*kvpb.AdminChangeReplicasRequest] 81 change replicas (add [] remove []): existing descriptor r70:‹/{Table/Max-Max}› [(n1,s1):1, (n2,s2):2VOTER_INCOMING, (n3,s3):3VOTER_INCOMING, next=4, gen=4, sticky=9223372036.854775807,2147483647]
kvservertest-kv-distribution.log:I241002 08:06:45.551069 2592095 13@kv/kvserver/replica_raft.go:399 ⋮ [T1,Vsystem,n1,s1,r70/1:‹/{Table/Max-Max}›] 82 proposing LEAVE_JOINT: after=[(n1,s1):1 (n2,s2):2 (n3,s3):3] next=4
kvservertest-kv-distribution.log:I241002 08:06:45.716996 2592095 13@kv/kvserver/replica_command.go:482 ⋮ [T1,Vsystem,n1,s1,r70/1:‹/{Table/Max-Max}›,*kvpb.AdminSplitRequest] 83 initiating a split of this range at key ‹"\xfa\x00"› [r71] (manual)
kvservertest-kv-distribution.log:I241002 08:06:49.440810 2992298 13@kv/kvserver/replica_command.go:482 ⋮ [T1,Vsystem,n1,s1,r69/1:‹/{Table/67-Max}›,*kvpb.AdminSplitRequest] 153 initiating a split of this range at key /Table/Max [r70] (manual)
kvservertest-kv-distribution.log:I241002 08:06:49.467696 2992298 13@kv/kvserver/replica_command.go:2494 ⋮ [T1,Vsystem,n1,s1,r70/1:‹/{Table/Max-Max}›,*kvpb.AdminChangeReplicasRequest] 155 change replicas (add [(n2,s2):2LEARNER] remove []): existing descriptor r70:‹/{Table/Max-Max}› [(n1,s1):1, next=2, gen=1, sticky=9223372036.854775807,2147483647]
kvservertest-kv-distribution.log:I241002 08:06:49.474477 2992298 13@kv/kvserver/replica_raft.go:399 ⋮ [T1,Vsystem,n1,s1,r70/1:‹/{Table/Max-Max}›] 156 proposing SIMPLE(l2) [(n2,s2):2LEARNER]: after=[(n1,s1):1 (n2,s2):2LEARNER] next=3
kvservertest-kv-distribution.log:I241002 08:06:49.479309 2992298 13@kv/kvserver/replica_command.go:2494 ⋮ [T1,Vsystem,n1,s1,r70/1:‹/{Table/Max-Max}›,*kvpb.AdminChangeReplicasRequest] 157 change replicas (add [(n3,s3):3LEARNER] remove []): existing descriptor r70:‹/{Table/Max-Max}› [(n1,s1):1, (n2,s2):2LEARNER, next=3, gen=2, sticky=9223372036.854775807,2147483647]
kvservertest-kv-distribution.log:I241002 08:06:49.487271 2992298 13@kv/kvserver/replica_raft.go:399 ⋮ [T1,Vsystem,n1,s1,r70/1:‹/{Table/Max-Max}›] 158 proposing SIMPLE(l3) [(n3,s3):3LEARNER]: after=[(n1,s1):1 (n2,s2):2LEARNER (n3,s3):3LEARNER] next=4
kvservertest-kv-distribution.log:I241002 08:06:49.492701 3328841 13@kv/kvserver/store_snapshot.go:2096 ⋮ [T1,Vsystem,n1,s1,r70/1:‹/{Table/Max-Max}›] 159 streamed snapshot 66af1d5f at applied index 15 to (n2,s2):2LEARNER with 754 B in 0.00s @ 1.3 MiB/s: kvs=10 rangeKVs=0 sharedSSTs=0, externalSSTs=0, rate-limit: 32 MiB/s, queued: 0.00s
kvservertest-kv-distribution.log:I241002 08:06:49.493414 3328972 13@kv/kvserver/replica_raftstorage.go:508 ⋮ [T1,Vsystem,n2,s2,r70/2:{-}] 160 applying snapshot 66af1d5f from (n1,s1):1 at applied index 15
kvservertest-kv-distribution.log:I241002 08:06:49.500123 3329921 13@kv/kvserver/store_snapshot.go:2096 ⋮ [T1,Vsystem,n1,s1,r70/1:‹/{Table/Max-Max}›] 161 streamed snapshot b5767391 at applied index 15 to (n3,s3):3LEARNER with 754 B in 0.00s @ 1.7 MiB/s: kvs=10 rangeKVs=0 sharedSSTs=0, externalSSTs=0, rate-limit: 32 MiB/s, queued: 0.00s
kvservertest-kv-distribution.log:I241002 08:06:49.500917 3330028 13@kv/kvserver/replica_raftstorage.go:508 ⋮ [T1,Vsystem,n3,s3,r70/3:{-}] 162 applying snapshot b5767391 from (n1,s1):1 at applied index 15
kvservertest-kv-distribution.log:I241002 08:06:49.506762 2992298 13@kv/kvserver/replica_command.go:2494 ⋮ [T1,Vsystem,n1,s1,r70/1:‹/{Table/Max-Max}›,*kvpb.AdminChangeReplicasRequest] 163 change replicas (add [(n2,s2):2VOTER_INCOMING (n3,s3):3VOTER_INCOMING] remove []): existing descriptor r70:‹/{Table/Max-Max}› [(n1,s1):1, (n2,s2):2LEARNER, (n3,s3):3LEARNER, next=4, gen=3, sticky=9223372036.854775807,2147483647]
kvservertest-kv-distribution.log:I241002 08:06:49.516104 2992298 13@kv/kvserver/replica_raft.go:399 ⋮ [T1,Vsystem,n1,s1,r70/1:‹/{Table/Max-Max}›] 164 proposing ENTER_JOINT(v2 v3) [(n2,s2):2VOTER_INCOMING (n3,s3):3VOTER_INCOMING]: after=[(n1,s1):1 (n2,s2):2VOTER_INCOMING (n3,s3):3VOTER_INCOMING] next=4
kvservertest-kv-distribution.log:I241002 08:06:49.522487 2992298 13@kv/kvserver/replica_command.go:2494 ⋮ [T1,Vsystem,n1,s1,r70/1:‹/{Table/Max-Max}›,*kvpb.AdminChangeReplicasRequest] 165 change replicas (add [] remove []): existing descriptor r70:‹/{Table/Max-Max}› [(n1,s1):1, (n2,s2):2VOTER_INCOMING, (n3,s3):3VOTER_INCOMING, next=4, gen=4, sticky=9223372036.854775807,2147483647]
kvservertest-kv-distribution.log:I241002 08:06:49.530438 2992298 13@kv/kvserver/replica_raft.go:399 ⋮ [T1,Vsystem,n1,s1,r70/1:‹/{Table/Max-Max}›] 166 proposing LEAVE_JOINT: after=[(n1,s1):1 (n2,s2):2 (n3,s3):3] next=4
kvservertest-kv-distribution.log:I241002 08:06:49.679120 2992298 13@kv/kvserver/replica_command.go:482 ⋮ [T1,Vsystem,n1,s1,r70/1:‹/{Table/Max-Max}›,*kvpb.AdminSplitRequest] 167 initiating a split of this range at key ‹"\xfa\x00"› [r71] (manual)
```

So, two things happened that seem impossible given the test's directives (sketched below):

  1. r70 was split off twice
  2. r71 and r70 were merged back together without a corresponding AdminMerge from the test (the test was still stuck waiting for connected streams on r71 before its single merge is ever issued)
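
For reference, the directives in question boil down to roughly the following (a paraphrase of the test, not its exact code; `ScratchRange`, `LookupRange`, `SplitRangeOrFatal`, and `AdminMerge` are the standard harness/client calls, but the wiring here is approximate):

```go
// Rough paraphrase of the test's split/merge directives (not exact code).
k := tc.ScratchRange(t) // scratch range, r70 in this run
desc, err := tc.LookupRange(k)
require.NoError(t, err)
h.waitForConnectedStreams(ctx, desc.RangeID, 3, 0 /* serverIdx */)

// The single split: the RHS (r71 here) is carved off exactly once.
_, rhsDesc := tc.SplitRangeOrFatal(t, k.Next())
h.waitForConnectedStreams(ctx, rhsDesc.RangeID, 3, 0 /* serverIdx */) // <- where we hang

// The single merge, issued only after the wait above succeeds.
require.NoError(t, tc.Server(0).DB().AdminMerge(ctx, k))
```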

The range also becomes unavailable at the end of the test:

Details

```
kvservertest.log:W241002 08:09:54.326849 7852368 kv/kvclient/kvcoord/dist_sender.go:2746 ⋮ [T1,Vsystem,n1,consistencyChecker,s1,r71/1:‹{\xfa\x00-/Max}›] 1170 slow replica RPC: have been waiting 60.00s (0 attempts) for RPC Put [/Local/Range‹"\xfa\x00"›/‹QueueLastProcessed›/‹"consistencyChecker"›] to replica (n1,s1):1; resp: ‹(err: result is ambiguous: after 60.00s of attempting command: context deadline exceeded)›
kvservertest.log:W241002 08:09:54.327082 7852368 kv/kvclient/kvcoord/dist_sender.go:2190 ⋮ [T1,Vsystem,n1,consistencyChecker,s1,r71/1:‹{\xfa\x00-/Max}›] 1171 slow range RPC: have been waiting 60.00s (1 attempts) for RPC Put [/Local/Range‹"\xfa\x00"›/‹QueueLastProcessed›/‹"consistencyChecker"›] to r71:‹{\xfa\x00-/Max}› [(n1,s1):1, (n2,s2):2, (n3,s3):3, next=4, gen=6, sticky=9223372036.854775807,2147483647]; resp: ‹(err: result is ambiguous: after 60.00s of attempting command: context deadline exceeded)›
kvservertest.log:W241002 08:09:54.327156 7852368 kv/kvclient/kvcoord/dist_sender.go:2200 ⋮ [T1,Vsystem,n1,consistencyChecker,s1,r71/1:‹{\xfa\x00-/Max}›] 1172 slow RPC response: slow RPC finished after 60.00s (1 attempts)
kvservertest.log:E241002 08:09:54.327955 7852368 kv/kvserver/consistency_queue.go:207 ⋮ [T1,Vsystem,n1,consistencyChecker,s1,r71/1:‹{\xfa\x00-/Max}›] 1173 aborted in DistSender: result is ambiguous: context deadline exceeded
kvservertest.log:E241002 08:09:54.328088 7852368 kv/kvserver/queue.go:1198 ⋮ [T1,Vsystem,n1,consistencyChecker,s1,r71/1:‹{\xfa\x00-/Max}›] 1174 operation "‹consistencyChecker› queue process replica 71" timed out after 1m0.003s (given timeout 1m0s): aborted in DistSender: result is ambiguous: context deadline exceeded
kvservertest.log:W241002 08:09:55.652188 3017632 kv/kvserver/replica_raft.go:1596 ⋮ [T1,Vsystem,n1,s1,r71/1:‹{\xfa\x00-/Max}›,raft] 1176 have been waiting 61.50s for slow proposal Put [/Local/Range‹"\xfa\x00"›/‹QueueLastProcessed›/‹"consistencyChecker"›]
kvservertest.log:E241002 08:09:55.653190 10429706 kv/kvserver/replica_circuit_breaker.go:163 ⋮ [T1,Vsystem,n1,s1,r71/1:‹{\xfa\x00-/Max}›] 1177 breaker: tripped with error: replica unavailable: (n1,s1):1 unable to serve request to r71:‹{\xfa\x00-/Max}› [(n1,s1):1, (n2,s2):2, (n3,s3):3, next=4, gen=6, sticky=9223372036.854775807,2147483647]: closed timestamp: 1727856531.223208790,0 (2024-10-02 08:08:51); raft status: {"id":"1","term":5,"vote":"0","commit":10,"lead":"0","raftState":"StatePreCandidate","applied":10,"progress":{},"leadtransferee":"0"}: have been waiting 61.50s for slow proposal Put [/Local/Range‹"\xfa\x00"›/‹QueueLastProcessed›/‹"consistencyChecker"›]
kvservertest.log:E241002 08:09:56.413620 10448576 kv/kvserver/consistency_queue.go:207 ⋮ [T1,Vsystem,n1,consistencyChecker,s1,r71/1:‹{\xfa\x00-/Max}›] 1179 replica unavailable: (n1,s1):1 unable to serve request to r71:‹{\xfa\x00-/Max}› [(n1,s1):1, (n2,s2):2, (n3,s3):3, next=4, gen=6, sticky=9223372036.854775807,2147483647]: closed timestamp: 1727856531.223208790,0 (2024-10-02 08:08:51); raft status: {"id":"1","term":5,"vote":"0","commit":10,"lead":"0","raftState":"StatePreCandidate","applied":10,"progress":{},"leadtransferee":"0"}: have been waiting 61.50s for slow proposal Put [/Local/Range‹"\xfa\x00"›/‹QueueLastProcessed›/‹"consistencyChecker"›]
kvservertest.log:E241002 08:09:56.413784 10448576 kv/kvserver/queue.go:1198 ⋮ [T1,Vsystem,n1,consistencyChecker,s1,r71/1:‹{\xfa\x00-/Max}›] 1180 replica unavailable: (n1,s1):1 unable to serve request to r71:‹{\xfa\x00-/Max}› [(n1,s1):1, (n2,s2):2, (n3,s3):3, next=4, gen=6, sticky=9223372036.854775807,2147483647]: closed timestamp: 1727856531.223208790,0 (2024-10-02 08:08:51); raft status: {"id":"1","term":5,"vote":"0","commit":10,"lead":"0","raftState":"StatePreCandidate","applied":10,"progress":{},"leadtransferee":"0"}: have been waiting 61.50s for slow proposal Put [/Local/Range‹"\xfa\x00"›/‹QueueLastProcessed›/‹"consistencyChecker"›]
kvservertest.log:E241002 08:10:34.955738 10429778 kv/kvserver/replica_circuit_breaker.go:163 ⋮ [T1,Vsystem,n1,s1,r71/1:‹{\xfa\x00-/Max}›] 1251 breaker: now tripped with error: replica unavailable: (n1,s1):1 unable to serve request to r71:‹{\xfa\x00-/Max}› [(n1,s1):1, (n2,s2):2, (n3,s3):3, next=4, gen=6, sticky=9223372036.854775807,2147483647]: closed timestamp: 1727856531.223208790,0 (2024-10-02 08:08:51); raft status: {"id":"1","term":5,"vote":"0","commit":10,"lead":"0","raftState":"StatePreCandidate","applied":10,"progress":{},"leadtransferee":"0"}: result is ambiguous: server shutdown (previously: replica unavailable: (n1,s1):1 unable to serve request to r71:‹{\xfa\x00-/Max}› [(n1,s1):1, (n2,s2):2, (n3,s3):3, next=4, gen=6, sticky=9223372036.854775807,2147483647]: closed timestamp: 1727856531.223208790,0 (2024-10-02 08:08:51); raft status: {"id":"1","term":5,"vote":"0","commit":10,"lead":"0","raftState":"StatePreCandidate","applied":10,"progress":{},"leadtransferee":"0"}: have been waiting 61.50s for slow proposal Put [/Local/Range‹"\xfa\x00"›/‹QueueLastProcessed›/‹"consistencyChecker"›])
```

This is a strong indication that things went wrong following the unexpected double split and merge.
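
As an aside, the raft status blob embedded in those breaker errors is plain JSON and easy to eyeball programmatically. A throwaway sketch (field names copied from the log line above; nothing here is CockroachDB API):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// raftStatus mirrors just the fields of the status blob we care about.
type raftStatus struct {
	ID        string                     `json:"id"`
	Term      uint64                     `json:"term"`
	Lead      string                     `json:"lead"`
	RaftState string                     `json:"raftState"`
	Progress  map[string]json.RawMessage `json:"progress"`
}

func main() {
	// Copied from the breaker error for r71 above.
	blob := `{"id":"1","term":5,"vote":"0","commit":10,"lead":"0",
		"raftState":"StatePreCandidate","applied":10,"progress":{},
		"leadtransferee":"0"}`
	var s raftStatus
	if err := json.Unmarshal([]byte(blob), &s); err != nil {
		panic(err)
	}
	// progress is only populated on the raft leader, so an empty map plus
	// lead="0" means r71/1 neither is the leader nor knows of one; it is
	// stuck (pre-)campaigning, which matches the MsgPreVote traffic in the
	// raft message logs further down.
	fmt.Printf("state=%s lead=%s tracked-followers=%d\n",
		s.RaftState, s.Lead, len(s.Progress))
}
```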

kvoli commented 19 hours ago

No luck reproducing over 1k stress runs w/ deadlock detection on b38c8dd1faa68970c3310c1e82da71d593a67052. I'm going to bump up the run count; how this happened doesn't really add up.

git checkout b38c8dd1faa68970c3310c1e82da71d593a67052
dev test pkg/kv/kvserver -v --vmodule='flow_control_replica_integration=2,admission=1,kvflowcontroller=2,kvflowdispatch=1,replica_flow_control=1,work_queue=1,tracker=1,kvflowhandle=1,replica_raft=1,replica_proposal_buf=1,raft_transport=2,kvadmission=1,work_queue=1,replica_flow_control=1,client_raft_helpers_test=1,range_controller=2,token_counter=2,token_tracker=2,processor=2' -f TestFlowControlAdmissionPostSplitMergeV2/v2_enabled_when_leader_level=2 --stress --deadlock

...

INFO: Found 1 test target...
Target //pkg/kv/kvserver:kvserver_test up-to-date:
  _bazel/bin/pkg/kv/kvserver/kvserver_test_/kvserver_test
INFO: Elapsed time: 1161.501s, Critical Path: 154.76s
INFO: 3330 processes: 6 internal, 3324 darwin-sandbox.
INFO: Build completed successfully, 3330 total actions
//pkg/kv/kvserver:kvserver_test                                          PASSED in 8.3s
  Stats over 1000 runs: max = 8.3s, min = 4.0s, avg = 4.8s, dev = 0.4s

Executed 1 out of 1 test: 1 test passes.
kvoli commented 19 hours ago

Got a repro. In this case the second split doesn't occur, and neither does the merge, so I think those weren't the underlying cause[^1]. The RHS r71 still ends up unavailable.

E241002 17:34:51.588932 7974708 kv/kvserver/replica_circuit_breaker.go:163 ⋮ [T1,Vsystem,n1,s1,r71/1:‹{\xfa\x00-/Max}›] 1207  breaker: now tripped with error: replica unavailable: (n1,s1):1 unable to serve request to r71:‹{\xfa\x00-/Max}› [(n1,s1):1, (n2,s2):2, (n3,s3):3, next=4, gen=6, sticky=9223372036.854775807,2147483647]: closed timestamp: 1727890337.901967000,0 (2024-10-02 17:32:17); raft status: {"id":"1","term":5,"vote":"0","commit":10,"lead":"0","raftState":"StatePreCandidate","applied":10,"progress":{},"leadtransferee":"0"}: result is ambiguous: server shutdown (previously: replica unavailable: (n1,s1):1 unable to serve request to r71:‹{\xfa\x00-/Max}› [(n1,s1):1, (n2,s2):2, (n3,s3):3, next=4, gen=6, sticky=9223372036.854775807,2147483647]: closed timestamp: 1727890337.901967000,0 (2024-10-02 17:32:17); raft status: {"id":"1","term":5,"vote":"0","commit":10,"lead":"0","raftState":"StatePreCandidate","applied":10,"progress":{},"leadtransferee":"0"}: have been waiting 63.00s for slow proposal Put [/Local/Range‹"\xfa\x00"›/‹QueueLastProcessed›/‹"consistencyChecker"›])

There are no corresponding log lines for r71 aside from the split and then later it being unavailable.

[^1]: My guess is that the split doesn't get applied fully, something inside the binary merges them back together (no logs here aside from the MergeRequest, which is quite suspect), and then the test helper SplitOrFatal re-issues the split, or maybe Raft does? Either way, this was a symptom, not the cause.

kvoli commented 19 hours ago

Another hit, this time w/ every raft message logged for r70,r71:

One of the replicas never responds to a MsgApp, so only 2 of the expected 3 streams connect:

17:56:00.004659 424466 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n2] 437   [raft] r71 Raft message ‹1->2 MsgPreVote Term:6 Log:5/10›
17:56:00.004807 424485 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n3] 439   [raft] r71 Raft message ‹1->3 MsgPreVote Term:6 Log:5/10›
17:56:00.010009 444050 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n1] 441   [raft] r71 Raft message ‹2->1 MsgPreVoteResp Term:6 Log:0/0›
17:56:00.010441 427451 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n1] 442   [raft] r71 Raft message ‹3->1 MsgPreVoteResp Term:6 Log:0/0›
17:56:00.010562 424485 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n3] 443   [raft] r71 Raft message ‹1->3 MsgVote Term:6 Log:5/10›
17:56:00.010740 424466 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n2] 444   [raft] r71 Raft message ‹1->2 MsgVote Term:6 Log:5/10›
17:56:00.011002 427451 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n1] 445   [raft] r71 Raft message ‹3->1 MsgVoteResp Term:6 Log:0/0›
17:56:00.011586 424466 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n2] 449   [raft] r71 Raft message ‹1->2 MsgApp Term:6 Log:5/10 Commit:10 Entries:[6/11 EntryNormal <omitted>]› # this never gets ack'd by n2, it goes into StateProbe
17:56:00.011771 424485 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n3] 450   [raft] r71 Raft message ‹1->3 MsgApp Term:6 Log:5/10 Commit:10 Entries:[6/11 EntryNormal <omitted>]›
17:56:00.011805 444050 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n1] 451   [raft] r71 Raft message ‹2->1 MsgVoteResp Term:6 Log:0/0›
17:56:00.014837 427451 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n1] 452   [raft] r71 Raft message ‹3->1 MsgAppResp Term:6 Log:0/11 Commit:10›
# then no more replies from 2, note it has an outstanding MsgApp
Full

```
I241002 17:55:59.891274 19111 13@kv/kvserver/replica_raft.go:399 ⋮ [T1,Vsystem,n1,s1,r70/1:‹/{Table/Max-Max}›] 82 proposing LEAVE_JOINT: after=[(n1,s1):1 (n2,s2):2 (n3,s3):3] next=4
I241002 17:55:59.990289 19111 13@kv/kvserver/replica_command.go:482 ⋮ [T1,Vsystem,n1,s1,r70/1:‹/{Table/Max-Max}›,*kvpb.AdminSplitRequest] 83 initiating a split of this range at key ‹"\xfa\x00"› [r71] (manual)
I241002 17:56:00.000318 19111 13@kv/kvserver/batcheval/cmd_end_transaction.go:1224 ⋮ [T1,Vsystem,n1,s1,r70/1:‹/{Table/Max-Max}›] 84 falling back to accurate stats computation because there are no pre-split LHS stats (or they're empty)
# post-split
17:56:00.004659 424466 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n2] 437 [raft] r71 Raft message ‹1->2 MsgPreVote Term:6 Log:5/10›
17:56:00.004807 424485 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n3] 439 [raft] r71 Raft message ‹1->3 MsgPreVote Term:6 Log:5/10›
17:56:00.010009 444050 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n1] 441 [raft] r71 Raft message ‹2->1 MsgPreVoteResp Term:6 Log:0/0›
17:56:00.010441 427451 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n1] 442 [raft] r71 Raft message ‹3->1 MsgPreVoteResp Term:6 Log:0/0›
17:56:00.010562 424485 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n3] 443 [raft] r71 Raft message ‹1->3 MsgVote Term:6 Log:5/10›
17:56:00.010740 424466 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n2] 444 [raft] r71 Raft message ‹1->2 MsgVote Term:6 Log:5/10›
17:56:00.011002 427451 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n1] 445 [raft] r71 Raft message ‹3->1 MsgVoteResp Term:6 Log:0/0›
17:56:00.011100 48643 kv/kvserver/kvflowcontrol/rac2/range_controller.go:434 ⋮ [T1,Vsystem,n1,s1,r71/1:‹{\xfa\x00-/Max}›,raft] 446 r71 creating range controller
17:56:00.011145 48643 kv/kvserver/kvflowcontrol/rac2/range_controller.go:1507 ⋮ [T1,Vsystem,n1,s1,r71/1:‹{\xfa\x00-/Max}›,raft] 447 creating send stream t‹1›/s‹1› for replica (n1,s1):1
17:56:00.011368 48643 kv/kvserver/kvflowcontrol/rac2/range_controller.go:1471 ⋮ [T1,Vsystem,n1,s1,r71/1:‹{\xfa\x00-/Max}›,raft] 448 r71:(n1,s1):1 stream t‹1›/s‹1› admit term:6, admitted:[LowPri:11,NormalPri:11,AboveNormalPri:11,HighPri:11]
17:56:00.011586 424466 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n2] 449 [raft] r71 Raft message ‹1->2 MsgApp Term:6 Log:5/10 Commit:10 Entries:[6/11 EntryNormal <omitted>]›
17:56:00.011771 424485 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n3] 450 [raft] r71 Raft message ‹1->3 MsgApp Term:6 Log:5/10 Commit:10 Entries:[6/11 EntryNormal <omitted>]›
17:56:00.011805 444050 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n1] 451 [raft] r71 Raft message ‹2->1 MsgVoteResp Term:6 Log:0/0›
17:56:00.014837 427451 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n1] 452 [raft] r71 Raft message ‹3->1 MsgAppResp Term:6 Log:0/11 Commit:10›
17:56:00.014989 48601 kv/kvserver/kvflowcontrol/rac2/range_controller.go:1507 ⋮ [T1,Vsystem,n1,s1,r71/1:‹{\xfa\x00-/Max}›,raft] 453 creating send stream t‹1›/s‹3› for replica (n3,s3):3
17:56:00.015339 424485 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n3] 454 [raft] r71 Raft message ‹1->3 MsgApp Term:6 Log:6/11 Commit:11›
17:56:00.016242 427451 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n1] 455 [raft] r71 Raft message ‹3->1 MsgAppResp Term:6 Log:0/11 Commit:11›
17:56:00.016319 48613 kv/kvserver/kvflowcontrol/rac2/range_controller.go:1471 ⋮ [-] 456 r71:(n3,s3):3 stream t‹1›/s‹3› admit term:6, admitted:[LowPri:11,NormalPri:11,AboveNormalPri:11,HighPri:11]
17:57:19.751787 48583 kv/kvserver/kvflowcontrol/rac2/range_controller.go:1471 ⋮ [T1,Vsystem,n1,s1,r71/1:‹{\xfa\x00-/Max}›,raft] 891 r71:(n1,s1):1 stream t‹1›/s‹1› admit term:6, admitted:[LowPri:12,NormalPri:12,AboveNormalPri:12,HighPri:12]
17:57:19.751913 424485 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n3] 892 [raft] r71 Raft message ‹1->3 MsgApp Term:6 Log:6/11 Commit:11 Entries:[6/12 EntryNormal <omitted>]›
17:57:19.752256 427451 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n1] 893 [raft] r71 Raft message ‹3->1 MsgAppResp Term:6 Log:0/12 Commit:11›
17:57:19.752339 48613 kv/kvserver/kvflowcontrol/rac2/range_controller.go:1471 ⋮ [-] 894 r71:(n3,s3):3 stream t‹1›/s‹3› admit term:6, admitted:[LowPri:12,NormalPri:12,AboveNormalPri:12,HighPri:12]
17:57:19.752602 424485 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n3] 895 [raft] r71 Raft message ‹1->3 MsgApp Term:6 Log:6/12 Commit:12›
17:57:19.753266 427451 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n1] 896 [raft] r71 Raft message ‹3->1 MsgAppResp Term:6 Log:0/12 Commit:12›
17:57:19.753495 48613 kv/kvserver/kvflowcontrol/rac2/range_controller.go:1471 ⋮ [-] 897 r71:(n3,s3):3 stream t‹1›/s‹3› admit term:6, admitted:[LowPri:12,NormalPri:12,AboveNormalPri:12,HighPri:12]
17:57:19.754347 424485 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n3] 899 [raft] r71 Raft message ‹1->3 MsgApp Term:6 Log:6/12 Commit:12 Entries:[6/13 EntryNormal <omitted>]›
17:57:19.754316 48631 kv/kvserver/kvflowcontrol/rac2/range_controller.go:1471 ⋮ [T1,Vsystem,n1,s1,r71/1:‹{\xfa\x00-/Max}›,raft] 898 r71:(n1,s1):1 stream t‹1›/s‹1› admit term:6, admitted:[LowPri:13,NormalPri:13,AboveNormalPri:13,HighPri:13]
17:57:19.754817 427451 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n1] 900 [raft] r71 Raft message ‹3->1 MsgAppResp Term:6 Log:0/13 Commit:12›
17:57:19.754893 48565 kv/kvserver/kvflowcontrol/rac2/range_controller.go:1471 ⋮ [-] 901 r71:(n3,s3):3 stream t‹1›/s‹3› admit term:6, admitted:[LowPri:13,NormalPri:13,AboveNormalPri:13,HighPri:13]
17:57:19.755152 424485 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n3] 902 [raft] r71 Raft message ‹1->3 MsgApp Term:6 Log:6/13 Commit:13›
17:57:19.755758 427451 kv/kvserver_test/client_raft_helpers_test.go:102 ⋮ [T1,Vsystem,n1] 903 [raft] r71 Raft message ‹3->1 MsgAppResp Term:6 Log:0/13 Commit:13›
17:57:19.755887 48637 kv/kvserver/kvflowcontrol/rac2/range_controller.go:1471 ⋮ [-] 904 r71:(n3,s3):3 stream t‹1›/s‹3› admit term:6, admitted:[LowPri:13,NormalPri:13,AboveNormalPri:13,HighPri:13]
```
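
My read of the rac2 logs above (hedged, inferred from the log lines rather than the code): a send stream is only created for a replica once the leader sees it in StateReplicate, which is why streams appear for s1 and s3 but never for s2 after its unacked MsgApp leaves it probing. A toy illustration of that counting, not the actual range controller logic:

```go
package main

import "fmt"

type replState int

const (
	stateProbe replState = iota
	stateReplicate
)

func main() {
	// Per-replica progress as implied by the message trace: the leader s1
	// and s3 are replicating; s2 never acked its MsgApp and is left probing.
	progress := map[string]replState{
		"s1": stateReplicate,
		"s2": stateProbe,
		"s3": stateReplicate,
	}
	connected := 0
	for id, st := range progress {
		if st == stateReplicate {
			fmt.Println("creating send stream for", id) // cf. range_controller.go logs
			connected++
		}
	}
	fmt.Printf("%d of %d streams connected\n", connected, len(progress)) // 2 of 3
}
```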

I think if I time out the test explicitly, it should dump some stacks, which will be useful; there may be a deadlock.

kvoli commented 18 hours ago

The stacks didn't turn up anything that interesting; here they are:

dump.log

I'm going to take a step back from this failure and see if others can glean something useful.

kvoli commented 17 hours ago

P-0 because this is a new bug and we should resolve it prior to the branch cut for 24.3.

kvoli commented 17 hours ago

@pav-kv has offered to give this a pass tomorrow, many thanks 🤠.

Here's the diff I used for logging:

diff

```diff
diff --git a/pkg/kv/kvserver/flow_control_integration_test.go b/pkg/kv/kvserver/flow_control_integration_test.go
index b281b67a04b..0d110e23b80 100644
--- a/pkg/kv/kvserver/flow_control_integration_test.go
+++ b/pkg/kv/kvserver/flow_control_integration_test.go
@@ -10,6 +10,7 @@ import (
 	"fmt"
 	"math"
 	"math/rand"
+	"runtime"
 	"strconv"
 	"strings"
 	"sync/atomic"
@@ -37,6 +38,7 @@ import (
 	"github.com/cockroachdb/cockroach/pkg/util/leaktest"
 	"github.com/cockroachdb/cockroach/pkg/util/log"
 	"github.com/cockroachdb/cockroach/pkg/util/randutil"
+	"github.com/cockroachdb/cockroach/pkg/util/timeutil"
 	"github.com/cockroachdb/errors"
 	"github.com/dustin/go-humanize"
 	"github.com/olekukonko/tablewriter"
@@ -2507,6 +2509,11 @@ func TestFlowControlAdmissionPostSplitMergeV2(t *testing.T) {
 	desc, err := tc.LookupRange(k)
 	require.NoError(t, err)
 
+	// Enable verbose logging for the two scratch ranges. We're assuming there
+	// will be no other splits, so post-split the RHS scratch range will have
+	// the next range id.
+	h.enableVerboseRaftMsgLoggingForRange(desc.RangeID)
+	h.enableVerboseRaftMsgLoggingForRange(desc.RangeID + 1)
 	h.waitForConnectedStreams(ctx, desc.RangeID, 3, 0 /* serverIdx */)
@@ -4617,18 +4624,37 @@ func (h *flowControlTestHelper) waitForConnectedStreams(
 	lvl ...kvflowcontrol.V2EnabledWhenLeaderLevel,
 ) {
 	level := h.resolveLevelArgs(lvl...)
+	start := timeutil.Now()
+	shouldPanic := false
 	testutils.SucceedsSoon(h.t, func() error {
 		state, found := h.getInspectHandlesForLevel(serverIdx, level).LookupInspect(rangeID)
 		if !found {
+			if timeutil.Now().Sub(start) > 20*time.Second {
+				log.Infof(ctx, "been longer than 20s will panic state=+v", state)
+				shouldPanic = true
+				return nil
+			}
 			return fmt.Errorf("handle for %s not found", rangeID)
 		}
 		require.True(h.t, found)
 		if len(state.ConnectedStreams) != expConnectedStreams {
+			if timeutil.Now().Sub(start) > 20*time.Second {
+				log.Infof(ctx, "been longer than 20s will panic state=+v", state)
+				shouldPanic = true
+				return nil
+			}
 			return fmt.Errorf("expected %d connected streams, got %d",
 				expConnectedStreams, len(state.ConnectedStreams))
 		}
 		return nil
 	})
+	if shouldPanic {
+		log.Infof(ctx, "been longer than 20s panicing")
+		buf := make([]byte, 1<<20)
+		stacklen := runtime.Stack(buf, true)
+		log.Infof(ctx, "=== received SIGQUIT ===\n*** goroutine dump...\n%s\n*** end\n", buf[:stacklen])
+		require.FailNow(h.t, "timed out waiting for connected streams")
+	}
 }
 
 func (h *flowControlTestHelper) waitForTotalTrackedTokens(
```

It reproduces on both the failing commit from the OP and a more recent commit on master, 794f4f11aba06e85a487b3e153a0aa4d3ee4c70f.

It also fails for both encoding variants, e.g.:

TestFlowControlAdmissionPostSplitMergeV2/v2_enabled_when_leader_level=1
TestFlowControlAdmissionPostSplitMergeV2/v2_enabled_when_leader_level=2

I was using this invocation to repro:

./dev test pkg/kv/kvserver -v --vmodule='raft=1,admission=1,replica_flow_control=1,work_queue=1,replica_raft=1,replica_proposal_buf=1,raft_transport=2,kvadmission=1,work_queue=1,replica_flow_control=1,client_raft_helpers_test=1,range_controller=2,token_counter=2,token_tracker=2,processor=2' -f TestFlowControlAdmissionPostSplitMergeV2/v2_enabled_when_leader_level=2 --stress --deadlock --count=10000