Closed cockroach-teamcity closed 1 month ago
Looks overloaded, based on this Slack response (internal):
Error: executing ALTER TABLE kv SPLIT AT VALUES (-6290595461644587904): pq: internal error while retrieving user account memberships: operation "get-user-session" timed out after 11.653s (given timeout 10s): txn exec: context deadline exceeded
And:
I240726 06:39:57.913222 264 kv/kvserver/store_raft.go:696 ⋮ [T1,Vsystem,n1,s1,r112645/1:‹/Table/106/1/-6137{979…-856…}›,raft] 9029 raft ready handling: 0.75s [append=0.00s, apply=0.75s, , other=0.00s], wrote [apply=1.4 KiB (1)], state_assertions=1; node might be overloaded
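For triage, a minimal sketch of pulling the per-phase breakdown out of a `raft ready handling` line like the one above. The regex and the helper name `parse_raft_ready` are assumptions made for illustration, based only on the format of this one log line; they are not part of any CockroachDB tooling.

```python
import re

# Hypothetical pattern, derived only from the sample log line in this issue.
_READY_RE = re.compile(
    r"raft ready handling: (?P<total>[\d.]+)s \[(?P<phases>[^\]]*)\]"
)

def parse_raft_ready(line):
    """Return (total_seconds, {phase: seconds}), or None if the line does not match."""
    m = _READY_RE.search(line)
    if m is None:
        return None
    phases = {}
    for part in m.group("phases").split(","):
        part = part.strip()
        if not part:
            # The sample line contains an empty field (", ,"); skip it.
            continue
        name, _, value = part.partition("=")
        phases[name] = float(value.rstrip("s"))
    return float(m.group("total")), phases

line = (
    "raft ready handling: 0.75s [append=0.00s, apply=0.75s, , other=0.00s], "
    "wrote [apply=1.4 KiB (1)], state_assertions=1; node might be overloaded"
)
total, phases = parse_raft_ready(line)
# Report the total and the phase that took the longest.
print(total, max(phases, key=phases.get))
```

On the sample line, almost all of the handling time is attributed to `apply`, even though only 1.4 KiB was written.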
Interestingly, we see:
I240726 06:36:12.867248 251 kv/kvserver/replica_proposal_buf.go:670 ⋮ [T1,Vsystem,n1,s1,r26728/1:‹/Table/106/1/-4709{468…-284…}›,raft] 5611 campaigning because Raft leader (id=2) not live in node liveness map
...
I240726 06:37:40.208126 250 kv/kvserver/replica_proposal_buf.go:670 ⋮ [T1,Vsystem,n1,s1,r158504/1:‹/Table/106/1/2069{7485…-8100…}›,raft] 6691 campaigning because Raft leader (id=3) not live in node liveness map
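To see how widespread these campaigns are, a rough sketch that tallies the messages by the leader id printed in them. The helper name `count_campaigns_by_leader` is hypothetical, and the sample lines are abbreviated copies of the two messages above, not additional log data.

```python
import re
from collections import Counter

# Hypothetical pattern matching the campaign message text shown above.
_CAMPAIGN_RE = re.compile(
    r"campaigning because Raft leader \(id=(\d+)\) not live in node liveness map"
)

def count_campaigns_by_leader(lines):
    """Count campaign messages per leader id, as printed in the message."""
    counts = Counter()
    for line in lines:
        m = _CAMPAIGN_RE.search(line)
        if m:
            counts[int(m.group(1))] += 1
    return counts

# Abbreviated forms of the two log lines quoted in this issue.
sample = [
    "raft] 5611 campaigning because Raft leader (id=2) not live in node liveness map",
    "raft] 6691 campaigning because Raft leader (id=3) not live in node liveness map",
]
counts = count_campaigns_by_leader(sample)
print(dict(counts))
```

Run against the full n1 log, a tally like this would show whether the campaigns cluster around one leader id or are spread across both, which is what one would expect if n1 itself was too overloaded to see fresh liveness records.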
I'm going to chalk this up to overload. We could probably do something to reduce metrics CPU usage when updating the replication gauges:
We have marked this test failure issue as stale because it has been inactive for 1 month. If this failure is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 5 days to keep the test failure queue tidy.
roachtest.kv/splits/nodes=3/quiesce=true/lease=epoch failed with artifacts on master @ 7fb362dd5aa6e85d65c4c89f208c5bed51ab9692:
Parameters:
ROACHTEST_arch=amd64
ROACHTEST_cloud=azure
ROACHTEST_coverageBuild=false
ROACHTEST_cpu=4
ROACHTEST_encrypted=false
ROACHTEST_metamorphicBuild=false
ROACHTEST_ssd=0
Help
See: roachtest README
See: How To Investigate (internal)
Grafana is not yet available for Azure clusters
/cc @cockroachdb/kv-triage
Jira issue: CRDB-40568