cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

roachtest: replicagc-changed-peers/restart=true failed #124204

Closed by cockroach-teamcity 3 weeks ago

cockroach-teamcity commented 3 months ago

roachtest.replicagc-changed-peers/restart=true failed with artifacts on master @ 6300c3c3367ad46ac48bf24915cf0d73cae446a0:

(replicagc.go:101).runReplicaGCChangedPeers: COMMAND_PROBLEM: exit status 1
test artifacts and logs in: /artifacts/replicagc-changed-peers/restart=true/run_1

Parameters:

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for Azure clusters

/cc @cockroachdb/replication


Jira issue: CRDB-38762

pav-kv commented 3 months ago

  | (1) Node 2. Command with error:
  |   | ```
  |   | ./cockroach node decommission --wait=none 1 2 3 --port={pgport:2} --certs-dir=certs
  |   | ```
  |   | <truncated> ... se  active  false   allocation errors   27
  |   | 3   false   116 false   active  false   allocation errors   13
  |   |
  |   | stderr:ranges blocking decommission detected
  |   | n1 has 13 replicas blocked with error: "1 matching stores are currently throttled: [[n4,s4]: canAcceptSnapshotLocked: cannot add placeholder, have an existing placeholder range=168 [/Table/106/1/273961545649151664-/Table/106/1/456602576081919500) (placeholder) (n2,s2):2]"
  |   | n2 has 27 replicas blocked with error: "1 matching stores are currently throttled: [[n4,s4]: canAcceptSnapshotLocked: cannot add placeholder, have an existing placeholder range=168 [/Table/106/1/273961545649151664-/Table/106/1/456602576081919500) (placeholder) (n2,s2):2]"
  |   | n3 has 13 replicas blocked with error: "1 matching stores are currently throttled: [[n4,s4]: canAcceptSnapshotLocked: cannot add placeholder, have an existing placeholder range=168 [/Table/106/1/273961545649151664-/Table/106/1/456602576081919500) (placeholder) (n2,s2):2]"
  |   |
  |   | ERROR: Cannot decommission nodes.
  |   | Failed running "node decommission"
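
For triage, the per-node lines in the stderr above can be summarized mechanically. The following is a rough sketch only: `blockedNode` and `parseBlocked` are hypothetical helper names, and the regular expression assumes the exact wording seen in this one log (`nN has M replicas blocked with error: ...`), which is not a stable interface.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// blockedRE matches lines such as:
//   n1 has 13 replicas blocked with error: "..."
// The format is copied from the log in this issue and is an assumption.
var blockedRE = regexp.MustCompile(`^n(\d+) has (\d+) replicas blocked with error: (.*)$`)

// blockedNode is a hypothetical record for one blocked node.
type blockedNode struct {
	nodeID   int
	replicas int
	reason   string
}

// parseBlocked extracts one blockedNode per matching line and skips
// every other line (headers, blank lines, the final ERROR line).
func parseBlocked(stderr string) []blockedNode {
	var out []blockedNode
	for _, line := range strings.Split(stderr, "\n") {
		m := blockedRE.FindStringSubmatch(strings.TrimSpace(line))
		if m == nil {
			continue
		}
		// Conversion errors are ignored here for brevity; the regex only
		// admits digits, so they could only arise from overflow.
		id, _ := strconv.Atoi(m[1])
		n, _ := strconv.Atoi(m[2])
		out = append(out, blockedNode{nodeID: id, replicas: n, reason: m[3]})
	}
	return out
}

func main() {
	// Shortened sample input; the error text is abbreviated for illustration.
	stderr := "ranges blocking decommission detected\n" +
		"n1 has 13 replicas blocked with error: \"1 matching stores are currently throttled\"\n" +
		"n2 has 27 replicas blocked with error: \"1 matching stores are currently throttled\"\n"
	for _, b := range parseBlocked(stderr) {
		fmt.Printf("n%d: %d replicas blocked\n", b.nodeID, b.replicas)
	}
}
```

In this failure all three nodes report the same reason (a throttled `[n4,s4]` with an existing placeholder for range 168), so grouping by `reason` would show a single cause.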

kvoli commented 3 months ago

We should ignore throttled stores for decommissioning pre-checks. Marking as a bug.
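
A minimal sketch of that idea, not the actual allocator code: the names below (`storeStatus`, `candidate`, `precheckBlocked`) are hypothetical and only illustrate treating throttling as a transient condition. Under this assumption, a replica counts as blocking decommission only when no candidate store would be usable even after throttling clears.

```go
package main

import "fmt"

// storeStatus is a hypothetical classification of a candidate store.
type storeStatus int

const (
	storeOK          storeStatus = iota
	storeThrottled               // temporarily refusing snapshots; expected to clear
	storeUnavailable             // dead, draining, or otherwise unusable
)

// candidate is a hypothetical target store for a replica being moved.
type candidate struct {
	storeID int
	status  storeStatus
}

// precheckBlocked reports whether a replica should be counted as blocking
// decommission. Throttled stores are treated as usable, since throttling is
// assumed to be transient and should not fail a pre-check.
func precheckBlocked(candidates []candidate) bool {
	for _, c := range candidates {
		if c.status == storeOK || c.status == storeThrottled {
			return false
		}
	}
	return true
}

func main() {
	onlyThrottled := []candidate{{storeID: 4, status: storeThrottled}}
	noneUsable := []candidate{{storeID: 4, status: storeUnavailable}}
	fmt.Println(precheckBlocked(onlyThrottled)) // false
	fmt.Println(precheckBlocked(noneUsable))    // true
}
```

With this rule, the situation in the log above (the only matching store, `[n4,s4]`, is throttled) would not be reported as a blocked decommission.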

github-actions[bot] commented 1 month ago

We have marked this test failure issue as stale because it has been inactive for 1 month. If this failure is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 5 days to keep the test failure queue tidy.