cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

kvserver: noisy "descriptor changed" error logging #108588

Open erikgrinaker opened 1 year ago

erikgrinaker commented 1 year ago

We frequently see lots of these errors in production clusters. In https://github.com/cockroachlabs/support/issues/2527 there were about 2500 in 12 hours:

unable to relocate range to [<NODES>]: while carrying out changes [<CHANGES>]: 
  change replicas of r<RANGE> failed: descriptor changed: [expected] ... != [actual] ...

These are typically entirely benign: they occur when multiple actors (i.e. nodes/queues) try to execute replication conf changes on the same range simultaneously and have to retry (it's a failed CPut, i.e. a conditional put on the range descriptor), but they can cause undue concern for users.
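To illustrate why the error is usually harmless, here is a minimal sketch of the race in Go. It is not the kvserver code: `descriptor`, `store`, `Read`, and `CPut` are hypothetical names, and comparing only a generation counter is a simplifying assumption made for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// descriptor is a hypothetical, heavily simplified range descriptor.
type descriptor struct {
	Generation int   // bumped on every successful change (assumption for this sketch)
	Replicas   []int // node IDs holding a replica
}

// store is a hypothetical in-memory holder of a single range descriptor.
type store struct {
	mu   sync.Mutex
	desc descriptor
}

// Read returns a copy of the current descriptor.
func (s *store) Read() descriptor {
	s.mu.Lock()
	defer s.mu.Unlock()
	return descriptor{
		Generation: s.desc.Generation,
		Replicas:   append([]int(nil), s.desc.Replicas...),
	}
}

// CPut installs updated only if the stored descriptor still matches what the
// caller read earlier. A mismatch means another actor won the race, and the
// caller is expected to re-read and retry.
func (s *store) CPut(expected, updated descriptor) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.desc.Generation != expected.Generation {
		return fmt.Errorf("descriptor changed: [expected] gen=%d != [actual] gen=%d",
			expected.Generation, s.desc.Generation)
	}
	updated.Generation = expected.Generation + 1
	s.desc = updated
	return nil
}
```

If two actors both call `Read` and then each call `CPut` with a different change, the first call succeeds and the second fails with "descriptor changed". The second actor re-reads and its retry succeeds, so nothing is actually wrong.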

We should either not log these, or only log them after a few retries have failed with the same error, and possibly downgrade the severity and/or soften the language. We have seen cases where these were caused by an actual bug though, and the conf changes never succeeded (#94834), so we probably don't want to remove the logging entirely.
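A rough sketch of the "only log after a few retries" option, in Go. All names here (`errDescChanged`, `Severity`, `retryLogPolicy`, `Observe`) are hypothetical and not part of the CockroachDB codebase, and the threshold is an arbitrary assumption. The idea is to stay quiet for the first few consecutive "descriptor changed" failures and to log at warning level once they persist, so a real bug like #94834 would still surface.

```go
package main

import "errors"

// errDescChanged is a hypothetical sentinel standing in for the
// "descriptor changed" error; the real error type may differ.
var errDescChanged = errors.New("descriptor changed")

// Severity is a hypothetical log level for this sketch.
type Severity int

const (
	SevNone    Severity = iota // do not log
	SevWarning                 // log as a warning
)

// retryLogPolicy tracks consecutive "descriptor changed" failures for one
// range and decides whether the current failure is worth logging.
type retryLogPolicy struct {
	threshold   int // log once this many consecutive failures have been seen
	consecutive int // current run of "descriptor changed" failures
}

// Observe records the outcome of one attempt and returns the severity at
// which it should be logged.
func (p *retryLogPolicy) Observe(err error) Severity {
	switch {
	case err == nil:
		// Success ends the run; nothing to log.
		p.consecutive = 0
		return SevNone
	case !errors.Is(err, errDescChanged):
		// Any other error is logged immediately, as before.
		p.consecutive = 0
		return SevWarning
	}
	p.consecutive++
	if p.consecutive < p.threshold {
		// Likely a benign race with another actor; stay quiet.
		return SevNone
	}
	// The same error keeps recurring, which may indicate a real problem.
	return SevWarning
}
```

With `threshold: 3`, the first two consecutive "descriptor changed" failures are suppressed and the third is logged. A successful attempt resets the count.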

Jira issue: CRDB-30534

nvanbenschoten commented 1 year ago

Related to https://github.com/cockroachdb/cockroach/issues/72546.