Closed by andrewbaptist.
This was due to an invalid zone constraint, closing.
Scatter is intended to move both leases and replicas; however, replicas are not moved. This affects how index backfills distribute replicas and can leave clusters unevenly balanced and inefficient.
Steps to reproduce:
After the fill is complete, look at the number of ranges in the kv database; it will likely be about 100. Run a scatter:
Pro tip: set the rebalance rate higher:
This moves about 100 replicas. (Side note: should this have moved about 300 replicas, i.e. every replica of each range, or only one replica per range?) Add a zone constraint:
Wait for the data to finish moving (check the Data Distribution tab under Advanced Config). Run another scatter and notice that nothing moves and the command completes immediately.
In the distribution log on one of the nodes, the following entry appears for every range (shortened for simplicity):
Specifically, the scatter does nothing because, for every range, it tries to move the replica off n12, and that move fails because of the zone config constraint.
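The failure mode can be sketched as a toy model. This is not CockroachDB's allocator code: `Range`, `Candidates`, `pinned`, and `scatterFirstOnly` are hypothetical names, and the model ignores leases, locality diversity, and store load. It assumes each range ranks the nodes it would like to move a replica off, and that scatter only ever tries the top-ranked one.

```go
package main

import "fmt"

// Range is a hypothetical, simplified view of a range: Candidates lists the
// nodes whose replica the allocator would like to move off, most preferred first.
type Range struct {
	Candidates []int
}

// scatterFirstOnly models the observed behavior: only the top candidate is
// tried, and if the (hypothetical) zone constraint pins a replica to that
// node, the range is skipped entirely.
func scatterFirstOnly(ranges []Range, pinned map[int]bool) int {
	moved := 0
	for _, r := range ranges {
		if len(r.Candidates) == 0 {
			continue
		}
		if !pinned[r.Candidates[0]] {
			moved++
		}
	}
	return moved
}

func main() {
	// Every range prefers to move its replica off n12, but the constraint
	// requires a replica on n12, so no range moves anything.
	pinned := map[int]bool{12: true}
	var ranges []Range
	for i := 0; i < 100; i++ {
		ranges = append(ranges, Range{Candidates: []int{12, 3, 7}})
	}
	fmt.Println("replicas moved:", scatterFirstOnly(ranges, pinned))
}
```

Running it prints `replicas moved: 0`, matching the immediate no-op scatter described above.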
This example is slightly contrived, but something very similar happens during index backfills whenever zone configs target specific nodes.
The expected behavior is that the scatter either moves all replicas of a range or keeps trying until it finds at least one replica it can move.
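The second option (keep trying until one replica can move) can be sketched with the same kind of toy model. Again, this is a hypothetical illustration rather than CockroachDB's allocator: `Range`, `Candidates`, `pinned`, `pickMovable`, and `scatterWithFallback` are invented names, and a real implementation would also have to respect diversity, load, and the remaining zone constraints when choosing a target.

```go
package main

import "fmt"

// Range is a hypothetical, simplified view of a range: Candidates lists the
// nodes whose replica the allocator would like to move off, most preferred first.
type Range struct {
	Candidates []int
}

// pickMovable returns the first candidate node whose replica is not pinned by
// the (hypothetical) zone constraint, or false if every candidate is pinned.
func pickMovable(r Range, pinned map[int]bool) (int, bool) {
	for _, n := range r.Candidates {
		if !pinned[n] {
			return n, true
		}
	}
	return 0, false
}

// scatterWithFallback falls back to lower-ranked candidates instead of giving
// up on a range when its preferred move is rejected.
func scatterWithFallback(ranges []Range, pinned map[int]bool) int {
	moved := 0
	for _, r := range ranges {
		if _, ok := pickMovable(r, pinned); ok {
			moved++
		}
	}
	return moved
}

func main() {
	pinned := map[int]bool{12: true}
	var ranges []Range
	for i := 0; i < 100; i++ {
		ranges = append(ranges, Range{Candidates: []int{12, 3, 7}})
	}
	fmt.Println("replicas moved:", scatterWithFallback(ranges, pinned))
}
```

With the same setup that moves nothing under a first-choice-only strategy, this version prints `replicas moved: 100`, because each range skips the pinned replica on n12 and moves the replica on n3 instead.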
Jira issue: CRDB-42285