Open kvoli opened 1 year ago
Hi @kvoli, please add branch-* labels to identify which branch(es) this release-blocker affects.
:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Removing GA blocker. See https://github.com/cockroachdb/cockroach/pull/98893#issuecomment-1474497768
`bestStoreToMinimizeDelta` selects a store to move load to from the given existing store. The candidate store list it receives is filtered to contain only valid, equally diverse, constraint-matching replacements.

https://github.com/cockroachdb/cockroach/blob/0495342e81d473901278993f6162b1d78b685bc1/pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go#L1187-L1187

The domain is constructed as candidates+existing and is used to calculate the mean load value. The mean load value then determines the max load threshold, which is used to possibly return early if the existing store isn't above that threshold. As a result, if the mean is incorrectly computed over a different domain than candidates+existing, `bestStoreToMinimizeLoad` may erroneously return `existingNotOverfull` and no rebalance target.

The domain store list is currently not constructed from the domain. Instead it uses the `storeDescMap`, which is not guaranteed to include only the domain stores (for lease transfers it does not). This is causing the above issue.

https://github.com/cockroachdb/cockroach/blob/0495342e81d473901278993f6162b1d78b685bc1/pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go#L1218-L1223
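
To illustrate the failure mode described above, here is a minimal standalone sketch. It is not the allocator code: the `store` type, `meanLoad`, `existingIsOverfull`, the `mean * (1 + tolerance)` threshold formula, and all load numbers are hypothetical simplifications chosen only to show how a mean computed over the wrong set of stores can flip the early-return check.

```go
package main

import "fmt"

// store is a hypothetical, simplified stand-in for a store descriptor.
type store struct {
	id   int
	load float64 // e.g. queries per second
}

// meanLoad returns the average load across the given stores.
func meanLoad(stores []store) float64 {
	if len(stores) == 0 {
		return 0
	}
	sum := 0.0
	for _, s := range stores {
		sum += s.load
	}
	return sum / float64(len(stores))
}

// existingIsOverfull reports whether the existing store's load exceeds a
// threshold derived from the mean of the supplied stores. The formula
// mean * (1 + tolerance) is an assumption made for this sketch.
func existingIsOverfull(existing store, stores []store, tolerance float64) bool {
	return existing.load > meanLoad(stores)*(1+tolerance)
}

func main() {
	existing := store{id: 1, load: 1000}
	candidates := []store{{id: 2, load: 200}, {id: 3, load: 300}}
	// Stores that are not valid targets but are present in the wider map.
	unrelated := []store{{id: 4, load: 3000}, {id: 5, load: 3500}}

	// Intended domain: candidates + existing (mean load 500).
	domain := append([]store{existing}, candidates...)
	// Wider set, analogous to using every store in a store map (mean load 1600).
	allStores := append(append([]store{}, domain...), unrelated...)

	fmt.Println("overfull using domain:    ", existingIsOverfull(existing, domain, 0.1))
	fmt.Println("overfull using all stores:", existingIsOverfull(existing, allStores, 0.1))
}
```

With the intended domain the existing store is above the threshold, so a rebalance target would be considered. With the wider set the mean is inflated by the unrelated stores, the existing store no longer looks overfull, and a check of this shape would return early without a target.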
Jira issue: CRDB-25561