cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

allocator: domain store list is constructed using all stores passed in #98870

Open kvoli opened 1 year ago

kvoli commented 1 year ago

bestStoreToMinimizeDelta selects a store to which load should be moved from the given existing store. The candidate store list it receives is already filtered to contain only valid, equally diverse, constraint-matching replacements.

https://github.com/cockroachdb/cockroach/blob/0495342e81d473901278993f6162b1d78b685bc1/pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go#L1187-L1187

The domain is constructed as candidates+existing and is used to calculate the mean load value. The mean load value then determines the max load threshold, which is used to possibly return early if the existing store isn't above it. As a result, if the mean is incorrectly computed over a different domain than candidates+existing, bestStoreToMinimizeDelta may erroneously return existingNotOverfull and no rebalance target.
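To illustrate how the wrong domain can trigger the early return, here is a minimal sketch. It is not the allocator code: `meanLoad`, `existingOverfull`, the load numbers, and the multiplicative threshold `mean*(1+threshold)` are all assumptions made for illustration.

```go
package main

import "fmt"

// meanLoad returns the average of the given load values (0 for an empty slice).
func meanLoad(loads []float64) float64 {
	if len(loads) == 0 {
		return 0
	}
	sum := 0.0
	for _, l := range loads {
		sum += l
	}
	return sum / float64(len(loads))
}

// existingOverfull reports whether the existing store's load exceeds
// mean*(1+threshold). The multiplicative form is a hypothetical stand-in
// for the real max load threshold calculation.
func existingOverfull(existing float64, domain []float64, threshold float64) bool {
	return existing > meanLoad(domain)*(1+threshold)
}

func main() {
	existing := 1000.0
	candidates := []float64{200, 300}

	// Intended domain: candidates + existing.
	domain := append([]float64{existing}, candidates...)

	// Too-wide domain: also contains an unrelated, heavily loaded store.
	tooWide := append(append([]float64{}, domain...), 3000)

	fmt.Println(existingOverfull(existing, domain, 0.1))  // mean is 500
	fmt.Println(existingOverfull(existing, tooWide, 0.1)) // mean is 1125
}
```

With the intended domain the existing store sits well above the threshold, so a rebalance target would be considered. With the extra store included, the mean rises enough that the existing store no longer looks overfull, which corresponds to the existingNotOverfull early return described above.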

Currently, the domain store list is not constructed from the domain. Instead it uses the storeDescMap, which is not guaranteed to include only the domain stores (for lease transfers it does not). This causes the issue above.

https://github.com/cockroachdb/cockroach/blob/0495342e81d473901278993f6162b1d78b685bc1/pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go#L1218-L1223
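One way the domain list could be restricted to candidates+existing is sketched below. This is only an illustration under stated assumptions, not the fix that was merged: `storeID`, `storeDesc`, and `domainStores` are hypothetical names, and the map stands in loosely for storeDescMap.

```go
package main

import "fmt"

// storeID and storeDesc are hypothetical stand-ins, not CockroachDB types.
type storeID int

type storeDesc struct {
	ID   storeID
	Load float64
}

// domainStores returns descriptors only for the existing store and the
// candidates, ignoring any other entries present in allStores.
func domainStores(allStores map[storeID]storeDesc, existing storeID, candidates []storeID) []storeDesc {
	ids := append([]storeID{existing}, candidates...)
	seen := make(map[storeID]bool, len(ids))
	domain := make([]storeDesc, 0, len(ids))
	for _, id := range ids {
		desc, ok := allStores[id]
		if !ok || seen[id] {
			continue // skip unknown stores and duplicates
		}
		seen[id] = true
		domain = append(domain, desc)
	}
	return domain
}

func main() {
	all := map[storeID]storeDesc{
		1: {ID: 1, Load: 1000},
		2: {ID: 2, Load: 200},
		3: {ID: 3, Load: 300},
		4: {ID: 4, Load: 3000}, // not a candidate; must not affect the mean
	}
	for _, s := range domainStores(all, 1, []storeID{2, 3}) {
		fmt.Printf("s%d load=%.0f\n", s.ID, s.Load)
	}
}
```

Driving the lookup from the candidate and existing IDs, rather than iterating the whole map, means that stores outside the domain cannot leak into the mean even when the map contains them.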

Jira issue: CRDB-25561

blathers-crl[bot] commented 1 year ago

Hi @kvoli, please add branch-* labels to identify which branch(es) this release-blocker affects.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

kvoli commented 1 year ago

Removing GA blocker. See https://github.com/cockroachdb/cockroach/pull/98893#issuecomment-1474497768