Status: open. jayshrivastava opened this issue 1 year ago.
cc @cockroachdb/cdc
@jayshrivastava FYI: with https://github.com/cockroachdb/cockroach/pull/114710 the work on this issue might become obsolete, and we may just get rid of balanced range distribution altogether.
Linking this related Slack discussion: https://cockroachlabs.slack.com/archives/C0KB9Q03D/p1701101549923459
Reducing priority level to P-3. We already have a rebalancing strategy (balanced simple distribution) that addresses the overload concerns in the support issue, so we do not have an urgent need for additional strategies.
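For intuition about what a balanced distribution does, here is a minimal Python sketch. It assumes DistSQL has already produced an initial node-to-ranges assignment, caps every node at ceil(total / number of nodes) ranges, and hands the excess to nodes below that cap. The `rebalance` helper, the node names, and the range counts are hypothetical: they are a scaled-down version of the ~60k vs. ~3k ranges-per-node skew described in this issue. This is not CockroachDB's actual implementation.

```python
import math


def rebalance(assignment: dict[str, list[int]]) -> dict[str, list[int]]:
    """Hypothetical sketch of a balanced planning mode.

    Caps every node at ceil(total / num_nodes) ranges and moves the excess
    from overloaded nodes to underloaded ones. Not CockroachDB's real code.
    """
    if not assignment:
        return {}
    nodes = sorted(assignment)  # deterministic order
    total = sum(len(ranges) for ranges in assignment.values())
    cap = math.ceil(total / len(nodes))

    out: dict[str, list[int]] = {}
    surplus: list[int] = []
    # Trim overloaded nodes down to the cap and collect the removed ranges.
    for node in nodes:
        ranges = assignment[node]
        out[node] = list(ranges[:cap])
        surplus.extend(ranges[cap:])
    # Hand the surplus to nodes that are still below the cap.
    for node in nodes:
        room = cap - len(out[node])
        out[node].extend(surplus[:room])
        del surplus[:room]
    return out


# Made-up, scaled-down skew: one node holds most of the ranges.
range_ids = iter(range(1000))
sizes = {"n1": 60, "n2": 3, "n3": 3, "n4": 3, "n5": 3}
skewed = {node: [next(range_ids) for _ in range(n)] for node, n in sizes.items()}
balanced = rebalance(skewed)
print({node: len(ranges) for node, ranges in balanced.items()})
# → {'n1': 15, 'n2': 15, 'n3': 15, 'n4': 15, 'n5': 12}
```

Capping at the ceiling guarantees that the surplus always fits, because the caps sum to at least the total number of ranges. A real planner would presumably also have to account for locality (for example `execution_locality`) and range size, which this sketch ignores.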
See https://github.com/cockroachlabs/support/issues/2679.
The problem:
In this scenario, there's a changefeed running with `execution_locality = foo` on a table that is configured with a leaseholder preference in region `bar`. The table foo has 150k ranges. Region `bar` has ~20 nodes and region `foo` has ~10. We observed the following problems:
`foo` was assigned ~60k ranges to watch, with the next highest being ~3k ranges. This imbalance makes the changefeed run slower. When the leaseholders are non-local, we don't know what heuristics DistSQL will use to assign work to change aggregators. We should investigate this and consider writing our own planning logic. We should also consider using `changefeed.balance_range_distribution.enable` always, even after initial scans.

The solution:
We want to add "planning modes" where we can choose how to distribute work to nodes when we plan a changefeed. Namely, we want 3 modes:
Jira issue: CRDB-33248