apple / foundationdb

FoundationDB - the open source, distributed, transactional key-value store
https://apple.github.io/foundationdb/
Apache License 2.0

Team removal results in significant space unbalance #2592

Open ajbeamon opened 4 years ago

ajbeamon commented 4 years ago

The data distribution team removal procedure gets run when changing the machines present in a cluster (for example, by exclusion/inclusion, adding new machines, etc.). When it happens, it often seems to result in a significant imbalance in the number of bytes stored on different processes.

This is a problem because some of the processes end up storing significantly more than they had been previously (one example I saw was 25% more for the worst process), which may not be easily accommodated in fuller clusters.

The imbalance is eventually healed once the team removal is complete and rebalancing data movement can correct it.
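To illustrate why a 25% increase on the worst process is hard to accommodate in fuller clusters, here is a minimal arithmetic sketch. The starting utilization figures are assumptions chosen for illustration and are not measurements from any real cluster; only the 25% growth figure comes from the example above.

```python
def utilization_after_growth(used_fraction: float, growth: float) -> float:
    """Disk utilization after stored bytes grow by `growth` (0.25 means 25% more)."""
    return used_fraction * (1.0 + growth)


def max_safe_utilization(growth: float, limit: float = 1.0) -> float:
    """Highest starting utilization that still stays within `limit` after the growth."""
    return limit / (1.0 + growth)


if __name__ == "__main__":
    growth = 0.25  # the worst-case increase mentioned above
    # Hypothetical starting utilizations for the worst process.
    for used in (0.50, 0.70, 0.85):
        after = utilization_after_growth(used, growth)
        status = "fits" if after <= 1.0 else "exceeds capacity"
        print(f"start {used:.0%} -> after {after:.1%} ({status})")
    print(f"max safe starting utilization: {max_safe_utilization(growth):.0%}")
```

Under these assumptions, a process that starts above 80% utilization cannot absorb a temporary 25% increase, even though the cluster as a whole would be fine once rebalancing catches up.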

xumengpanda commented 4 years ago

The team collection, which manages the team removal procedure, is designed to be unaware of the load on teams. This design prevents the distribution of teams from being affected by users' skewed traffic, which would hurt fault tolerance.

The solution should make data distribution (DD) smarter about how it moves data around, rather than making the team collection aware of this problem.
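As a rough illustration of what "smarter" data movement could mean, here is a hypothetical sketch of a mover that chooses a destination team by the bytes already stored on its members, while the set of teams itself stays load-unaware. This is not FoundationDB code; the names `pick_destination` and `move_shard`, and the rule of minimizing the fullest member, are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Team = Tuple[str, ...]  # a team is a tuple of server ids (hypothetical representation)


def pick_destination(teams: List[Team], bytes_stored: Dict[str, int]) -> Team:
    """Pick the candidate team whose fullest member stores the fewest bytes."""
    if not teams:
        raise ValueError("no candidate teams")
    return min(teams, key=lambda team: max(bytes_stored[s] for s in team))


def move_shard(shard_bytes: int, dest: Team, bytes_stored: Dict[str, int]) -> None:
    """Account for a shard landing on every member of the destination team."""
    for server in dest:
        bytes_stored[server] += shard_bytes


if __name__ == "__main__":
    # Hypothetical per-server byte counts and surviving teams after a removal.
    bytes_stored = {"a": 100, "b": 120, "c": 90, "d": 200}
    teams = [("a", "b"), ("c", "d"), ("a", "c")]
    dest = pick_destination(teams, bytes_stored)
    move_shard(30, dest, bytes_stored)
    print(dest, bytes_stored)
```

The point of the sketch is the separation of concerns: which teams exist is decided without looking at load, and only the choice of where to send a given shard looks at stored bytes.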