faust-streaming / faust

Python Stream Processing. A Faust fork
https://faust-streaming.github.io/faust/

Rebalance can leave some workers with no partitions #93

Open bobh66 opened 3 years ago

bobh66 commented 3 years ago

Steps to reproduce

When the number of partitions for an agent topic is not evenly divisible by the number of workers, the PartitionAssignor can leave one or more workers with no partitions assigned. This wastes CPU and memory, forces the other workers to carry a heavier load, and potentially reduces throughput.

The initial deployment will distribute the partitions across all workers. For example, a topic with 100 partitions will be spread across 40 workers, with 20 workers having 3 partitions and 20 workers having 2 partitions. The CopartitionedAssignor will calculate the (maximum) capacity for each worker to be 3 partitions, using the formula ceil(num_partitions / num_workers).
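The arithmetic above can be checked with a few lines of Python. This is only a sketch of the numbers in the example; `max_capacity` and `even_spread` are hypothetical helper names, not functions from Faust.

```python
import math
from typing import Dict


def max_capacity(num_partitions: int, num_workers: int) -> int:
    """Upper bound on partitions per worker: ceil(num_partitions / num_workers)."""
    return math.ceil(num_partitions / num_workers)


def even_spread(num_partitions: int, num_workers: int) -> Dict[int, int]:
    """Map partitions-per-worker -> number of workers for an even spread."""
    base, extra = divmod(num_partitions, num_workers)
    spread: Dict[int, int] = {}
    if extra:
        # `extra` workers carry one partition more than the rest.
        spread[base + 1] = extra
    spread[base] = num_workers - extra
    return spread


print(max_capacity(100, 40))  # 3
print(even_spread(100, 40))   # {3: 20, 2: 20}
```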

Now if one worker gets rebooted or leaves the group for any reason, a rebalance is triggered and the 2 or 3 partitions for that worker get moved to other workers that have 2 partitions. Either way this leaves 22 workers with 3 partitions and 17 workers with 2 partitions (100 partitions across 39 workers).

When the rebooted worker recovers and rejoins the group, it will probably not receive any partitions because there are no "extra" partitions on any of the workers. The maximum capacity is still 3, no worker has more than 3 partitions, so no partitions are "available" for assignment. Rebooting the worker has no impact for the same reason.
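The sequence can be reproduced with a toy model. To be clear, this is not the CopartitionedAssignor code. It is a simplified sketch that assumes a "sticky" assignor in which members keep their existing partitions up to the capacity, and only unassigned partitions are handed out. `sticky_rebalance` and the worker names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def sticky_rebalance(
    previous: Dict[str, List[int]], members: List[str], num_partitions: int
) -> Dict[str, List[int]]:
    """Toy sticky rebalance (assumed model, not Faust's implementation)."""
    capacity = math.ceil(num_partitions / len(members))
    # Current members keep what they already had, trimmed to the capacity.
    new = {m: list(previous.get(m, []))[:capacity] for m in members}
    assigned = {p for parts in new.values() for p in parts}
    for p in range(num_partitions):
        if p in assigned:
            continue
        # Orphaned partitions go to the least-loaded member with spare room.
        target = min(
            (m for m in members if len(new[m]) < capacity),
            key=lambda m: len(new[m]),
        )
        new[target].append(p)
    return new


def load_counts(assignment: Dict[str, List[int]]) -> Dict[int, int]:
    """Map partitions-per-worker -> number of workers."""
    return dict(Counter(len(parts) for parts in assignment.values()))


workers = [f"w{i}" for i in range(40)]
initial = sticky_rebalance({}, workers, 100)
after_leave = sticky_rebalance(initial, workers[1:], 100)   # w0 leaves
after_rejoin = sticky_rebalance(after_leave, workers, 100)  # w0 rejoins

print(load_counts(initial))
print(load_counts(after_leave))
print(len(after_rejoin["w0"]))  # 0: nothing is over capacity, so nothing moves
```

In this model the rejoining worker stays empty because the capacity is still 3 and no member holds more than 3 partitions.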

The worker with no partitions will leave the consumer group after 5 minutes because the aiokafka Fetcher has been idle with no assignment. This means that future rebalances of the group will NOT include this consumer/worker, and it will be idle forever, or until the group is redeployed.

Expected behavior

Partitions should be assigned to all available workers, as balanced as possible.
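As an illustration of what "as balanced as possible" could mean, here is a minimal sketch that moves partitions from the busiest worker to the idlest one until no two workers differ by more than one partition. It ignores tables, standby replicas, and copartitioning, and it is not the change proposed for Faust; `rebalance_evenly` and the worker names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def rebalance_evenly(assignment: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Move partitions until worker loads differ by at most one."""
    result = {worker: list(parts) for worker, parts in assignment.items()}
    while True:
        busiest = max(result, key=lambda w: len(result[w]))
        idlest = min(result, key=lambda w: len(result[w]))
        if len(result[busiest]) - len(result[idlest]) <= 1:
            return result
        result[idlest].append(result[busiest].pop())


# State after a worker rejoins: w0 is empty, 22 workers hold 3, 17 hold 2.
partitions = iter(range(100))
state: Dict[str, List[int]] = {"w0": []}
for i in range(1, 40):
    size = 3 if i <= 22 else 2
    state[f"w{i}"] = [next(partitions) for _ in range(size)]

balanced = rebalance_evenly(state)
print(dict(Counter(len(parts) for parts in balanced.values())))
```

Only two partitions need to move in this example, so the cost of restoring balance is small compared with leaving a worker idle.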

Actual behavior

Workers can receive no partition assignment and leave the group.

jkgenser commented 3 years ago

@bobh66 : Just wondering, why did you revert your changes to fix the partition assignment problems raised in this issue?

bobh66 commented 3 years ago

Hi @jkgenser - I needed to do more testing with the partition assignment changes and I didn't want to hold up releasing the other two fixes, which are also important. The existing partitioner logic will intentionally leave a consumer without any active partitions, which is also validated by existing tests, so in order to "fix" that I need to change the way the tests work. I'm reluctant to make too big a change in this area given how much of an impact it can have.

I'm going to restrict the partitioner changes to only force a "balanced" configuration when there are no tables in use and the table_standby_replicas option is set to 0. That will limit the initial impact of the change so we can see how it works, and then decide whether to expand the implementation or not. Since the partitioner will default to promoting a standby partition to active, it is still possible to have empty assignments even with my changes, so it's better to leave that code alone for now. Hopefully this makes sense.

taybin commented 3 years ago

Has this issue been resolved?

xaralis commented 3 years ago

> Has this issue been resolved?

@taybin We're currently experiencing this so I guess this isn't resolved yet.

bobh66 commented 3 years ago

At this point there will always be scenarios where this can happen - see this comment for more details on exactly what was fixed - https://github.com/faust-streaming/faust/pull/97#issuecomment-776774231

Basically you need to "enable" the fix by setting table_standby_replicas to 0.
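For reference, a minimal sketch of what that setting looks like on an app. The app id and broker URL are placeholders. Per the earlier comment in this thread, the balanced behavior only applies when no tables are in use.

```python
import faust

# "partition-balance-demo" and the broker URL are placeholders for illustration.
app = faust.App(
    "partition-balance-demo",
    broker="kafka://localhost:9092",
    # Opt in to the balanced assignment described above (no tables in use).
    table_standby_replicas=0,
)
```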