Open Dandandan opened 1 year ago
Batches should be distributed more evenly over output channels.
Seems to be a load balancing issue. If you could count the number of batches already distributed to each channel and not completed then the classical The Power of Two Choices in Randomized Load Balancing algorithm could be evaluated.
@cristian-ilies-vasile yes, instead of round-robin repartitioning an improved scheme could be implemented based on number of buffered batches.
One good article describing this technique can be read here: Deterministic Aperture: A distributed, load balancing algorithm https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/daperture-load-balancer
Describe the bug
RoundRobin repartitioning currently does not distribute the input tasks evenly over the output channels, causing the work to be not distributed evenly.
To Reproduce
When loading the data in memory in the TPC-H benchmark, this can be seen in the number of batches in
MemoryExec
(which uses RoundRobin partitioning).MemoryExec: partitions=32, partition_sizes=[32, 32, 32, 32, 32, 32, 32, 32, 26, 26, 26, 25, 25, 25, 25, 25, 25, 25, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16], metrics=[]
It has a bias for the first output partitions/channels.
Expected behavior
Batches should be distributed more evenly over output channels.
Additional context
No response