apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
6.26k stars 1.18k forks source link

Improve RoundRobin `RepartitionExec` #6043

Open Dandandan opened 1 year ago

Dandandan commented 1 year ago

Describe the bug

RoundRobin repartitioning currently does not distribute the input tasks evenly over the output channels, causing the work to be not distributed evenly.

To Reproduce

When loading the data in memory in the TPC-H benchmark, this can be seen in the number of batches in MemoryExec (which uses RoundRobin partitioning).

MemoryExec: partitions=32, partition_sizes=[32, 32, 32, 32, 32, 32, 32, 32, 26, 26, 26, 25, 25, 25, 25, 25, 25, 25, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16], metrics=[]

It has a bias for the first output partitions/channels.

Expected behavior

Batches should be distributed more evenly over output channels.

Additional context

No response

cristian-ilies-vasile commented 1 year ago

Batches should be distributed more evenly over output channels.

Seems to be a load balancing issue. If you could count the number of batches already distributed to each channel and not completed then the classical The Power of Two Choices in Randomized Load Balancing algorithm could be evaluated.

Dandandan commented 1 year ago

@cristian-ilies-vasile yes, instead of round-robin repartitioning an improved scheme could be implemented based on number of buffered batches.

cristian-ilies-vasile commented 1 year ago

One good article describing this technique can be read here: Deterministic Aperture: A distributed, load balancing algorithm https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/daperture-load-balancer