Closed by wsry 2 years ago
For the load balance improvement issue, I implemented a simple version. This PR mainly balances the load according to worker storage; it does not yet consider worker throughput load. The change also abstracts the worker selection logic, which makes it easier to add more elaborate load balance strategies later.
@wsry, @gaoyunhaii Could you please help review the code when you have time? Thanks.
Motivation
Currently, the load balance strategy is pretty simple: DataPartitions are evenly distributed across all ShuffleWorkers, regardless of how much storage each worker has left. A better way is to allocate shuffle resources based on the actual capacity of each ShuffleWorker.
Changes
Allocate shuffle resources based on the actual capacity of each ShuffleWorker. Each ShuffleWorker can also select a disk based on disk throughput.
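To illustrate the idea of capacity-based allocation behind a pluggable selection interface, here is a minimal sketch. It is not the code in this PR: `WorkerInfo`, `WorkerSelectionStrategy`, and `MostFreeSpaceStrategy` are hypothetical names chosen for illustration, and the sketch assumes that each worker reports its free storage in bytes and that "capacity" simply means the largest remaining free space.

```java
import java.util.List;

/** Hypothetical snapshot of a worker's storage state (illustrative only). */
final class WorkerInfo {
    final String id;
    long freeBytes; // assumed to be reported by the worker

    WorkerInfo(String id, long freeBytes) {
        this.id = id;
        this.freeBytes = freeBytes;
    }
}

/** Hypothetical abstraction so that other strategies can be plugged in later. */
interface WorkerSelectionStrategy {
    WorkerInfo select(List<WorkerInfo> workers, long requiredBytes);
}

/** Picks the worker with the most free storage that can still fit the request. */
final class MostFreeSpaceStrategy implements WorkerSelectionStrategy {
    @Override
    public WorkerInfo select(List<WorkerInfo> workers, long requiredBytes) {
        WorkerInfo best = null;
        for (WorkerInfo w : workers) {
            boolean fits = w.freeBytes >= requiredBytes;
            if (fits && (best == null || w.freeBytes > best.freeBytes)) {
                best = w;
            }
        }
        if (best == null) {
            throw new IllegalStateException("No worker has enough free storage");
        }
        // Reserve the space so the next selection sees the updated capacity.
        best.freeBytes -= requiredBytes;
        return best;
    }
}
```

A throughput-aware strategy could implement the same interface without touching the callers, which is the main reason for separating selection from allocation in this sketch.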
Test