Jiefei-Wang opened 2 years ago
Can you give some more examples of how the different balancers work? Suppose we have 10 tasks numbered 1 through 10, and there are 3 workers, labeled A, B, and C (deliberately chosen to have a non-integer ratio). Can you show how the tasks will be assigned by sequential and stepwise in these cases? For random, does the balancer ensure that an approximately equal number of tasks are sent to each worker, or does it randomly select a worker for each task independently of other tasks?
Sure. For the sequential balancer, the task dispatching plan is:
A: 1, 2, 3
B: 4, 5, 6
C: 7, 8, 9, 10
For the stepwise balancer, it is:
A: 1, 4, 7, 10
B: 2, 5, 8
C: 3, 6, 9
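The stepwise plan above is plain round-robin assignment, so it can be reproduced with modular arithmetic. A minimal base-R sketch; the names `n`, `workers`, `owner` and `stepwise` are illustrative only and not taken from the package's code:

```r
n <- 10
workers <- c("A", "B", "C")

## Round-robin: element i goes to worker ((i - 1) %% 3) + 1
owner <- workers[(seq_len(n) - 1) %% length(workers) + 1]
stepwise <- split(seq_len(n), owner)
stepwise  # A: 1 4 7 10; B: 2 5 8; C: 3 6 9
```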
The random balancer will randomly create three sets of tasks, with cardinalities 3, 3, and 4, respectively.
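One way to get random task sets with fixed cardinalities, as opposed to picking a worker independently for each task, is to shuffle the indices and cut the permutation into pieces of the required sizes. This is a hypothetical sketch of that idea, not necessarily how the package draws its sets:

```r
set.seed(42)  # only to make the example reproducible
n <- 10
sizes <- c(A = 3, B = 3, C = 4)

## Shuffle the task indices, then cut the permutation into pieces of the
## required sizes: the cardinalities are fixed, only membership is random.
random <- split(sample(n), rep(names(sizes), sizes))
lengths(random)  # A 3, B 3, C 4
```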
The stepwise balancer performs well in this circumstance because of how the computation scales with task number. But doesn't the 'random' balancer have a lower expected evaluation time, inasmuch as we don't know the distribution of task evaluation times?
Yes, the random balancer has the lowest expected evaluation time, but also the highest variance (when you rerun the same apply function many times).
If we do not know the task evaluation times in advance, the performance of the stepwise and random balancers should be comparable in most cases; it is a tradeoff between expectation and variance. I'm not a fan of randomization, so I chose the stepwise balancer as the default, but I kept the random balancer as an option in case the user knows the stepwise balancer will perform poorly for their apply function.
A couple of points:

1. Do the balancers only work for `bplapply`? If they're available for other functions, then I think the current name of the option is misleading and should be changed.
2. Unless exact backward compatibility is desired for the sequential balancer, it would probably be better to dispatch more tasks to the first workers, e.g.:
   A: 1, 2, 3, 4
   B: 5, 6, 7
   C: 8, 9, 10
   This is because the first worker is usually the one that starts running earliest, so giving it more tasks will tend to make everything finish slightly faster on average. The stepwise balancer appears to have this behavior already. For the random balancer, you could sort the task lists in descending order of size before dispatching them to the workers to get the same behavior.
3. How easy is it to implement new balancers? I have a case where each worker takes so long to spin up relative to the runtime of each job that I actually want to give the first workers much longer lists of jobs than the last ones, e.g. something like:
   A: 1, 2, 3, 4, 5, 6
   B: 7, 8, 9
   C: 10
   This would get me closest to having all three workers finish at the same time, thus minimizing the total walltime of the computation. I realize this is a somewhat rare case, so I don't know whether I would ask for it to be implemented in the package, but it would be nice if I could write my own custom balancer, e.g. by providing a custom partitioning function or writing a class.
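The two suggestions in point 2 can be sketched in base R, assuming tasks are identified by their index in `X`; `sizes`, `sequential` and `random` are illustrative names, not package code:

```r
set.seed(1)  # only to make the random part reproducible
n <- 10
n_workers <- 3

## Front-loaded sequential split: every worker gets floor(n / n_workers)
## tasks, and the first n %% n_workers workers get one extra.
sizes <- n %/% n_workers + (seq_len(n_workers) <= n %% n_workers)
sequential <- split(seq_len(n), rep(seq_len(n_workers), sizes))
sequential  # chunks 1:4, 5:7, 8:10

## Random split with cardinalities 3, 3, 4, then reordered so that the
## largest task list is dispatched to the first worker.
random <- split(sample(n), rep(seq_len(n_workers), c(3, 3, 4)))
random <- random[order(lengths(random), decreasing = TRUE)]
lengths(random)  # 4, 3, 3
```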
Thanks.
I suppose that if the distribution of task evaluation times is independent of task order, then really any balancer has the same expected time?
I suppose (??) that the next most likely scenario is that task evaluation times are ordered (from low to high, or high to low), perhaps unintentionally? And then what is the optimal evaluation order? If I had seven tasks 1:7 taking 1:7 seconds, and 4 workers, then I would like to assign tasks to workers as 1: 7; 2: 1, 6; 3: 2, 5; 4: 3, 4, which would be optimal (every worker finishes after 7 seconds). But I don't think any of the balancers satisfies that?
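The assignment proposed above is exactly what the textbook longest-processing-time-first (LPT) greedy heuristic produces here: sort tasks by decreasing duration and give each one to the currently least-loaded worker. LPT is a swapped-in technique, not one of the package's balancers; it needs the task times in advance and is not optimal in general, although it is for this example. `lpt_assign` is a hypothetical helper:

```r
## Hypothetical helper: longest-processing-time-first (LPT) greedy scheduling.
## times[i] is the (known) run time of task i.
lpt_assign <- function(times, n_workers) {
  load <- numeric(n_workers)               # total time assigned to each worker
  assignment <- vector("list", n_workers)  # task ids per worker
  for (task in order(times, decreasing = TRUE)) {
    w <- which.min(load)                   # least-loaded worker so far
    assignment[[w]] <- c(assignment[[w]], task)
    load[w] <- load[w] + times[task]
  }
  list(assignment = assignment, load = load)
}

res <- lpt_assign(times = 1:7, n_workers = 4)
res$assignment  # worker 1: 7; 2: 6, 1; 3: 5, 2; 4: 4, 3
res$load        # 7 7 7 7
```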
In my point 3 above, I'm assuming each task takes an equal amount of time. The reason for wanting to assign different numbers of tasks to each worker is that each worker takes so long to get started that by the time worker C has started, worker A has already been running long enough to run 4 or 5 tasks. For example, imagine that starting a worker takes 1 minute and each task takes 30 seconds to run.
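With the timings in this example, the 6, 3, 1 split from point 3 falls out of a simple greedy simulation. The sketch below assumes, hypothetically, that the workers come up one after another (worker k is ready after k minutes) and that every task takes 30 seconds; it hands each task to whichever worker becomes free first and counts how many tasks each worker gets. `startup_aware_counts` is an illustrative name:

```r
## Hypothetical model: worker k is ready at k * startup seconds (workers come up
## one after another) and every task takes task_time seconds.
startup_aware_counts <- function(n, n_workers, startup = 60, task_time = 30) {
  ready <- startup * seq_len(n_workers)  # time at which each worker is next free
  counts <- integer(n_workers)
  for (i in seq_len(n)) {
    w <- which.min(ready)                # earliest-free worker finishes task i first
    ready[w] <- ready[w] + task_time
    counts[w] <- counts[w] + 1L
  }
  list(counts = counts, finish = ready)
}

res <- startup_aware_counts(n = 10, n_workers = 3)
res$counts  # 6 3 1
res$finish  # 240 210 210 (seconds)

## Turn the counts into contiguous chunks: A = 1:6, B = 7:9, C = 10
chunks <- split(seq_len(10), rep(seq_along(res$counts), res$counts))
```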
Hello @DarwinAwardWinner, regarding your comments:

1. The balancers are available for `bplapply` and `bpiterate`. Since the other apply functions depend on these two functions, they also inherit the balancer from `bplapply` or `bpiterate`.
2. `bplapply`.
3. I will just briefly introduce the balancer for `bplapply` here; you can find all implementations in `R/balancer.R`. The balancer requires a generator function, which takes the number of elements of `X` and `BPPARAM` as input and returns a list of two functions, `record` and `next_task`. Here is a sketch of the balancer:

```r
mybalancer <- function(n, BPPARAM) {
    ## n: the number of elements of `X`
    ## BPPARAM: the BiocParallelParam used for the evaluation
    list(
        record = function(node, task_id, time) {
            ## record the task execution time
            ## (can be left empty)
        },
        next_task = function() {
            ## return the next task:
            ## task_id: an integer id used to identify the task
            ## index: the indices of the elements of `X` in this task
            list(
                task_id = task_id,
                index = index
            )
        }
    )
}
```
During the parallel evaluation, the function `next_task` will be called to determine the task for each worker. For example, if we have `X = runif(10)` and 3 workers, the first call to `next_task` might return `list(task_id = 1, index = 1:4)`, the second `list(task_id = 2, index = 5:7)`, and the last `list(task_id = 3, index = 8:10)`. This will give the desired load balancing for your point 2.
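Filling in the template above, here is a hypothetical balancer generator that hands out contiguous chunks of predefined sizes, e.g. `c(4, 3, 3)` for point 2 or `c(6, 3, 1)` for point 3. It is a sketch under assumptions: `make_chunk_balancer` is not part of the package, the `BPPARAM` argument is ignored, and returning `NULL` from `next_task` once every chunk has been handed out is only a guess at how "no tasks left" is signalled; `R/balancer.R` has the actual convention.

```r
## Hypothetical generator following the record()/next_task() interface
## described above; `sizes` gives the number of elements of X per task.
make_chunk_balancer <- function(sizes) {
  function(n, BPPARAM) {
    stopifnot(sum(sizes) == n)
    chunks <- split(seq_len(n), rep(seq_along(sizes), sizes))
    current <- 0L
    list(
      record = function(node, task_id, time) {
        ## a static plan does not use timing information
        invisible(NULL)
      },
      next_task = function() {
        current <<- current + 1L
        if (current > length(chunks))
          return(NULL)  # assumption: signals that no tasks are left
        list(task_id = current, index = chunks[[current]])
      }
    )
  }
}

balancer <- make_chunk_balancer(c(4, 3, 3))(10, BPPARAM = NULL)
str(balancer$next_task())  # task 1 with index 1:4
```

If the option works as described in this thread, the generator could then be passed as `bpoptions(lapplyBalancer = make_chunk_balancer(c(6, 3, 1)))` for a 10-element `X`.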
Once you have defined `mybalancer`, you can set the balancer via `bpoptions`, for example:

```r
library(BiocParallel)
p <- SnowParam(workers = 3)  # example backend; any BiocParallelParam should do
opt <- bpoptions(lapplyBalancer = mybalancer)
bplapply(1:4, function(x) Sys.sleep(x), BPPARAM = p, BPOPTIONS = opt)
```

This lets `bplapply` use your customized balancer instead of its built-in balancer.
Regarding @mtmorgan's comment, I think that if we know the task evaluation times in advance, we can provide a customized balancer to achieve optimal performance. It is not very hard to implement. I plan to add a vignette giving a formal introduction to the balancer along with the other advanced features we have added recently.
I made a mistake in my previous comment. With 10 tasks and 3 workers, the actual task sizes are 4, 4, 2. I think this is better than 3, 3, 4: the former spreads the extra tasks across the workers, while the latter gives all of them to a single worker (imagine 109 tasks and 10 workers: one worker would get 19 tasks). There is no need to update the commit.
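The difference between the two splits is ceiling versus floor division. A minimal sketch of both plans as described in the comment above; `ceiling_sizes` and `floor_sizes` are hypothetical helpers, not the package's code:

```r
## "Ceiling" plan: each chunk gets ceiling(n / k) elements, the last gets the rest
ceiling_sizes <- function(n, k) {
  size <- ceiling(n / k)
  pmin(size, pmax(0, n - size * (seq_len(k) - 1)))
}

## "Floor" plan: each chunk gets floor(n / k) elements, the last absorbs the remainder
floor_sizes <- function(n, k) {
  sizes <- rep(n %/% k, k)
  sizes[k] <- sizes[k] + n %% k
  sizes
}

ceiling_sizes(10, 3)    # 4 4 2
floor_sizes(10, 3)      # 3 3 4
ceiling_sizes(109, 10)  # nine chunks of 11, then 10
floor_sizes(109, 10)    # nine chunks of 10, then 19
```

One trade-off of the ceiling plan: when `n` is small relative to the number of workers it can leave a worker idle, e.g. `ceiling_sizes(9, 4)` gives 3, 3, 3, 0.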
Hello Martin, I wonder if you can merge this pull request. It looks like we have some new feature requests these days.
This pull request enables the load balancer in the apply function.
There are three built-in balancers for `bplapply`, namely "sequential", "stepwise", and "random". The sequential balancer is the one used in the master branch; however, in this branch I changed the default to the stepwise balancer. The stepwise balancer sends the 1st element of `X` to the 1st worker, the 2nd to the 2nd worker, and so on down to the last worker. Then it starts again, sending the next element of `X` to the 1st worker, and so on. The overhead of the stepwise balancer is marginal and its performance is better than that of the sequential balancer. Here is an example