Open epruesse opened 4 years ago
If I understand correctly, plan(tweak(multicore, workers=8)) means that the first nesting level gets 8 parallel threads and the second nesting level gets no parallelism. I could hard-allocate threads to each level, but that's hard to do since it means I have to know all thread usages down the tree of packages.
Correct x 2.
What I'm looking for is a "worker pool"-like implementation, so that if I have a loop of three calling a package that uses future.apply on a huge vector but takes very long to even get there, the N workers can be kept busy for as much of the time as possible.
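For context, the hard-allocation alternative mentioned above can be written with future's documented list-of-plans API (the 2 x 4 split here is an arbitrary example, not a recommendation):

```r
library(future)

# Hard-allocate workers per nesting level: the outer level gets 2
# multisession workers, each of which gets 4 for its inner level,
# i.e. at most 2 * 4 = 8 concurrent R processes in total.
plan(list(
  tweak(multisession, workers = 2),
  tweak(multisession, workers = 4)
))
```

This is exactly the static split being criticized: the budget per level is fixed up front, regardless of which level currently has work.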
I'm not sure I fully understand, but I can guess what you're after. Basically, if you do:
a <- future_lapply(x, function(y) {
  future_lapply(y, function(z) {
    ...
  })
})
you want the inner and the outer "loops" to be able to pull from the same pool of "workers", correct?
This is available if you use an external job scheduler, such as those available in HPC environments. Then you could use:
plan(list(outer = batchtools_slurm, inner = batchtools_slurm))
Both layers will submit their jobs (= futures) to the same job queue, and it is up to the job scheduler to allocate resources as they become available.
Trying to implement something similar in R is tedious but should be doable. Maybe one could build upon Gábor Csárdi's work in Multi Process Task Queue in 100 Lines of R Code, 2019-09-09. But the point is, this is not really something that should be implemented in the future package. Instead, it should/could be added as a new type of backend that futures can rely on - think:
library(future.taskqueue)
plan(list(outer=taskqueue, inner=taskqueue))
...
The future.tests package can be used to validate that it is properly implemented and meets the requirements of the future framework.
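To give a flavor of what such a backend would juggle, here is a toy single-process sketch of a shared FIFO task queue (closures only; the names make_queue/submit/run_all are invented for illustration, and real concurrency via background R processes, as in Gábor's post, is deliberately omitted):

```r
# Toy single-process sketch of a shared task queue: any nesting level
# pushes thunks into one FIFO; run_all() drains it with a worker budget.
make_queue <- function(workers = 8L) {
  tasks <- list()
  submit <- function(fun) {
    tasks[[length(tasks) + 1L]] <<- fun
  }
  run_all <- function() {
    results <- list()
    while (length(tasks) > 0L) {
      # Take up to `workers` tasks per round; a real backend would
      # dispatch these to idle background R processes instead.
      batch <- head(tasks, workers)
      tasks <<- tasks[-seq_along(batch)]
      results <- c(results, lapply(batch, function(f) f()))
    }
    results
  }
  list(submit = submit, run_all = run_all)
}

q <- make_queue(workers = 2L)
q$submit(function() 1 + 1)
q$submit(function() 2 * 21)
q$run_all()  # list containing 2 and 42
```

The key property is that the queue is shared: outer and inner loops both call submit(), so the worker budget is global rather than sliced per nesting level.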
Yes, that's what I meant. Though I was thinking less about nested loops in client code that are known to the user and easily configured with plan(list(...)), but about the levels hidden in library code. The docs tell package authors to stay away from plan, so I was initially assuming that there would be some kind of queue dealing with levels of nesting hidden from me.

That would be my main argument for allowing a simple queue scheduler into future - it's the simplest approach to arrive at "least surprising" behavior. A fully featured scheduler is clearly out of scope. The more packages use future themselves, though, the more complicated it becomes for the end user to set the right plan everywhere.
Another argument might be that future would be the place for a call that can say something like "use up to 4 threads here". The knowledge of what degree of parallelism is beneficial sits within the package (and preferably not in the vignette), and would ideally be hidden from consuming client code.
(I wish I could promise a PR, but it would be easier to promise that I'll never find the time...).
A naive greedy allocation, using a semaphore that decrements every time a thread is forked off, would be a good start.

Interaction with OMP in particular is a problem, of course; a lot of things seem to use it. IIRC, Intel TBB auto-detects the number of "useful" threads to use and adjusts this value as it goes based on system load. Something like this would need extra housekeeping, but the concept of "don't start more threads if all my workers/CPUs are busy", or even "don't start more threads if we are at XY% memory", would be very useful for running things in parallel robustly.
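The naive greedy semaphore mentioned above can be sketched in a few lines of R (a single-process toy; nested futures actually run in separate R processes, so a real implementation would need the count in shared state, e.g. behind a file lock):

```r
# Toy counting semaphore for a shared worker budget.
make_semaphore <- function(n) {
  count <- n
  list(
    # Try to claim a worker slot; returns TRUE on success.
    acquire = function() {
      if (count > 0L) {
        count <<- count - 1L
        TRUE
      } else {
        FALSE
      }
    },
    # Return a slot to the pool when a task finishes.
    release = function() {
      count <<- count + 1L
      invisible(NULL)
    },
    available = function() count
  )
}

sem <- make_semaphore(8L)
sem$acquire()    # TRUE: a slot was claimed
sem$available()  # 7 slots remain
```

A nesting level that fails to acquire() would simply run its chunk sequentially instead of forking, which gives exactly the "don't start more threads if all my workers are busy" behavior.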