yonicd opened this issue 5 years ago
is this related to this issue? https://github.com/HenrikBengtsson/future.batchtools/issues/18
Not 100% sure I understand what you're aiming for; it could be interpreted in a few different ways. Forgetting about your job scheduler for a while, assume you have two compute nodes n1 and n2. Do you want n1 to process 8 of the elements and n2 the remaining 8? Something like:
workers <- rep(c("n1", "n2"), each = 8L)
plan(cluster, workers = workers)
x <- vector("list", length = 16L)
res <- future.apply::future_lapply(x, FUN = foo, future.chunk.size = 1L)
That will create 16 futures, one per element in x, and evaluate them on 8 workers per machine (two compute nodes).
... or are you looking for nested parallelization?
Another thing: if you need 8 cores per node on SGE, I recommend resources = list(slots = 8).
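A minimal sketch of where that recommendation fits in, assuming a placeholder template file name (not from this thread); resources is forwarded to the batchtools SGE template:

```r
# Hedged sketch: request 8 SGE slots per job via 'resources'.
library(future)
library(future.batchtools)

sge <- tweak(
  batchtools_sge,
  template  = "batchtools.sge.tmpl",   # placeholder template file
  resources = list(slots = 8)          # forwarded to the SGE template
)
plan(sge)
```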
@wlandau that doesn't give me the right allocation. If I set slots to 8, then each element uses 8 cores. I wanted to load balance all elements across all the cores available.
In this setup I get the right allocation with internal parallelization:
sge <- future::tweak(
future.batchtools::batchtools_sge,
label = 'test2',
template = 'batchtools.sge-mrg1.tmpl',
workers = rep(c("n1", "n2"), each = 8L),
resources = list(slots = 2)
)
future::plan(list(sge, future::multiprocess))
x <- furrr::future_map(1:16, .f = function(x) {
  system.time({furrr::future_map(1:8, .f = function(y) Sys.sleep(10))})
})
qstat -f
queuename qtype resv/used/tot. load_avg arch states
---------------------------------------------------------------------------------
all.q@ip-....us-west- BIP 0/8/8 0.35 lx-amd64
737 0.55500 test2 yonis r 02/27/2019 14:53:44 2
739 0.55500 test2 yonis r 02/27/2019 14:53:44 2
741 0.55500 test2 yonis r 02/27/2019 14:53:44 2
743 0.55500 test2 yonis r 02/27/2019 14:53:44 2
---------------------------------------------------------------------------------
all.q@ip-....us-west-2. BIP 0/8/8 0.30 lx-amd64
736 0.55500 test2 yonis r 02/27/2019 14:53:44 2
738 0.55500 test2 yonis r 02/27/2019 14:53:44 2
740 0.55500 test2 yonis r 02/27/2019 14:53:44 2
742 0.55500 test2 yonis r 02/27/2019 14:53:44 2
############################################################################
- PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
744 0.55500 test2 yonis qw 02/27/2019 14:53:40 2
745 0.55500 test2 yonis qw 02/27/2019 14:53:41 2
746 0.55500 test2 yonis qw 02/27/2019 14:53:42 2
747 0.55500 test2 yonis qw 02/27/2019 14:53:43 2
748 0.00000 test2 yonis qw 02/27/2019 14:53:44 2
749 0.00000 test2 yonis qw 02/27/2019 14:53:46 2
> x
[[1]]
user system elapsed
0.113 0.017 40.185
[[2]]
user system elapsed
0.089 0.010 40.149
[[3]]
user system elapsed
0.105 0.016 40.179
[[4]]
user system elapsed
0.082 0.016 40.157
[[5]]
user system elapsed
0.088 0.016 40.160
[[6]]
user system elapsed
0.076 0.013 40.155
[[7]]
user system elapsed
0.080 0.029 40.165
[[8]]
user system elapsed
0.077 0.013 40.156
[[9]]
user system elapsed
0.112 0.016 40.184
[[10]]
user system elapsed
0.080 0.011 40.148
[[11]]
user system elapsed
0.087 0.014 40.158
[[12]]
user system elapsed
0.102 0.013 40.186
[[13]]
user system elapsed
0.076 0.026 40.158
[[14]]
user system elapsed
0.077 0.013 40.154
[[15]]
user system elapsed
0.095 0.018 40.169
[[16]]
user system elapsed
0.104 0.013 40.179
It's still not clear to me, but have a look at future.apply::future_lapply() and the argument future.scheduling, or alternatively future.chunk.size, which allows you to control the number of elements per chunk (= elements per future).
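For illustration, a local sketch of future.chunk.size, using multisession as a stand-in for the SGE backend: with 16 elements and a chunk size of 2, future_lapply() creates 8 futures of 2 elements each.

```r
library(future)
library(future.apply)

plan(multisession, workers = 4)  # local stand-in for batchtools_sge

x <- 1:16
# 8 futures, each processing a chunk of 2 elements
res <- future_lapply(x, FUN = function(i) i^2, future.chunk.size = 2L)
```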
I added that argument and got a similar result. Thanks for the help!
> i added that argument ...

... to future_lapply()? ... because future_map() doesn't take "that argument"(?).
Also, https://github.com/cloudyr/googleComputeEngineR/issues/129#issuecomment-467897422 might help/clarify what can be done with nested-levels of workers.
x <- furrr::future_map(1:16, .f = function(x) {
  system.time({furrr::future_map(1:8, .f = function(y) Sys.sleep(10))})
}, .options = furrr::future_options(scheduling = 1))
Note that scheduling = 1 is the default, i.e. you haven't changed anything - all load balancing is done at the first layer. Try with future_lapply() and the future.chunk.size argument, which might be easier to grasp. When you've confirmed that you get what you want there, you can translate it one-to-one to use future.scheduling (see the help page). Then go back to furrr.
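As a sketch of that translation (x and slow_func are placeholders; furrr::future_options() is the API from furrr at the time of this thread, later versions use furrr_options()):

```r
library(future.apply)

# One element per future, i.e. fine-grained load balancing:
res <- future_lapply(x, FUN = slow_func, future.chunk.size = 1L)

# Rough furrr counterpart via 'scheduling': scheduling = Inf also
# gives one element per future; scheduling = 1 gives one chunk per worker.
res2 <- furrr::future_map(x, slow_func,
                          .options = furrr::future_options(scheduling = Inf))
```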
Is it possible to allocate slots on a heterogeneous cluster?
For example, if I have 2 workers, one with 4 cores and another with 8, is there a way to set something along the lines of
workers = rep(c('n1', 'n2'), c(4, 8))
resources = list(slots = c(2, 4))
This would allow me to send jobs to workers while controlling what resources are set for nested parallelization.
I tried using future.apply and the analogous furrr::future_map. I think I understand now how to better control the chunk sizes.
I am confused, though, about the solution you gave in the link above: nesting two future_lapply() calls to get jobs load balanced across multiple workers.
sge <- future::tweak(
future.batchtools::batchtools_sge,
label = 'pred',
template = 'batchtools.sge-mrg.tmpl',
workers = rep(sprintf('n%s',seq_len(10)), each = 8L)
)
future::plan(list(sge))
future_sim <- future_lapply(1:10000, FUN = function(x) {
future_lapply(x, FUN = slow_func)
}, future.chunk.size = 1)
If I do this, then SGE fills up all the cores properly and has jobs waiting in the queue.
But if I do
future_sim <- future_lapply(1:10000, FUN = slow_func, future.chunk.size = 1)
the jobs seem to be throttled and are not submitted into the queue all at once.
What is happening in the nested version?
For clarity, how familiar are you with HPC scheduling outside of R? Knowing that might help me address your questions. The reason I'm asking is that you very rarely want to specify which compute nodes your jobs should run on, but it could also be that this is the model used on your cluster. Although that is not how it is meant to be used, it looks like you are attempting to do exactly that with workers = rep(sprintf('n%s', seq_len(10)), each = 8L). Note that my comment in https://github.com/HenrikBengtsson/future.batchtools/issues/39#issuecomment-467870757 was meant to clarify your objectives and was explicitly meant to skip SGE until the problem was understood.
I'd argue that the only reason you should specify argument workers for batchtools_sge() is to control how many jobs run concurrently, i.e. specify an integer. The default is workers = +Inf, which makes batchtools_sge think there is an infinite number of workers. This will cause nbrOfWorkers() to return +Inf. nbrOfWorkers() is important, because it affects how future_lapply(X, ...) and friends chunk up the elements. If nbrOfWorkers() == 1, then all elements in X will be processed by a single job, i.e. lapply(X, ...). If nbrOfWorkers() == 2, then there will be two chunks, effectively doing lapply(X[1:(n/2)], ...) and lapply(X[((n/2)+1):n], ...), and so on. With nbrOfWorkers() == +Inf, there will be n = length(X) chunks, each processing a single element. So, by setting workers = 2 for batchtools_sge as you did in your original comment, you'll get two chunks, i.e. two jobs will be used to process your data.
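A local sketch of that chunking behavior, using multisession as a stand-in for the batchtools backend; the counts follow from the defaults described above:

```r
library(future)
library(future.apply)

plan(multisession, workers = 2)
nbrOfWorkers()
#> [1] 2

# With the default future.scheduling = 1, X is split into
# nbrOfWorkers() chunks -> here, 2 futures over 16 elements.
res <- future_lapply(1:16, FUN = identity)
```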
So, I still don't understand what you want to achieve. If you could explain how many jobs you want to use to process your data, that would help. If your objective is to use exactly 8 cores per job, then your objective is to split up your X into length(X) / 8 jobs. But it's a bit unclear what you want because you keep changing your questions (e.g. https://github.com/HenrikBengtsson/future.batchtools/issues/39#issuecomment-467909711) without following up on my request to clarify - it makes it hard to help you. It could be that we're talking past each other. Please try to explain verbatim with a single example how and where you want your X to be processed, and I'll try to explain how to do that, if it is possible.
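For example, if the goal really is 8 elements (one per core) per job, a hedged sketch would be (X and slow_func are placeholders):

```r
library(future.apply)

# length(X) / 8 jobs, each processing a chunk of 8 elements:
res <- future_lapply(X, FUN = slow_func, future.chunk.size = 8L)
```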
Sorry for the confusion.
I am not well versed in HPC terminology. I am working with an elastic cluster (from my understanding, workers spin up as needed and scale back down when not needed).
My use of "workers" came from patching together answers from different issues while trying to understand how to tweak a template; your explanation above cleared up a lot of my confusion about how jobs are allocated.
Thank you for your patience and the thorough explanations!
> I am not well versed in HPC terminology. I am working with an elastic cluster (from my understanding workers spin up as needed and scale back down when not).
Oh... I've never worked with "elastic clusters" - maybe they and/or your needs require additional features in the future framework and/or the batchtools package. If you learn something else or manage to narrow down exactly what you need, please follow up.
I am using an SGE template.
Is there a native way in a future plan to have the elements of a list allocated to each available core in a preset number of workers (i.e. chunking the list)?
I.e. if I have workers with 8 cores and a vector x of length 16, I want future to send each element to a core on the workers to run in parallel (assuming I am sending out jobs from the remote master).
cc @wlandau