HenrikBengtsson / doFuture

:rocket: R package: doFuture - Use Foreach to Parallelize via Future Framework
https://doFuture.futureverse.org

SLURM: multiple jobs #37

Closed: sguizard closed this issue 4 years ago

sguizard commented 4 years ago

Dear Henrik,

I'm giving doFuture a try to parallelize a genotyping chip analysis. The idea is to convert a foreach() %dopar% {} loop so that it submits jobs on our SLURM cluster.

Before going with the real script, I tried this example:

library("doFuture")
registerDoFuture()
library("future.batchtools")
plan(batchtools_slurm)

mu <- 1.0
sigma <- 2.0
x <- foreach(i = 1:200, .export = c("mu", "sigma")) %dopar% {
  rnorm(i, mean = mu, sd = sigma)
}

The associated batchtools template file:

#!/bin/bash

<%
# relative paths are not handled well by Slurm
log.file = fs::path_expand(log.file)
-%>

#SBATCH --job-name=<%= job.name %>
#SBATCH --output=<%= log.file %>
#SBATCH --error=<%= log.file %>
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --partition='Main'

Rscript -e 'library(batchtools); batchtools::doJobCollection("<%= uri %>")'
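
Note that batchtools typically discovers such a template if it is saved as batchtools.slurm.tmpl in the working directory; it can also be pointed to explicitly. A minimal sketch, assuming the template above is saved as slurm.tmpl (the file name is an assumption for illustration):

library("future.batchtools")
## Point the Slurm backend at the template file explicitly;
## the path "slurm.tmpl" is assumed for this example
plan(batchtools_slurm, template = "slurm.tmpl")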

Jobs are correctly submitted to SLURM, but only one at a time. I tried to use the availableCores.custom and availableCores.methods options and set 150 cores, but the behavior didn't change.

Are there other parameters to configure to allow multiple jobs to be submitted at once?

Thanks in advance for your answer.

Sébastien

HenrikBengtsson commented 4 years ago

Hi, there are multiple interpretations of your question. It's not clear to me exactly how you want it to work, but I can say this:

  1. You can limit the number of workers future.batchtools sees by using plan(batchtools_slurm, workers=20). The default is workers=+Inf. This will cause foreach() to chunk up the elements into 20 jobs (=futures) with 200/20=10 elements per job (=per future).

  2. See Section 'Load balancing ("chunking")' in help("doFuture"). You can, for instance, pass the argument .options.future = list(scheduling = 1/4) to foreach() so that 4 elements are processed per job (=future) on average. You can also specify this as .options.future = list(chunk.size = 4L); see the sketch after this list. [I just realized this option was not documented; it will be in the next release.]

  3. There is currently no way to have future.batchtools produce array jobs (not sure if that's what you're asking for).

  4. future::availableCores() does not come into play when using future.batchtools on a job scheduler. That function is only used when you parallelize on a single machine; with a scheduler, you distribute jobs to multiple machines.
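
Putting items 1 and 2 together, here is a minimal sketch (the worker count and chunk size are illustrative values, not recommendations):

library("doFuture")
registerDoFuture()
library("future.batchtools")

## Cap the number of concurrently running Slurm jobs (item 1)
plan(batchtools_slurm, workers = 20)

mu <- 1.0
sigma <- 2.0

## Process ~4 elements per future (= per Slurm job) via chunking (item 2);
## chunk.size was undocumented at the time but behaves as described above
x <- foreach(i = 1:200, .export = c("mu", "sigma"),
             .options.future = list(chunk.size = 4L)) %dopar% {
  rnorm(i, mean = mu, sd = sigma)
}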

sguizard commented 4 years ago

Hi, thank you very much for your answers, it's very instructive! :) But my problem is still there. Let me try to give a better explanation.

Let's consider the following piece of code:

library("doFuture")
registerDoFuture()
library("future.batchtools")
plan(batchtools_slurm, workers=20)

mu <- 1.0
sigma <- 2.0
x <- foreach(i = 1:200, .export = c("mu", "sigma")) %dopar% {
  rnorm(i, mean = mu, sd = sigma)
}

As you explained in 1., it will generate 20 futures (of 10 elements each) that will be submitted to the cluster (>100 CPUs) with SLURM.

The behavior I'm expecting is that the 20 futures run at the same time on the cluster.

But, in my case, the futures are submitted sequentially (job 1 is submitted, job 1 finishes, job 2 is submitted, job 2 finishes, job 3 is submitted…).

I do not understand why the futures do not run in parallel. I'm quite sure I missed a detail, but I can't find which one.

Thanks again for your help.

sguizard commented 4 years ago

After a second reading of the 'Load balancing ("chunking")' section of the help, I found the "missing detail" :)

I added the .options.future = list(scheduling = TRUE) line to the code and got parallel execution of the futures.

library("doFuture")
registerDoFuture()
library("future.batchtools")
# workers=150 for filling partition
plan(batchtools_slurm, workers=150)
.options.future = list(scheduling = TRUE)

mu    <- 1.0
sigma <- 2.0
# Increase computation load to get longer jobs
x <- foreach(i = 1:1000000, .export = c("mu", "sigma")) %dopar% {
  rnorm(i, mean = mu, sd = sigma)
}

Thanks again for help and for your work !

HenrikBengtsson commented 4 years ago

Nah, not like that. That is an argument to the foreach() function, e.g.

x <- foreach(..., .options.future = list(chunk.size = 10)) %dopar% { ... }

The way you used it should not make a difference.

If you specify "chunking" via .options.future = list(chunk.size = 10), then you can skip specifying workers in your plan.
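
Applied to the earlier example, a minimal sketch of the corrected usage (the chunk size of 10 is illustrative):

library("doFuture")
registerDoFuture()
library("future.batchtools")
plan(batchtools_slurm)  ## no 'workers' needed when chunk.size is specified

mu    <- 1.0
sigma <- 2.0
## The chunking option is passed to foreach() itself, not set on its own line
x <- foreach(i = 1:200, .export = c("mu", "sigma"),
             .options.future = list(chunk.size = 10)) %dopar% {
  rnorm(i, mean = mu, sd = sigma)
}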