Open nick-youngblut opened 6 years ago
A couple options I use:
future::plan(future.batchtools::batchtools_sge, template = "sge-simple.tmpl")
seems to work this way.
How do you provide defaults for the variables in the template file? I'm using a template that includes activating a conda environment:
$ cat ~/.batchtools.sge.tmpl
#!/bin/bash
## The name of the job, can be anything, simply used when displaying the list of running jobs
#$ -N <%= job.name %>
## Combining output/error messages into one file
#$ -j y
## Giving the name of the output log file
#$ -o <%= log.file %>
## One needs to tell the queue system to use the current directory as the working directory
## Or else the script may fail as it will execute in your top level home directory /home/username
#$ -cwd
## Use environment variables
#$ -V
## time
#$ -l h_rt=<%= resources$h_rt %>
## memory
#$ -l h_vmem=<%= resources$h_vmem %>
export PATH=<%= resources$conda.path %>:$PATH
source activate <%= resources$conda.env %>
## Export value of DEBUGME environment var to slave
export DEBUGME=<%= Sys.getenv("DEBUGME") %>
<%= sprintf("export OMP_NUM_THREADS=%i", resources$omp.threads) -%>
<%= sprintf("export OPENBLAS_NUM_THREADS=%i", resources$blas.threads) -%>
<%= sprintf("export MKL_NUM_THREADS=%i", resources$blas.threads) -%>
Rscript -e 'batchtools::doJobCollection("<%= uri %>")'
exit 0
...and I'd like to set defaults for resources$conda.path and resources$conda.env. When just using batchtools, setting resources can be done with a config file:
$ cat ~/.batchtools.conf.R
default.resources = list(h_rt = '00:59:00',
h_vmem = '4G',
conda.env = "py3",
conda.path = "/ebio/abt3_projects/software/miniconda3/bin")
cluster.functions = makeClusterFunctionsSGE(template = "~/.batchtools.tmpl")
temp.dir = "/ebio/abt3_projects/temp_data/"
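One possible workaround, sketched here under assumptions rather than taken from the thread: keep the defaults in an ordinary R list and merge per-job overrides into it before handing the result to the plan. The helper name with_defaults is hypothetical, and the commented-out plan() call assumes that batchtools_sge accepts a resources argument alongside template.

```r
# Defaults copied from the config file above.
default_resources <- list(
  h_rt       = "00:59:00",
  h_vmem     = "4G",
  conda.env  = "py3",
  conda.path = "/ebio/abt3_projects/software/miniconda3/bin"
)

# Hypothetical helper: per-job values override the defaults,
# everything else is carried over unchanged.
with_defaults <- function(overrides = list(), defaults = default_resources) {
  utils::modifyList(defaults, overrides)
}

res <- with_defaults(list(h_vmem = "8G"))
str(res)

# Assumed usage on a cluster (not run here):
# future::plan(future.batchtools::batchtools_sge,
#              template = "~/.batchtools.sge.tmpl", resources = res)
```

With this approach every variable the template reads from resources has a value, so none of the template lines render empty.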
From revisiting this section of the README, I think I understand a little more. I am also trying to use the resources argument on SGE. This is my template file sge_batchtools.tmpl:
#!/bin/bash
#$ -cwd
#$ -j y
#$ -o <%= log.file %>
#$ -V
#$ -N <%= job.name %>
#$ -pe smp <%= resources[["slots"]] %>
Rscript -e 'batchtools::doJobCollection("<%= uri %>")'
exit 0
and my script run.R:
library(future.batchtools)
future::plan(batchtools_sge(template = "sge_batchtools.tmpl"))
future(system2("hostname"))
which gives an error:
$ Rscript run.R
Loading required package: future
Error: Fatal error occurred: 101. Command 'qsub' produced exit code 2. Output: 'Unable to read script file because of error: ERROR! -pe option must have range as 2nd argument'
Execution halted
But when I replace <%= resources[["slots"]] %> with 2 in sge_batchtools.tmpl, Rscript run.R submits one job with two slots as desired.
Related: https://github.com/HenrikBengtsson/future/issues/181, https://github.com/HenrikBengtsson/future/issues/263, https://github.com/ropensci/drake/issues/169.
Don't know SGE well enough, so I could be wrong, but I think you wanna specify parallel environment "smp" (symmetric multiprocessing) as in -pe smp 2.
My bad - I somehow missed that you do indeed specify smp - I should go to sleep now.
Found the problem in https://github.com/HenrikBengtsson/future.batchtools/issues/26#issuecomment-445371561: my run.R script did not actually set the slots element of resources. This worked for me:
library(future.batchtools)
future::plan(batchtools_sge(template = "sge_batchtools.tmpl"))
future(system2("hostname"), resources = list(slots = 2))
As desired, I saw a short-lived job with 2 slots on the cluster.
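The qsub error can be reproduced off-cluster. A list with no slots element returns NULL for resources[["slots"]], so nothing is written after "-pe smp". The sketch below imitates that one template line with paste0(); render_pe_line is a hypothetical helper for illustration only, not the mechanism batchtools uses to render templates.

```r
# Hypothetical helper: imitates the '-pe' line of sge_batchtools.tmpl.
render_pe_line <- function(resources) {
  # resources[["slots"]] is NULL when the element is missing,
  # and paste0() then appends nothing.
  paste0("#$ -pe smp ", resources[["slots"]])
}

no_slots   <- render_pe_line(list())           # resources never set
with_slots <- render_pe_line(list(slots = 2))  # resources = list(slots = 2)

print(no_slots)
print(with_slots)
```

The first line ends right after "smp", which matches the complaint that the -pe option must have a range as its second argument.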
At least with the configuration that I have listed above, I get no output from failed jobs. Moreover, it's not clear where the qsub job log file is, given that it's just set as <%= log.file %> in the *.tmpl file. I also haven't found any documentation about how best to troubleshoot failed qsub jobs (e.g., AFAIK, there's no getLog() for future.batchtools and batchtools::getLog() doesn't work with future.batchtools jobs).
Is there a good way to troubleshoot failed jobs? Preferably, I would like a function to print the stderr/stdout from each job and the qacct -j JOBID info. I really like using future.batchtools + future.apply, but it's always a pain to troubleshoot failed jobs.
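For the qacct part of that request, a minimal base-R sketch might look like the following. It is not a future.batchtools feature: sge_job_info is a hypothetical helper, the job ID is a made-up placeholder, and the function assumes the SGE job ID is already known (for example from the qsub output or qstat).

```r
# Hypothetical helper: returns the accounting record for one SGE job as a
# character vector, or NULL when 'qacct' is not on the PATH (e.g. off-cluster).
sge_job_info <- function(job_id) {
  if (!nzchar(Sys.which("qacct"))) {
    return(NULL)
  }
  # Equivalent to running: qacct -j JOBID
  system2("qacct", c("-j", as.character(job_id)), stdout = TRUE, stderr = TRUE)
}

info <- sge_job_info("12345")  # placeholder job ID
if (is.null(info)) {
  message("qacct not found; run this on a submit host")
} else {
  writeLines(info)
}
```

This only covers the accounting record; it does not locate the stdout/stderr file that <%= log.file %> points to.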
I also haven't found any documentation about how best to troubleshoot failed qsub jobs (e.g., AFAIK, there's no getLog() for future.batchtools and batchtools::getLog() doesn't work with future.batchtools jobs).
Still a problem.
Also, it's not clear what variables are available in the template. I know of job.name, log.file, and resources, but are there any others? If so, is there documentation on this?
I'd like to redirect this question/ask/request to the batchtools package. I agree that {future.batchtools} might be able to improve its documentation on this, but I want to minimize any type of redundancy here and thereby the risk of falling out of sync with {batchtools}; {batchtools} is in charge of how things work below the future layer.
Sorry if this is in the docs and I can't find it, but is there a way to specify default resources for the template? When just using batchtools, default resources can be set with a ~/.batchtools.conf.R file. However, this file doesn't seem to work with future.batchtools::plan().