HenrikBengtsson / future.batchtools

:rocket: R package future.batchtools: A Future API for Parallel and Distributed Processing using batchtools
https://future.batchtools.futureverse.org

config for batchtools_sge? #26

Open nick-youngblut opened 6 years ago

nick-youngblut commented 6 years ago

Sorry if this is in the docs and I can't find it, but is there a way to specify default resources for the template? When just using batchtools, default resources can be set with a ~/.batchtools.conf.R file. However, this file doesn't seem to work with future.batchtools::plan().

wlandau commented 6 years ago

A couple options I use:

future::plan(future.batchtools::batchtools_sge, template = "sge-simple.tmpl") seems to work this way.

nick-youngblut commented 6 years ago

How do you provide defaults for the variables in the template file? I'm using a template that includes activating a conda environment:

$ cat ~/.batchtools.sge.tmpl
#!/bin/bash

## The name of the job, can be anything, simply used when displaying the list of running jobs
#$ -N <%= job.name %>

## Combining output/error messages into one file
#$ -j y

## Giving the name of the output log file
#$ -o <%= log.file %>

## One needs to tell the queue system to use the current directory as the working directory
## Or else the script may fail as it will execute in your top level home directory /home/username
#$ -cwd

## Use environment variables
#$ -V

## time
#$ -l h_rt=<%= resources$h_rt %>

## memory
#$ -l h_vmem=<%= resources$h_vmem %>

export PATH=<%= resources$conda.path %>:$PATH
source activate <%= resources$conda.env %>

## Export value of DEBUGME environment var to slave
export DEBUGME=<%= Sys.getenv("DEBUGME") %>

<%= sprintf("export OMP_NUM_THREADS=%i", resources$omp.threads) -%>
<%= sprintf("export OPENBLAS_NUM_THREADS=%i", resources$blas.threads) -%>
<%= sprintf("export MKL_NUM_THREADS=%i", resources$blas.threads) -%>

Rscript -e 'batchtools::doJobCollection("<%= uri %>")'
exit 0

...and I'd like to set defaults for resources$conda.path and resources$conda.env. When just using batchtools, setting resources can be done with a config file:

$ cat ~/.batchtools.conf.R
default.resources = list(h_rt = '00:59:00',
                         h_vmem = '4G',
                         conda.env = "py3",
                         conda.path = "/ebio/abt3_projects/software/miniconda3/bin")
cluster.functions = makeClusterFunctionsSGE(template = "~/.batchtools.tmpl")
temp.dir = "/ebio/abt3_projects/temp_data/"
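One possible way to get the same defaults at the future layer is to pass them through the `resources` argument when setting the plan. The following is only a sketch: it assumes the `resources` argument of `batchtools_sge` is forwarded to the template as `resources`, reuses the values from the config file above, and assumes the template lives at `~/.batchtools.sge.tmpl`. It needs a working SGE cluster to run.

```r
library(future.batchtools)

# Assumption: entries in 'resources' become resources$... inside the template,
# so the names must match those used in ~/.batchtools.sge.tmpl.
plan(
  batchtools_sge,
  template  = "~/.batchtools.sge.tmpl",
  resources = list(
    h_rt       = "00:59:00",
    h_vmem     = "4G",
    conda.env  = "py3",
    conda.path = "/ebio/abt3_projects/software/miniconda3/bin"
  )
)

# Every future created under this plan would then use these resources.
f <- future(Sys.info()[["nodename"]])
value(f)
```

Setting the resources in the plan keeps them in one place, so individual `future()` calls do not need to repeat them.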

wlandau commented 5 years ago

From revisiting this section of the README, I think I understand a little more. I am also trying to use the resources argument on SGE. This is my template file sge_batchtools.tmpl:

#!/bin/bash
#$ -cwd
#$ -j y
#$ -o <%= log.file %>
#$ -V
#$ -N <%= job.name %>
#$ -pe smp <%= resources[["slots"]] %>
Rscript -e 'batchtools::doJobCollection("<%= uri %>")'
exit 0

and my script run.R:

library(future.batchtools)
future::plan(batchtools_sge(template = "sge_batchtools.tmpl"))
future(system2("hostname"))

which gives an error:

$ Rscript run.R
Loading required package: future
Error: Fatal error occurred: 101. Command 'qsub' produced exit code 2. Output: 'Unable to read script file because of error: ERROR! -pe option must have range as 2nd argument'
Execution halted

But when I replace <%= resources[["slots"]] %> with 2 in sge_batchtools.tmpl, Rscript run.R submits one job with two slots as desired.

Related: https://github.com/HenrikBengtsson/future/issues/181, https://github.com/HenrikBengtsson/future/issues/263, https://github.com/ropensci/drake/issues/169.

HenrikBengtsson commented 5 years ago

Don't know SGE well enough, so I could be wrong, but I think you wanna specify parallel environment "smp" (symmetric multiprocessing) as in -pe smp 2.

https://github.com/BIMSBbioinfo/intro2UnixandSGE/blob/master/sun_grid_engine_for_beginners/how_to_submit_a_job_using_qsub.md

HenrikBengtsson commented 5 years ago

My bad - I somehow missed that you do indeed specify smp - I should go to sleep now.

wlandau commented 5 years ago

Found the problem in https://github.com/HenrikBengtsson/future.batchtools/issues/26#issuecomment-445371561: my run.R script did not actually set the slots element of resources. This worked for me:

library(future.batchtools)
future::plan(batchtools_sge(template = "sge_batchtools.tmpl"))
future(system2("hostname"), resources = list(slots = 2))

As desired, I saw a short-lived job with 2 slots on the cluster.
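To illustrate why the missing `slots` element produced the qsub error above, here is a small sketch that renders the `-pe` line on its own with the brew package, which batchtools uses for templates. `render_line()` is a hypothetical helper written for this illustration, and the guarded variant with a fallback of 1 slot is just one possible way to make the template tolerate a missing value, not something taken from the package.

```r
# Assumes the 'brew' package is installed (batchtools renders templates with it).
library(brew)

# Hypothetical helper for this illustration: render one template line with
# 'resources' visible to the template, and return the rendered text.
render_line <- function(line, resources) {
  env <- new.env()
  env$resources <- resources
  out_file <- tempfile()
  on.exit(unlink(out_file))
  brew(text = line, output = out_file, envir = env)
  paste(readLines(out_file, warn = FALSE), collapse = "\n")
}

# The line from sge_batchtools.tmpl, and a variant with a fallback value.
original <- '#$ -pe smp <%= resources[["slots"]] %>'
guarded  <- '#$ -pe smp <%= if (is.null(resources[["slots"]])) 1L else resources[["slots"]] %>'

# Without 'slots', nothing is printed after "smp", so qsub sees no slot range.
print(render_line(original, list()))

# With 'slots' set, the value is filled in.
print(render_line(original, list(slots = 2)))

# The guarded line falls back to 1 slot when 'slots' is missing.
print(render_line(guarded, list()))
```

Rendering a single line this way can be a quick check of what a template will produce before submitting anything to the scheduler.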

nick-youngblut commented 5 years ago

At least with the configuration that I have listed above, I get no output from failed jobs. Moreover, it's not clear where the qsub job log file is, given that it's just set as <%= log.file %> in the *.tmpl file. I also haven't found any documentation about how best to troubleshoot failed qsub jobs (e.g., AFAIK, there's no getLog() for future.batchtools and batchtools::getLog() doesn't work with future.batchtools jobs).

Is there a good way to troubleshoot failed jobs? Preferably, I would like a function to print the stderr/stdout from each job and the qacct -j JOBID info. I really like using future.batchtools + future.apply, but it's always a pain to troubleshoot failed jobs.

nick-youngblut commented 4 years ago

I also haven't found any documentation about how best to troubleshoot failed qsub jobs (e.g., AFAIK, there's no getLog() for future.batchtools and batchtools::getLog() doesn't work with future.batchtools jobs).

Still a problem.

Also, it's not clear what variables are available in the template. I know of job.name, log.file, and resources, but are there any others? If so, is there documentation on this?
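One way to explore this with plain batchtools is to build a job collection and list its contents, since batchtools documents that the template is rendered with the `JobCollection` as its environment. This is only a sketch under that assumption: it uses a temporary registry and a trivial job, and it does not show whether future.batchtools adds or changes anything on top.

```r
library(batchtools)

# Temporary registry (file.dir = NA) so nothing is left behind on disk.
reg <- makeRegistry(file.dir = NA, make.default = FALSE)

# Define two trivial jobs; only their bookkeeping matters here.
batchMap(function(x) x, 1:2, reg = reg)

# Build the job collection for job 1, with an example resource.
jc <- makeJobCollection(1, resources = list(slots = 2), reg = reg)

# The names listed here are the variables a template can refer to.
print(ls(jc))
```

The `JobCollection` help page in batchtools describes what each of these entries means.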

HenrikBengtsson commented 4 years ago

I'd like to redirect this question/ask/request to the batchtools package. I agree that {future.batchtools} might be able to improve its documentation on this, but I want to minimize any type of redundancy here and thereby the risk of falling out of sync with {batchtools}; {batchtools} is in charge of how things work below the future layer.