Bioconductor / BiocParallel

Bioconductor facilities for parallel evaluation
https://bioconductor.org/packages/BiocParallel
67 stars 29 forks source link

Protect BatchtoolsParam from unnamed arguments #123

Closed LTLA closed 4 years ago

LTLA commented 4 years ago

Migrated from Bioconductor/DelayedArray#72.

Say we have the following:

library(BiocParallel)
BPPARAM <- MulticoreParam(workers=2)
BiocParallel::bplapply(1:4, function(i, shift) {i + shift}, 10, BPPARAM=BPPARAM)
## [[1]]
## [1] 11
## 
## [[2]]
## [1] 12
## 
## [[3]]
## [1] 13
## 
## [[4]]
## [1] 14

But switching over to BatchtoolsParam fails:

BPPARAM <- BatchtoolsParam(5, resources = list(walltime = 3600, memory = 8000, ncpus = 1))
BiocParallel::bplapply(1:4, function(i, shift) {i + shift}, 10, BPPARAM=BPPARAM)
## Error in batchtools::batchMap(fun = FUN, X, more.args = list(...), reg = registry) : 
##   Assertion on 'more.args' failed: Must have names.

For whatever reason, the batchtools authors decided that all arguments must be named, hence the assertion fail.

Short of convincing batchtools to drop this requirement, it seems like bplapply and related functions could protect users by checking for unnamed arguments and slapping names onto them according to their position. Probably something like:

FUN <- function(i, shift) {i + shift}
thingy <- runif(1000)
fun.call <- match.call(FUN, call("FUN", thingy, 10))
named.args <- as.list(fun.call)[-1]
mtmorgan commented 4 years ago

My feeling is that I would not make this change; I don't know what the rationale of the batchtools authors is, but I presume they know best for their use case.

LTLA commented 4 years ago

Sure, but the problem is that BiocParallel clients cannot use bplapply and friends with BatchtoolsParam without ensuring that all arguments in ... are explicitly named. This was not, as far as I know, part of the bplapply (or even lapply) contract.

In my packages, most of my arguments are named, but this is not always the case, e.g., DelayedArray does not name its arguments. This has stymied a few recent attempts to parallelize some large analyses via Slurm.

mtmorgan commented 4 years ago

Just to be sure we're on the same page, the problem is that you have not (yet) named args, not that there were technical reasons that you could not? I'm still not inclined to make a change.

LTLA commented 4 years ago

All my own functions have named arguments. The actual problem is that DelayedArray's call stack for blockApply() does not, because it re-uses the FUN argument multiple times for different things. I raised this in the linked issue and it was punted here.

mtmorgan commented 4 years ago

Fixed upstream (thanks @mllg !)