Bioconductor / DelayedArray

A unified framework for working transparently with on-disk and in-memory array-like datasets
https://bioconductor.org/packages/DelayedArray
24 stars 9 forks source link

Unnamed arguments in blockApply breaks in BatchtoolsParam #72

Closed LTLA closed 3 years ago

LTLA commented 3 years ago
library(TENxBrainData)
sce <- TENxBrainData()

library(BiocParallel)
BPPARAM <- BatchtoolsParam(5, resources = list(walltime = 3600, memory = 8000, ncpus = 1), cluster = "slurm")
blockApply(counts(sce), FUN=function(x) { write(1, file=tempfile(tmpdir="dumpster")) }, BPPARAM=BPPARAM)

The value of FUN doesn't really matter, but anyway, this is what I get:

## Error in batchtools::batchMap(fun = FUN, X, more.args = list(...), reg = registry) :
##  Assertion on 'more.args' failed: Must have names.

It seems that the calls to viewportApply and bplapply2 need to name their non-... arguments. I'm looking at it now and this is a bit of a headache because FUN and verbose are re-used so many times. Maybe we need .FUN and ..FUN, etc.

hpages commented 3 years ago

Any chance you can provide a reproducible example that doesn't require slurm? Thx

LTLA commented 3 years ago

If you delete the cluster="slurm", it should work on any machine, at least up to the point of the error.

hpages commented 3 years ago

Thanks!

Looks more like an issue with bplapply#ANY#BatchtoolsParam to me e.g. this works:

lapply(1:4, function(i, shift) {i + shift}, 10)
# [[1]]
# [1] 11
# 
# [[2]]
# [1] 12
# 
# [[3]]
# [1] 13
# 
# [[4]]
# [1] 14

as well as this:

library(BiocParallel)
BPPARAM <- MulticoreParam(workers=2)
BiocParallel::bplapply(1:4, function(i, shift) {i + shift}, 10, BPPARAM=BPPARAM)
# [[1]]
# [1] 11
# 
# [[2]]
# [1] 12
# 
# [[3]]
# [1] 13
# 
# [[4]]
# [1] 14

but not this:

BPPARAM <- BatchtoolsParam(5, resources = list(walltime = 3600, memory = 8000, ncpus = 1))
BiocParallel::bplapply(1:4, function(i, shift) {i + shift}, 10, BPPARAM=BPPARAM)
# Error in batchtools::batchMap(fun = FUN, X, more.args = list(...), reg = registry) : 
#   Assertion on 'more.args' failed: Must have names.

Yes using a named arg makes it work:

BiocParallel::bplapply(1:4, function(i, shift) {i + shift}, shift=10, BPPARAM=BPPARAM) 
# [[1]]
# [1] 11
# 
# [[2]]
# [1] 12
# 
# [[3]]
# [1] 13
# 
# [[4]]
# [1] 14

but still...

LTLA commented 3 years ago

I was also thinking about that when deciding where to post this issue. The other approach would be for BiocParallel to do some match.call stuff to set the argument names before passing them onto Batchtools.

hpages commented 3 years ago

Yes maybe. Whatever will be needed to work around questionable design decisions from the batchtools folks:

library(batchtools)
tmp <- makeRegistry(file.dir=NA, make.default=FALSE)
batchtools::batchMap(function(i, shift) {i + shift}, x=1:4, more.args=list(100), reg=tmp)
# Error in batchtools::batchMap(function(i, shift) { : 
#   Assertion on 'more.args' failed: Must have names.

The call to assertList(more.args, names = "strict") in batchtools::batchMap() seems completely unnecessary.

Sad!

hpages commented 3 years ago

Closing this (it's a BiocParallel/batchtools issue).

LTLA commented 3 years ago

FWIW I think you can transfer issues between repos in the same organization.

hpages commented 3 years ago

Would require some edits though e.g. change of title, reproducible example that doesn't involve DelayedArray, etc... Probably easier to file a new issue under BiocParallel. Want to do it?

LTLA commented 3 years ago

I suppose I could.