Bioconductor / S4Vectors

Foundation of vector-like and list-like containers in Bioconductor
https://bioconductor.org/packages/S4Vectors

[,DataFrame method not found with SnowParam parallel processing #126

Open jorainer opened 4 days ago

jorainer commented 4 days ago

Dear all, I stumbled over this problem: subsetting a DataFrame within bplapply that uses SnowParam results in this error: Error in x[1, ]: object of type 'S4' is not subsettable. To reproduce:

library(S4Vectors)
library(BiocParallel)

## [ fails with SnowParam
d <- DataFrame(a = 1:4, b = "a")
l <- list(d, d, d, d)

lapply(l, function(x) x[1, ])
bplapply(l, function(x) x[1, ], BPPARAM = MulticoreParam(2))
bplapply(l, function(x) x[1, ], BPPARAM = SnowParam(2))

The last call results in

Loading required package: S4Vectors
Loading required package: stats4
Loading required package: BiocGenerics

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, aperm, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval, evalq, Filter,
    Find, get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
    pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply, union, unique, unsplit, which.max,
    which.min

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:utils’:

    findMatches

The following objects are masked from ‘package:base’:

    expand.grid, I, unname

Error: BiocParallel errors
  2 remote errors, element index: 1, 3
  2 unevaluated and other errors
  first remote error:
Error in x[1, ]: object of type 'S4' is not subsettable

This can be fixed with:

bplapply(l, function(x) {
    requireNamespace("S4Vectors", quietly = TRUE)
    x[1, ]
}, BPPARAM = SnowParam(2))

Could it be that there is a NAMESPACE issue with the [ method, @hpages?

This is with the current devel version (R 4.4.1, S4Vectors 0.43.2), but I got the same error with the BioC 3.18 and 3.19 versions.
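One way to narrow down the NAMESPACE question is to ask the workers directly what they have loaded. The following is only a diagnostic sketch, not something from the report above: it assumes two SNOW workers, and the list element names loaded and attached are made up for illustration.

```r
library(BiocParallel)

## Ask each SNOW worker whether the S4Vectors namespace is loaded and
## whether the package is on its search path. The argument i is unused;
## it only creates one task per worker.
status <- bplapply(1:2, function(i) {
    list(
        loaded = isNamespaceLoaded("S4Vectors"),
        attached = "package:S4Vectors" %in% search()
    )
}, BPPARAM = SnowParam(2))

str(status)
```

If loaded is FALSE on a worker, no S4Vectors methods have been registered in that process, which would be consistent with the "not subsettable" error.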

mtmorgan commented 4 days ago

Until relatively recently this has been the expected behavior for parallel evaluation via SNOW: each worker is a separate R process, so packages etc. have to be loaded explicitly. Obviously this can entail substantial 'start up' time, reducing the value of parallel evaluation; it may be worthwhile to start the cluster once (via bpparam = bpstart(SnowParam())) and then re-use bpparam across calls before calling bpstop(bpparam); I think this is discussed in section 4.1.2 of the vignette and elsewhere.
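A minimal sketch of that start-once pattern, reusing the example data from the report and assuming two workers (the names res1 and res2 are only illustrative):

```r
library(S4Vectors)
library(BiocParallel)

d <- DataFrame(a = 1:4, b = "a")
l <- list(d, d, d, d)

## Start the SNOW workers once; they stay alive until bpstop() is called.
bpparam <- bpstart(SnowParam(2))

## First call: load the S4Vectors namespace on the worker before subsetting,
## as in the workaround from the original report.
res1 <- bplapply(l, function(x) {
    requireNamespace("S4Vectors", quietly = TRUE)
    x[1, ]
}, BPPARAM = bpparam)

## Second call re-uses the same running workers, so no new R processes
## have to be started.
res2 <- bplapply(l, function(x) {
    requireNamespace("S4Vectors", quietly = TRUE)
    x[2, ]
}, BPPARAM = bpparam)

## Shut the workers down when finished.
bpstop(bpparam)
```

The requireNamespace() call is kept in both functions so that the sketch does not depend on which worker handled which task in the first call.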

@Jiefei-Wang has introduced functionality that is supposed to automate this process to some extent, so perhaps this particular case (an S4 method on [) is not handled properly. But that should be discussed in an issue in BiocParallel.

jorainer commented 3 days ago

Thanks Martin for the explanation!