PeteHaitch closed this issue.
nb. I see this problem even on Linux boxes with 384GB of RAM.
It does not appear to be a small-memory issue per se.
@ttriche can you please try DelayedMatrixStats::rowSums2() and let me know if you still have the issue? It's implemented slightly differently.
I use rowSums2() in my code, but the trigger for this behavior seems to be inside the dmrseq package, and I haven't been able to track it down yet. Running some separate tests to try and figure it out now.
--t
On Tue, Apr 24, 2018 at 1:31 PM, Peter Hickey notifications@github.com wrote:
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Bioconductor/DelayedArray/issues/16#issuecomment-384016030, or mute the thread https://github.com/notifications/unsubscribe-auth/AAARIu1AEpx1kOPsQnTpbFp-T4kA7gtfks5tr2GDgaJpZM4TR-hQ .
@PeteHaitch Thanks for looking into this. Should the problem be fixed in hansenlab/bsseq with the most recent commit?
@ttriche If you try installing the latest bsseq from hansenlab/bsseq, does dmrseq run properly again?
@kdkorthauer Yes, it should be fixed now in bsseq. However, it's not clear to me if @ttriche's dmrseq example is triggering this issue from within bsseq or elsewhere in dmrseq. I'm happy to help debug this further.
@PeteHaitch awesome, thanks! I'll investigate whether the issue persists within dmrseq, and if so whether a similar patch might work.
DelayedArray does this:
bplapply(seq_len(nblock),
    function(b) {
        if (get_verbose_block_processing())
            message("Processing block ", b, "/", nblock, " ... ",
                    appendLF=FALSE)
        viewport <- grid[[b]]
        block <- extract_block(x, viewport)
        if (!is.array(block))
            block <- .as_array_or_matrix(block)
        attr(block, "from_grid") <- grid
        attr(block, "block_id") <- b
        block_ans <- FUN(block, ...)
        if (get_verbose_block_processing())
            message("OK")
        block_ans
    },
    BPREDO=BPREDO,
    BPPARAM=BPPARAM
)
where block <- extract_block(x, viewport) is done on the worker. This means that x needs to be made available on the worker (i.e., serialized to it, even in the case of MulticoreParam()). A different implementation is to use bpiterate() and a generator function ITER to produce blocks on the manager:
ITER <- local({
    b <- 0L
    function() {
        b <<- b + 1L
        if (b > nblock)
            return(NULL)
        if (get_verbose_block_processing())
            message("Processing block ", b, "/", nblock, " ... ",
                    appendLF=FALSE)
        viewport <- grid[[b]]
        block <- extract_block(x, viewport)
        if (!is.array(block))
            block <- .as_array_or_matrix(block)
        attr(block, "from_grid") <- grid
        attr(block, "block_id") <- b
        block
    }
})
bpiterate(ITER, FUN, BPPARAM = BPPARAM)
This 'works' but is incredibly slow (use set_verbose_block_processing(TRUE) to convince yourself that it's chugging away), because the chunks are still serialized to each worker and because the garbage collector is called often; I'll explore a better solution to the common multicore data-transfer problem in BiocParallel. Also, if x were something like an HDF5Array, I'm not sure where the object is actually realized in memory as a matrix; one would like to do that step on the worker.
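A base-R sketch of the manager-side generator pattern, under simplifying assumptions: x is an ordinary in-memory matrix, the blocks are fixed-width column ranges produced by a hypothetical helper make_block_iter() (not DelayedArray's grid code), and a plain while loop stands in for BiocParallel::bpiterate(). serialize() is used only to compare roughly how many bytes a worker would receive for the whole matrix versus a single block.

```r
# Hypothetical helper (not DelayedArray code): returns a closure that yields one
# column block per call and NULL once the blocks are exhausted, which is the
# contract bpiterate() expects from its ITER argument.
make_block_iter <- function(x, block_ncol) {
  starts <- seq(1L, ncol(x), by = block_ncol)
  b <- 0L
  function() {
    b <<- b + 1L
    if (b > length(starts))
      return(NULL)
    cols <- starts[b]:min(starts[b] + block_ncol - 1L, ncol(x))
    x[, cols, drop = FALSE]  # extraction happens wherever ITER runs (the manager)
  }
}

x <- matrix(1L, nrow = 1000L, ncol = 100L)
ITER <- make_block_iter(x, block_ncol = 10L)

# Serial stand-in for bpiterate(ITER, FUN, BPPARAM = ...): each iteration hands
# only the current block to FUN (here colSums), never the whole of x.
results <- list()
while (!is.null(block <- ITER())) {
  results[[length(results) + 1L]] <- colSums(block)
}

# Approximate payload in bytes: all of x versus a single 10-column block
whole_bytes <- length(serialize(x, NULL))
block_bytes <- length(serialize(x[, 1:10, drop = FALSE], NULL))
```

With the real bpiterate() the while loop is replaced by the BiocParallel call; the design point is the same either way: because ITER runs on the manager, only each block (roughly one tenth of x here) has to cross to a worker.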
I say 'works', but actually on my laptop (after setting BiocParallel::register(BiocParallel::SerialParam()) for better speed), in the rowSums,DelayedArray method there is:
block_results <- blockApply(x, rowSums, na.rm=na.rm)
ans <- rowSums(matrix(unlist(block_results, use.names=FALSE), nrow=nrow(x)))
and on the last line I end up with Error: cannot allocate vector of size 7.5 Gb. The original object x is consuming 3.7G; block_results consumes 7.5G (these are doubles rather than ints, which is a little surprising; I would have thought rowSums() would return ints if possible, @lawremi); unlist(block_results, use.names=FALSE) is another 7.5G; and the matrix(...) is another 7.5G, so 3.7G + 3 * 7.5G. At this point the unlist() result is available for garbage collection (is it?), but then ans takes its place using another 7.5G! If this had been written as
result <- blockApply(x, rowSums, na.rm=na.rm)
result <- unlist(result, use.names = FALSE)
result <- matrix(result, nrow = nrow(x))
result <- rowSums(result)
there would only ever need to be 3.7G + 2 * 7.5G in memory.
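The sizes quoted above can be checked with a little arithmetic. This sketch assumes x is the 1e7 x 100 integer matrix used elsewhere in this thread (consistent with the 3.7G figure), 4 bytes per integer and 8 per double, and ignores R's small per-object headers; units are 2^30 bytes, which matches the 7.5 Gb in the error message.

```r
n_cells <- 1e7 * 100   # elements in x
gib <- 2^30            # bytes per GiB

x_gib      <- n_cells * 4 / gib  # the integer matrix x
double_gib <- n_cells * 8 / gib  # any full-size double copy (block_results, unlist(), matrix())

# Nested one-liner: x, block_results, the unlist() result and the matrix() copy coexist
peak_nested     <- x_gib + 3 * double_gib
# Step-by-step rewrite: at most x plus two full-size double copies at any one time
peak_sequential <- x_gib + 2 * double_gib

# roughly 3.7, 7.5, 26.1 and 18.6 GiB
round(c(x = x_gib, double = double_gib,
        nested = peak_nested, sequential = peak_sequential), 1)
```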
Hmm, but now I'm confused! unlist(blockApply()) is as.vector(x) (!), which we reshape into a matrix equal to x (except with numeric rather than integer type), and then we calculate rowSums locally! I guess DelayedArray has chosen to block into chunks where each chunk has a single column...
unlist(blockApply()) is as.vector(x) because blockApply() is still using the old default block grid, where the blocks go "along the columns". This is not optimal in most cases and needs to change. Will do ASAP.
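A base-R sketch of why the grid shape matters for rowSums(), assuming hypothetical fixed-size row and column blocks rather than DelayedArray's actual grid logic. With row blocks, each block's rowSums() is already the final answer for its rows and only needs concatenating; with column blocks, partial sums can be accumulated into a single length-nrow vector instead of keeping every per-block result and reshaping them into a full-size matrix.

```r
m <- matrix(as.numeric(1:60), nrow = 12L)  # stand-in for x (12 x 5)
block_len <- 4L

# Row-wise blocks: concatenate the per-block row sums
row_starts <- seq(1L, nrow(m), by = block_len)
by_rows <- unlist(lapply(row_starts, function(s) {
  rows <- s:min(s + block_len - 1L, nrow(m))
  rowSums(m[rows, , drop = FALSE])
}), use.names = FALSE)

# Column-wise blocks (2 columns each): accumulate partial sums in one vector
acc <- numeric(nrow(m))
for (s in seq(1L, ncol(m), by = 2L)) {
  cols <- s:min(s + 1L, ncol(m))
  acc <- acc + rowSums(m[, cols, drop = FALSE])
}
```

Either way, the extra memory beyond the blocks themselves is a single vector of length nrow(x), rather than another copy of the whole matrix.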
As of today, with R-3.5, bioc-devel, hansenlab/bsseq, Bioconductor/DelayedArray, and the rest, I'm still seeing the following on a 384GB machine with 24 cores:
# first, I need to update old objects:
R> byChr <- function(x) split(x, seqnames(x))
R> byChr(bsseq)[todo]
Error in .check_DelayedArray_internals(x) :
DelayedMatrix object uses internal representation from DelayedArray
< 0.5.11 and cannot be displayed or used. Please update it with:
object <- updateObject(object, verbose=TRUE)
and re-serialize it.
R> todo
[1] "chr22" "chr21" "chr20" "chr19" "chr18" "chr17" "chr16" "chr15" "chr14"
[10] "chr13" "chr12" "chr10" "chr9" "chr8" "chr6" "chr5" "chr3" "chr2"
[19] "chr1"
R> bsseq <- updateObject(bsseq, verbose=TRUE)
updateObject(object="ANY") default for object of class 'matrix'
[updateObject] DelayedMatrix object uses internal representation from
[updateObject] DelayedArray < 0.5.11. Updating it ...
updateObject(object="ANY") default for object of class 'matrix'
[updateObject] DelayedMatrix object uses internal representation from
[updateObject] DelayedArray < 0.5.11. Updating it ...
[updateObject] GRanges object uses internal representation from
[updateObject] GenomicRanges < 1.31.16. Updating it ...
[updateObject] elementType slot of IRanges object should be set to "ANY",
[updateObject] not "integer". Updating it ...
R> byChr(bsseq)[todo]
List of length 19
names(19): chr22 chr21 chr20 chr19 chr18 chr17 ... chr6 chr5 chr3 chr2 chr1
# Then, with the freshly installed packages:
R> DMRs <- lapply(byChr(bsseq)[todo], WGBSeq, testCovariate="tumor")
3135 loci with 0 coverage in at least 1 condition.
Retaining 561544 loci.
Assuming the test covariate tumor is a factor.
Condition: 1 vs 0
Error in serialize(data, node$con, xdr = FALSE) :
error writing to connection
Error: failed to stop 'SOCKcluster' cluster: error writing to connection
So that's a little frustrating, given that it starts with chr22 (the smallest) as the first chunk.
R> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /usr/lib64/R/lib/libRblas.so
LAPACK: /usr/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices datasets utils
[8] methods base
other attached packages:
[1] biscuitEater_0.9.11 bsseq_1.15.5
[3] SummarizedExperiment_1.9.18 DelayedArray_0.5.34
[5] BiocParallel_1.13.3 matrixStats_0.53.1
[7] Biobase_2.39.2 GenomicRanges_1.31.23
[9] GenomeInfoDb_1.15.5 IRanges_2.13.28
[11] S4Vectors_0.17.43 BiocGenerics_0.25.3
[13] BiocInstaller_1.29.6 skeletor_1.0.4
[15] magrittr_1.5 gtools_3.5.0
[17] useful_1.2.3 ggplot2_2.2.1
[19] purrr_0.2.4 knitr_1.20
loaded via a namespace (and not attached):
[1] colorspace_1.3-2
[2] XVector_0.19.9
[3] roxygen2_6.0.1
[4] bit64_0.9-7
[5] interactiveDisplayBase_1.17.0
[6] AnnotationDbi_1.41.5
[7] qualV_0.3-3
[8] xml2_1.2.0
[9] splines_3.5.0
[10] codetools_0.2-15
[11] R.methodsS3_1.7.1
[12] impute_1.53.0
[13] dmrseq_0.99.13
[14] Rsamtools_1.31.3
[15] GO.db_3.6.0
[16] R.oo_1.22.0
[17] graph_1.57.1
[18] shiny_1.0.5
[19] HDF5Array_1.7.11
[20] readr_1.1.1
[21] compiler_3.5.0
[22] httr_1.3.1
[23] assertthat_0.2.0
[24] Matrix_1.2-14
[25] lazyeval_0.2.1
[26] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[27] limma_3.35.15
[28] later_0.7.1
[29] htmltools_0.3.6
[30] prettyunits_1.0.2
[31] tools_3.5.0
[32] bindrcpp_0.2.2
[33] gtable_0.2.0
[34] glue_1.2.0
[35] GenomeInfoDbData_1.1.0
[36] annotatr_1.5.10
[37] reshape2_1.4.3
[38] dplyr_0.7.4
[39] doRNG_1.6.6
[40] Rcpp_0.12.16
[41] bumphunter_1.21.0
[42] Biostrings_2.47.12
[43] nlme_3.1-137
[44] rtracklayer_1.39.13
[45] iterators_1.0.9
[46] DelayedMatrixStats_1.1.12
[47] stringr_1.3.0
[48] fastseg_1.25.0
[49] mime_0.5
[50] rngtools_1.2.4
[51] devtools_1.13.5
[52] XML_3.98-1.11
[53] org.Hs.eg.db_3.6.0
[54] AnnotationHub_2.11.4
[55] zlibbioc_1.25.0
[56] scales_0.5.0
[57] BSgenome_1.47.5
[58] hms_0.4.2
[59] promises_1.0.1
[60] RBGL_1.55.1
[61] rhdf5_2.23.8
[62] RColorBrewer_1.1-2
[63] yaml_2.1.18
[64] memoise_1.1.0
[65] pkgmaker_0.22
[66] biomaRt_2.35.13
[67] stringi_1.1.7
[68] RSQLite_2.1.0
[69] foreach_1.4.4
[70] permute_0.9-4
[71] GenomicFeatures_1.31.10
[72] rlang_0.2.0
[73] pkgconfig_2.0.1
[74] commonmark_1.4
[75] bitops_1.0-6
[76] lattice_0.20-35
[77] Rhdf5lib_1.1.6
[78] bindr_0.1.1
[79] GenomicAlignments_1.15.13
[80] bit_1.1-12
[81] plyr_1.8.4
[82] R6_2.2.2
[83] DBI_0.8
[84] pillar_1.2.2
[85] withr_2.1.2
[86] RCurl_1.95-4.10
[87] tibble_1.4.2
[88] KernSmooth_2.23-15
[89] OrganismDbi_1.21.1
[90] HMMcopy_1.21.0
[91] Homo.sapiens_1.3.1
[92] progress_1.1.2
[93] locfit_1.5-9.1
[94] grid_3.5.0
[95] data.table_1.10.4-3
[96] blob_1.1.1
[97] digest_0.6.15
[98] xtable_1.8-2
[99] httpuv_1.4.1
[100] regioneR_1.11.0
[101] outliers_0.14
[102] R.utils_2.6.0
[103] munsell_0.4.3
[104] registry_0.5
Any ideas? It's getting to the point where dmrseq/bsseq is unusable for certain tasks :-(
--t
On Fri, Apr 27, 2018 at 10:47 AM, hpages notifications@github.com wrote:
nb. biocValid does not like my installation of newer packages:
* Packages too new for Bioconductor version '3.7'
Version
bsseq "1.15.5"
DelayedArray "0.5.34"
S4Vectors "0.17.43"
SummarizedExperiment "1.9.18"
LibPath
bsseq "/home/tim.triche/R/x86_64-redhat-linux-gnu-library/3.5"
DelayedArray "/home/tim.triche/R/x86_64-redhat-linux-gnu-library/3.5"
S4Vectors "/home/tim.triche/R/x86_64-redhat-linux-gnu-library/3.5"
SummarizedExperiment "/home/tim.triche/R/x86_64-redhat-linux-gnu-library/3.5"
downgrade with biocLite(c("bsseq", "DelayedArray", "S4Vectors", "SummarizedExperiment"))
Error: 4 package(s) too new
However, I can't reinstall bsseq from the hansenlab repo without doing this. So... I'm stumped.
Last time I used SerialParam() and mclapply() to get around this, which seems utterly disgusting and wrong. But it did have the benefit of working for some of the chromosomes (3 out of 22, and oddly they were not the small ones -- 4, 7, and 11 succeeded). I suppose I'll try that again...
@ttriche How do I get that bsseq object? Could it be updated and re-serialized once and for all, so we remove that part from the equation? Thx!
Hi @ttriche,
Can you let me know what happens when you run the following?
library(DelayedArray)
x <- DelayedArray(matrix(1L, nrow = 10000000, ncol = 100))
# using matrixStats
matrixStats:::rowSums(x)
# using DelayedMatrixStats
DelayedMatrixStats:::rowSums2(x)
If the first throws an error and the second doesn't, then dmrseq is going to continue to throw the error unless I make the same changes as bsseq or the underlying issue with rowSums is resolved. In that case I'll change over to using DelayedMatrixStats in dmrseq as soon as possible.
As @hpages mentioned, I recommend you resave any bsseq objects that need to be updated due to updates in DelayedMatrix. That way you won't need to rerun those first lines of code each time.
In addition, you can install bsseq straight from biocLite() in devel (3.7) now, since the pertinent changes should have been propagated.
Best, Keegan
library(DelayedArray)
Loading required package: stats4
Loading required package: matrixStats
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from ‘package:stats’:
IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, basename, cbind, colMeans,
colnames, colSums, dirname, do.call, duplicated, eval, evalq,
Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply,
lengths, Map, mapply, match, mget, order, paste, pmax, pmax.int,
pmin, pmin.int, Position, rank, rbind, Reduce, rowMeans, rownames,
rowSums, sapply, setdiff, sort, table, tapply, union, unique,
unsplit, which, which.max, which.min
Loading required package: S4Vectors
Attaching package: ‘S4Vectors’
The following object is masked from ‘package:base’:
expand.grid
Loading required package: IRanges
Loading required package: BiocParallel
Attaching package: ‘DelayedArray’
The following objects are masked from ‘package:matrixStats’:
colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges
The following objects are masked from ‘package:base’:
aperm, apply
x <- DelayedArray(matrix(1L, nrow = 10000000, ncol = 100))
# using matrixStats
foo <- matrixStats:::rowSums(x)
Error in get(name, envir = asNamespace(pkg), inherits = FALSE) :
  object 'rowSums' not found
# using DelayedMatrixStats
bar <- DelayedMatrixStats:::rowSums2(x)
no error
--t
On Fri, Apr 27, 2018 at 1:33 PM, Keegan Korthauer notifications@github.com wrote:
also:
foo <- matrixStats:::rowSums2(x)
Error in matrixStats:::rowSums2(x) :
  Argument 'x' must be a matrix or a vector.
--t
On Fri, Apr 27, 2018 at 1:46 PM, Tim Triche, Jr. tim.triche@gmail.com wrote:
@ttriche @kdkorthauer The example needs to be:
library(DelayedArray)
library(DelayedMatrixStats)
x <- DelayedArray(matrix(1L, nrow = 10000000, ncol = 100))
# Using DelayedArray
DelayedArray::rowSums(x)
# Using DelayedMatrixStats
DelayedMatrixStats::rowSums2(x)
matrixStats only works with ordinary matrices.
FWIW, on this example DelayedMatrixStats::rowSums2(x) takes ~6 seconds on my machine.
That one about crashed my machine in DelayedArray::rowSums(x).
--t
On Fri, Apr 27, 2018 at 1:59 PM, Peter Hickey notifications@github.com wrote:
started with R --vanilla:
x <- DelayedArray(matrix(1L, nrow = 10000000, ncol = 100))
# Using DelayedArray
DelayedArray::rowSums(x)
wait about 20 minutes
^C ^C
^D # to quit
Error: failed to stop 'SOCKcluster' cluster: invalid connection
Error while shutting down parallel: unable to terminate some child processes
So yeah I think I see where the problem is...
--t
I hope we can close this.
There have been many important changes and improvements to the block processing mechanism in DelayedArray over the past 18 months (and more are to come). With the latest version of DelayedArray (0.11.8), Pete's original code works on my Linux laptop (Ubuntu 16.04, with 16 GB of RAM) and is fast. The only thing is that it now displays some strange error messages that seem to stem from BiocParallel:
library(DelayedArray)
x <- DelayedArray(matrix(1L, nrow=1e7, ncol=100))
rs1 <- DelayedArray::rowSums(x)
# Error in mcexit(0L) : ignoring SIGPIPE signal
# Error in mcexit(0L) : ignoring SIGPIPE signal
# Error in mcexit(0L) : ignoring SIGPIPE signal
# Error in mcexit(0L) : ignoring SIGPIPE signal
# Error in mcexit(0L) : ignoring SIGPIPE signal
# Error in mcexit(0L) : ignoring SIGPIPE signal
Not sure what's going on exactly, but they seem harmless. Besides, I only seem to get them on my laptop, and I get them with things as simple as:
res <- bplapply(1:25000, identity)
# Error in mcexit(0L) : ignoring SIGPIPE signal
# Error in mcexit(0L) : ignoring SIGPIPE signal
which suggests that they don't have anything to do with DelayedArray. I'll file an issue under BiocParallel about this.
Anyway, unless someone still runs into issues with DelayedArray::rowSums(), I'll close this in the next few days.
Cheers, H.
> sessionInfo()
R version 3.6.0 Patched (2019-05-02 r76454)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS
Matrix products: default
BLAS: /home/hpages/R/R-3.6.r76454/lib/libRblas.so
LAPACK: /home/hpages/R/R-3.6.r76454/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] DelayedArray_0.11.8 BiocParallel_1.19.4 IRanges_2.19.17
[4] S4Vectors_0.23.25 BiocGenerics_0.31.6 matrixStats_0.55.0
loaded via a namespace (and not attached):
[1] compiler_3.6.0 Matrix_1.2-17 grid_3.6.0
[4] DelayedMatrixStats_1.7.2 lattice_0.20-38
Calling rowSums() on a large-ish DelayedMatrix leads to a serialization/forking/memory issue on macOS and Linux.
On macOS (16GB RAM) the error is:
On Linux (20GB RAM) the error is:
I thought it might be a more general issue with blockApply() and its use of BiocParallel, but I haven't been able to trigger the problem in some brief testing. For example, using colSums() or blockApply()-ing max() over individual columns or rows of x worked fine.