Bioconductor / DelayedArray

A unified framework for working transparently with on-disk and in-memory array-like datasets
https://bioconductor.org/packages/DelayedArray

Bug in rowSums,DelayedMatrix-method #16

Closed PeteHaitch closed 5 years ago

PeteHaitch commented 6 years ago

Calling rowSums() on a large-ish DelayedMatrix leads to a serialization/forking/memory issue on macOS and Linux:

library(DelayedArray)
x <- DelayedArray(matrix(1L, nrow = 10000000, ncol = 100))

# Errors
rowSums(x)

On macOS (16GB RAM) the error is:

Error: vector memory exhausted (limit reached?)

On Linux (20GB RAM) the error is:

Error in serialize(data, node$con, xdr = FALSE) : ignoring SIGPIPE signal
Error in serialize(data, node$con, xdr = FALSE) : ignoring SIGPIPE signal
Error: failed to stop ‘SOCKcluster’ cluster: error writing to connection
Error in serialize(data, node$con, xdr = FALSE) : ignoring SIGPIPE signal
# Then "Error in serialize(data, node$con, xdr = FALSE) : ignoring SIGPIPE signal" repeats indefinitely

I thought it might be a more general issue with blockApply() and its use of BiocParallel, but I haven't been able to trigger the problem in some brief testing. For example, using colSums() or blockApply()-ing max() over individual columns or rows of x worked fine.

ttriche commented 6 years ago

nb. I see this problem even on Linux boxes with 384GB of RAM.

It does not appear to be a small-memory issue per se.

PeteHaitch commented 6 years ago

@ttriche can you please try DelayedMatrixStats::rowSums2() and let me know if you still have the issue? It's implemented slightly differently.

ttriche commented 6 years ago

I use rowSums2() in my code, but the trigger for this behavior seems to be inside of the dmrseq package, and I haven't been able to track it down yet. Running some separate tests to try and figure it out now.

--t

kdkorthauer commented 6 years ago

@PeteHaitch Thanks for looking into this. Should the problem be fixed in hansenlab/bsseq with the most recent commit?

@ttriche If you try installing the latest bsseq from hansenlab/bsseq, does dmrseq run properly again?

PeteHaitch commented 6 years ago

@kdkorthauer Yes, it should be fixed now in bsseq. However, it's not clear to me if @ttriche's dmrseq example is triggering this issue from within bsseq or elsewhere in dmrseq. I'm happy to help debug this further

kdkorthauer commented 6 years ago

@PeteHaitch awesome, thanks! I'll investigate whether the issue persists within dmrseq, and if so whether a similar patch might work.

mtmorgan commented 6 years ago

DelayedArray does this

    bplapply(seq_len(nblock),
        function(b) {
            if (get_verbose_block_processing())
                message("Processing block ", b, "/", nblock, " ... ",
                        appendLF=FALSE)
            viewport <- grid[[b]]
            block <- extract_block(x, viewport)
            if (!is.array(block))
                block <- .as_array_or_matrix(block)
            attr(block, "from_grid") <- grid
            attr(block, "block_id") <- b
            block_ans <- FUN(block, ...)
            if (get_verbose_block_processing())
                message("OK")
            block_ans
        },
        BPREDO=BPREDO,
        BPPARAM=BPPARAM
    )

where block <- extract_block(x, viewport) is done on the worker. This means that x needs to be made available (serialized to, even in the case of MulticoreParam()) on the worker. A different implementation is to use bpiterate() and a generator function ITER to produce blocks on the manager.

    ITER <- local({
        b <- 0L
        function() {
            b <<- b + 1L
            if (b > nblock)
                return(NULL)
            if (get_verbose_block_processing())
                message("Processing block ", b, "/", nblock, " ... ",
                        appendLF=FALSE)
            viewport <- grid[[b]]
            block <- extract_block(x, viewport)
            if (!is.array(block))
                block <- .as_array_or_matrix(block)
            attr(block, "from_grid") <- grid
            attr(block, "block_id") <- b
            block
        }
    })

    bpiterate(ITER, FUN, BPPARAM = BPPARAM)

This 'works' but is incredibly slow (use set_verbose_block_processing(TRUE) to convince yourself that it's chugging away) because the chunks are still serialized to each worker, and because the garbage collector is being called often; I'll explore a better solution for the common multicore data transfer problem in BiocParallel. Also I'm not sure where, if x were something like an HDF5Array, the object is actually realized in memory as a matrix; one would like to do that step on the worker.
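The generator pattern described above can be tried without BiocParallel. The following is a minimal base-R sketch, not the DelayedArray implementation: make_block_iter() is a hypothetical helper, the blocks are plain column slices of an ordinary matrix, and a serial while loop stands in for the parallel driver. It only illustrates the contract that the generator returns one block per call and NULL when exhausted.

```r
# Hypothetical helper (not part of DelayedArray or BiocParallel): returns a
# closure that yields one column-block of `x` per call, then NULL when done.
make_block_iter <- function(x, block_ncol) {
    starts <- seq(1L, ncol(x), by = block_ncol)
    b <- 0L
    function() {
        b <<- b + 1L
        if (b > length(starts))
            return(NULL)
        cols <- starts[b]:min(starts[b] + block_ncol - 1L, ncol(x))
        x[, cols, drop = FALSE]
    }
}

x <- matrix(1:12, nrow = 3)
iter <- make_block_iter(x, block_ncol = 2L)

# Serial stand-in for the parallel driver: pull blocks on the "manager"
# until the generator signals exhaustion by returning NULL.
block_sums <- list()
while (!is.null(block <- iter()))
    block_sums[[length(block_sums) + 1L]] <- sum(block)
block_sums <- unlist(block_sums)
```

In this arrangement only the extracted block, not x itself, would need to travel to a worker, which is the point of producing blocks on the manager.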

I say 'works', but actually on my laptop (after setting BiocParallel::register(BiocParallel::SerialParam()) for better speed) in the rowSums,DelayedArray method there is

    block_results <- blockApply(x, rowSums, na.rm=na.rm)

    ans <- rowSums(matrix(unlist(block_results, use.names=FALSE), nrow=nrow(x)))

and on the last line I end up with Error: cannot allocate vector of size 7.5 Gb. The original object x consumes 3.7G; block_results consumes 7.5G (these are doubles rather than ints, which is a little surprising; I would have thought rowSums() would return ints if possible, @lawremi); unlist(block_results, use.names=FALSE) is another 7.5G; and the matrix(...) is another 7.5G, so 3.7G + 3 * 7.5G in total. At this point the unlist() result is available for garbage collection (is it?), but then ans takes its place using another 7.5G! If this had been written as

    result <- blockApply(x, rowSums, na.rm=na.rm)
    result <- unlist(result, use.names = FALSE)
    result <- matrix(result, nrow = nrow(x))
    result <- rowSums(result)

there would only ever need to be 3.7G + 2 x 7.5G in memory.
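The figures quoted here can be reproduced with a little arithmetic. This sketch simply follows the accounting in this comment (one integer copy of x plus two or three double copies of the same length) and assumes 4 bytes per integer, 8 bytes per double, and sizes in GiB; it ignores object headers and is not a measurement.

```r
# Sizes for a 1e7 x 100 matrix, following the accounting in the comment.
n_elements <- 1e7 * 100
gib <- 2^30

x_gib   <- n_elements * 4 / gib  # integer storage: 4 bytes per element
dbl_gib <- n_elements * 8 / gib  # double storage: 8 bytes per element

# Nested call: block_results, unlist() result and matrix() all alive at once.
peak_nested <- x_gib + 3 * dbl_gib

# Step-by-step reassignment: at most two double copies alive at once.
peak_stepwise <- x_gib + 2 * dbl_gib

round(c(x = x_gib, double_copy = dbl_gib,
        nested = peak_nested, stepwise = peak_stepwise), 1)
```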

Hmm, but now I'm confused! unlist(blockApply()) is as.vector(x) (!), which we reshape into a matrix equal to x (except with numeric rather than integer type) and then we calculate rowSums locally! I guess DelayedArray has chosen to block into chunks where each chunk has a single column...

hpages commented 6 years ago

unlist(blockApply()) is as.vector(x) because blockApply() is still using the old default block grid where the blocks go "along the columns". This is not optimal in most cases and needs to change. Will do ASAP.
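A small base-R illustration of why single-column blocks make unlist() reproduce the whole matrix. This is only a sketch on an ordinary matrix: lapply() over column slices stands in for blockApply() with the old column-wise grid, and the Reduce() accumulation at the end is one possible alternative, not necessarily what DelayedArray does.

```r
x <- matrix(1:6, nrow = 3)

# Stand-in for blockApply() with one column per block.
col_blocks <- lapply(seq_len(ncol(x)), function(j) x[, j, drop = FALSE])
per_block  <- lapply(col_blocks, rowSums)

# rowSums() of a one-column block is just that column, as doubles, so
# flattening the per-block results rebuilds all of x in memory.
flat <- unlist(per_block, use.names = FALSE)
same_as_x <- identical(flat, as.numeric(x))

# Accumulating the per-block results keeps only length-nrow(x) vectors.
acc <- Reduce(`+`, per_block)
matches_rowsums <- identical(acc, rowSums(x))
```

Both same_as_x and matches_rowsums are TRUE here; the difference is that flat has length(x) elements while acc has only nrow(x).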

ttriche commented 6 years ago

as of today, with R-3.5, bioc-devel, hansenlab/bsseq, Bioconductor/DelayedArray, and the rest, I'm still seeing the following on a 384GB machine with 24 cores:

# first, I need to update old objects:
R> byChr <- function(x) split(x, seqnames(x))
R> byChr(bsseq)[todo]
Error in .check_DelayedArray_internals(x) :
  DelayedMatrix object uses internal representation from DelayedArray
  < 0.5.11 and cannot be displayed or used. Please update it with:

      object <- updateObject(object, verbose=TRUE)

  and re-serialize it.
R> todo
 [1] "chr22" "chr21" "chr20" "chr19" "chr18" "chr17" "chr16" "chr15" "chr14"
[10] "chr13" "chr12" "chr10" "chr9"  "chr8"  "chr6"  "chr5"  "chr3"  "chr2"
[19] "chr1"
R> bsseq <- updateObject(bsseq, verbose=TRUE)
updateObject(object="ANY") default for object of class 'matrix'
[updateObject] DelayedMatrix object uses internal representation from
[updateObject] DelayedArray < 0.5.11. Updating it ...
updateObject(object="ANY") default for object of class 'matrix'
[updateObject] DelayedMatrix object uses internal representation from
[updateObject] DelayedArray < 0.5.11. Updating it ...
[updateObject] GRanges object uses internal representation from
[updateObject] GenomicRanges < 1.31.16. Updating it ...
[updateObject] elementType slot of IRanges object should be set to "ANY",
[updateObject] not "integer". Updating it ...
R> byChr(bsseq)[todo]
List of length 19
names(19): chr22 chr21 chr20 chr19 chr18 chr17 ... chr6 chr5 chr3 chr2 chr1
# Then, with the freshly installed packages:
R> DMRs <- lapply(byChr(bsseq)[todo], WGBSeq, testCovariate="tumor")
3135 loci with 0 coverage in at least 1 condition.
Retaining 561544 loci.
Assuming the test covariate tumor is a factor.
Condition: 1 vs 0
Error in serialize(data, node$con, xdr = FALSE) :
  error writing to connection
Error: failed to stop 'SOCKcluster' cluster: error writing to connection

So that's a little frustrating, given that it is going from chr22 (smallest) as the first chunk.

R> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /usr/lib64/R/lib/libRblas.so
LAPACK: /usr/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices datasets  utils
[8] methods   base

other attached packages:
 [1] biscuitEater_0.9.11         bsseq_1.15.5
 [3] SummarizedExperiment_1.9.18 DelayedArray_0.5.34
 [5] BiocParallel_1.13.3         matrixStats_0.53.1
 [7] Biobase_2.39.2              GenomicRanges_1.31.23
 [9] GenomeInfoDb_1.15.5         IRanges_2.13.28
[11] S4Vectors_0.17.43           BiocGenerics_0.25.3
[13] BiocInstaller_1.29.6        skeletor_1.0.4
[15] magrittr_1.5                gtools_3.5.0
[17] useful_1.2.3                ggplot2_2.2.1
[19] purrr_0.2.4                 knitr_1.20

loaded via a namespace (and not attached):
  [1] colorspace_1.3-2
  [2] XVector_0.19.9
  [3] roxygen2_6.0.1
  [4] bit64_0.9-7
  [5] interactiveDisplayBase_1.17.0
  [6] AnnotationDbi_1.41.5
  [7] qualV_0.3-3
  [8] xml2_1.2.0
  [9] splines_3.5.0
 [10] codetools_0.2-15
 [11] R.methodsS3_1.7.1
 [12] impute_1.53.0
 [13] dmrseq_0.99.13
 [14] Rsamtools_1.31.3
 [15] GO.db_3.6.0
 [16] R.oo_1.22.0
 [17] graph_1.57.1
 [18] shiny_1.0.5
 [19] HDF5Array_1.7.11
 [20] readr_1.1.1
 [21] compiler_3.5.0
 [22] httr_1.3.1
 [23] assertthat_0.2.0
 [24] Matrix_1.2-14
 [25] lazyeval_0.2.1
 [26] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
 [27] limma_3.35.15
 [28] later_0.7.1
 [29] htmltools_0.3.6
 [30] prettyunits_1.0.2
 [31] tools_3.5.0
 [32] bindrcpp_0.2.2
 [33] gtable_0.2.0
 [34] glue_1.2.0
 [35] GenomeInfoDbData_1.1.0
 [36] annotatr_1.5.10
 [37] reshape2_1.4.3
 [38] dplyr_0.7.4
 [39] doRNG_1.6.6
 [40] Rcpp_0.12.16
 [41] bumphunter_1.21.0
 [42] Biostrings_2.47.12
 [43] nlme_3.1-137
 [44] rtracklayer_1.39.13
 [45] iterators_1.0.9
 [46] DelayedMatrixStats_1.1.12
 [47] stringr_1.3.0
 [48] fastseg_1.25.0
 [49] mime_0.5
 [50] rngtools_1.2.4
 [51] devtools_1.13.5
 [52] XML_3.98-1.11
 [53] org.Hs.eg.db_3.6.0
 [54] AnnotationHub_2.11.4
 [55] zlibbioc_1.25.0
 [56] scales_0.5.0
 [57] BSgenome_1.47.5
 [58] hms_0.4.2
 [59] promises_1.0.1
 [60] RBGL_1.55.1
 [61] rhdf5_2.23.8
 [62] RColorBrewer_1.1-2
 [63] yaml_2.1.18
 [64] memoise_1.1.0
 [65] pkgmaker_0.22
 [66] biomaRt_2.35.13
 [67] stringi_1.1.7
 [68] RSQLite_2.1.0
 [69] foreach_1.4.4
 [70] permute_0.9-4
 [71] GenomicFeatures_1.31.10
 [72] rlang_0.2.0
 [73] pkgconfig_2.0.1
 [74] commonmark_1.4
 [75] bitops_1.0-6
 [76] lattice_0.20-35
 [77] Rhdf5lib_1.1.6
 [78] bindr_0.1.1
 [79] GenomicAlignments_1.15.13
 [80] bit_1.1-12
 [81] plyr_1.8.4
 [82] R6_2.2.2
 [83] DBI_0.8
 [84] pillar_1.2.2
 [85] withr_2.1.2
 [86] RCurl_1.95-4.10
 [87] tibble_1.4.2
 [88] KernSmooth_2.23-15
 [89] OrganismDbi_1.21.1
 [90] HMMcopy_1.21.0
 [91] Homo.sapiens_1.3.1
 [92] progress_1.1.2
 [93] locfit_1.5-9.1
 [94] grid_3.5.0
 [95] data.table_1.10.4-3
 [96] blob_1.1.1
 [97] digest_0.6.15
 [98] xtable_1.8-2
 [99] httpuv_1.4.1
[100] regioneR_1.11.0
[101] outliers_0.14
[102] R.utils_2.6.0
[103] munsell_0.4.3
[104] registry_0.5

Any ideas? It's getting to the point where dmrseq/bsseq is unusable for certain tasks :-(

--t

ttriche commented 6 years ago

nb. biocValid does not like my installation of newer packages:

* Packages too new for Bioconductor version '3.7'

                     Version  
bsseq                "1.15.5" 
DelayedArray         "0.5.34" 
S4Vectors            "0.17.43"
SummarizedExperiment "1.9.18" 
                     LibPath                                                 
bsseq                "/home/tim.triche/R/x86_64-redhat-linux-gnu-library/3.5"
DelayedArray         "/home/tim.triche/R/x86_64-redhat-linux-gnu-library/3.5"
S4Vectors            "/home/tim.triche/R/x86_64-redhat-linux-gnu-library/3.5"
SummarizedExperiment "/home/tim.triche/R/x86_64-redhat-linux-gnu-library/3.5"

downgrade with biocLite(c("bsseq", "DelayedArray", "S4Vectors", "SummarizedExperiment"))

Error: 4 package(s) too new

However, I can't reinstall bsseq from the hansenlab repo without doing this. So... I'm stumped.

Last time I used SerialParam() and mclapply() to get around this, which seems utterly disgusting and wrong. But it did have the benefit of working for some of the chromosomes (3 out of 22, and oddly they were not the small ones -- 4, 7, and 11 succeeded). I suppose I'll try that again...

hpages commented 6 years ago

@ttriche How do I get that bsseq object? Could it be updated and re-serialized once for all so we remove that part from the equation? Thx!

kdkorthauer commented 6 years ago

Hi @ttriche,

Can you let me know what happens when you run the following?

library(DelayedArray)
x <- DelayedArray(matrix(1L, nrow = 10000000, ncol = 100))

# using matrixStats
matrixStats:::rowSums(x)

# using DelayedMatrixStats
DelayedMatrixStats:::rowSums2(x)

If the first throws an error and the second doesn't, then dmrseq is going to continue to throw the error unless I make the same changes as bsseq or the underlying issue with rowSums is resolved. In that case I'll change over to using DelayedMatrixStats as soon as possible in dmrseq

As @hpages mentioned, I recommend you resave any bsseq objects that need to be updated due to updates in DelayedMatrix. That way you won't need to rerun those first lines of code each time.

In addition you can install bsseq straight from biocLite() in devel (3.7) now, since the pertinent changes should have been propagated.

Best, Keegan

ttriche commented 6 years ago

library(DelayedArray)
Loading required package: stats4
Loading required package: matrixStats
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

anyDuplicated, append, as.data.frame, basename, cbind, colMeans,
colnames, colSums, dirname, do.call, duplicated, eval, evalq,
Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply,
lengths, Map, mapply, match, mget, order, paste, pmax, pmax.int,
pmin, pmin.int, Position, rank, rbind, Reduce, rowMeans, rownames,
rowSums, sapply, setdiff, sort, table, tapply, union, unique,
unsplit, which, which.max, which.min

Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

expand.grid

Loading required package: IRanges
Loading required package: BiocParallel

Attaching package: ‘DelayedArray’

The following objects are masked from ‘package:matrixStats’:

colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges

The following objects are masked from ‘package:base’:

aperm, apply

x <- DelayedArray(matrix(1L, nrow = 10000000, ncol = 100))

# using matrixStats
foo <- matrixStats:::rowSums(x)
Error in get(name, envir = asNamespace(pkg), inherits = FALSE) :
  object 'rowSums' not found

# using DelayedMatrixStats
bar <- DelayedMatrixStats:::rowSums2(x)
# no error

--t

ttriche commented 6 years ago

also:

foo <- matrixStats:::rowSums2(x)
Error in matrixStats:::rowSums2(x) :
  Argument 'x' must be a matrix or a vector.

--t

PeteHaitch commented 6 years ago

@ttriche @kdkorthauer The example needs to be:

library(DelayedArray)
library(DelayedMatrixStats)
x <- DelayedArray(matrix(1L, nrow = 10000000, ncol = 100))
# Using DelayedArray
DelayedArray::rowSums(x)
# Using DelayedMatrixStats
DelayedMatrixStats::rowSums2(x)

matrixStats only works with ordinary matrices.

FWIW on this example, DelayedMatrixStats::rowSums2(x) takes ~6 seconds on my machine.

ttriche commented 6 years ago

that one about crashed my machine in DelayedArray::rowSums(x)

--t

ttriche commented 6 years ago

started with R --vanilla:

x <- DelayedArray(matrix(1L, nrow = 10000000, ncol = 100))

# Using DelayedArray
DelayedArray::rowSums(x)
# wait about 20 minutes
^C ^C
^D # to quit
Error: failed to stop 'SOCKcluster' cluster: invalid connection
Error while shutting down parallel: unable to terminate some child processes

So yeah I think I see where the problem is...

--t

hpages commented 5 years ago

I hope we can close this.

There have been many important changes and improvements to the block processing mechanism in DelayedArray over the past 18 months (and more are to come). With the latest version of DelayedArray (0.11.8), Pete's original code works on my Linux laptop (Ubuntu 16.04, with 16 GB of RAM) and is fast. The only thing is that it now displays some strange error messages that seem to stem from BiocParallel:

library(DelayedArray)
x <- DelayedArray(matrix(1L, nrow=1e7, ncol=100))
rs1 <- DelayedArray::rowSums(x)
# Error in mcexit(0L) : ignoring SIGPIPE signal
# Error in mcexit(0L) : ignoring SIGPIPE signal
# Error in mcexit(0L) : ignoring SIGPIPE signal
# Error in mcexit(0L) : ignoring SIGPIPE signal
# Error in mcexit(0L) : ignoring SIGPIPE signal
# Error in mcexit(0L) : ignoring SIGPIPE signal

Not sure what's going on exactly but they seem harmless. Besides I only seem to get them on my laptop and I get them with things as simple as:

res <- bplapply(1:25000, identity)
# Error in mcexit(0L) : ignoring SIGPIPE signal
# Error in mcexit(0L) : ignoring SIGPIPE signal

which suggests that they don't have anything to do with DelayedArray. I'll file an issue under BiocParallel about this.

Anyway, unless someone still runs into issues with DelayedArray::rowSums(), I'll close this in the next few days.

Cheers, H.

> sessionInfo()
R version 3.6.0 Patched (2019-05-02 r76454)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS:   /home/hpages/R/R-3.6.r76454/lib/libRblas.so
LAPACK: /home/hpages/R/R-3.6.r76454/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] DelayedArray_0.11.8 BiocParallel_1.19.4 IRanges_2.19.17    
[4] S4Vectors_0.23.25   BiocGenerics_0.31.6 matrixStats_0.55.0 

loaded via a namespace (and not attached):
[1] compiler_3.6.0           Matrix_1.2-17            grid_3.6.0              
[4] DelayedMatrixStats_1.7.2 lattice_0.20-38