PeteHaitch / DelayedMatrixStats

A port of the matrixStats API to work with DelayedMatrix objects from the DelayedArray package

One test fails on ppc32: Error: cannot allocate vector of size 305.2 Mb #90

Open barracuda156 opened 1 year ago

barracuda156 commented 1 year ago

R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: powerpc-apple-darwin10.8.0 (32-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(testthat)
> library(DelayedMatrixStats)
Loading required package: MatrixGenerics
Loading required package: matrixStats

Attaching package: 'MatrixGenerics'

The following objects are masked from 'package:matrixStats':

    colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
    colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
    colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
    colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
    colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
    colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
    colWeightedMeans, colWeightedMedians, colWeightedSds,
    colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
    rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
    rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
    rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
    rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
    rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
    rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
    rowWeightedSds, rowWeightedVars

Loading required package: DelayedArray
Loading required package: stats4
Loading required package: Matrix
Loading required package: BiocGenerics

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
    as.data.frame, basename, cbind, colnames, dirname, do.call,
    duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
    lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
    pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
    tapply, union, unique, unsplit, which.max, which.min

Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following objects are masked from 'package:Matrix':

    expand, unname

The following objects are masked from 'package:base':

    I, expand.grid, unname

Loading required package: IRanges

Attaching package: 'DelayedArray'

The following objects are masked from 'package:base':

    apply, rowsum, scale, sweep

Attaching package: 'DelayedMatrixStats'

The following objects are masked from 'package:matrixStats':

    colAnyMissings, rowAnyMissings

> 
> test_check("DelayedMatrixStats")
Loading required package: rhdf5

Attaching package: 'HDF5Array'

The following object is masked from 'package:rhdf5':

    h5ls

R(43227,0xa0dfb620) malloc: *** mmap(size=320004096) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(43227,0xa0dfb620) malloc: *** mmap(size=320004096) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
[ FAIL 1 | WARN 0 | SKIP 0 | PASS 14731 ]

══ Failed tests ════════════════════════════════════════════════════════════════
── Error ('test_GitHub_issues.R:37:3'): Issue 54 is fixed ──────────────────────
Error: cannot allocate vector of size 305.2 Mb
Backtrace:
     ▆
  1. ├─testthat::expect_equal(rowsum(as.matrix(m3), S), rowsum(m3, S)) at test_GitHub_issues.R:37:2
  2. │ └─testthat::quasi_label(enquo(object), label, arg = "object")
  3. │   └─rlang::eval_bare(expr, quo_get_env(quo))
  4. ├─base::rowsum(as.matrix(m3), S)
  5. ├─base::as.matrix(m3)
  6. └─DelayedArray::as.matrix.Array(m3)
  7.   └─DelayedArray:::.from_Array_to_matrix(x, ...)
  8.     ├─base::as.array(x, drop = TRUE)
  9.     └─DelayedArray::as.array.Array(x, drop = TRUE)
 10.       └─DelayedArray:::.from_Array_to_array(x, ...)
 11.         ├─DelayedArray::extract_array(x, index)
 12.         └─DelayedArray::extract_array(x, index)
 13.           ├─methods::callNextMethod()
 14.           └─DelayedArray (local) .nextMethod(x = x, index = index)
 15.             ├─DelayedArray::extract_array(x@seed, index)
 16.             └─DelayedArray::extract_array(x@seed, index)
 17.               └─base::unlist(res, use.names = FALSE)

[ FAIL 1 | WARN 0 | SKIP 0 | PASS 14731 ]
Error: Test failures
Execution halted
PeteHaitch commented 1 year ago

Thank you for reporting. I don't have access to a ppc32 machine, but this test is skipped on 32-bit Windows because it also caused problems there (https://github.com/PeteHaitch/DelayedMatrixStats/blob/devel/tests/testthat/test_GitHub_issues.R#L34). The test is actually skipped on both 32-bit and 64-bit Windows platforms simply because it was easier to skip the test that way when it was written; I might revisit that.
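For illustration only, a skip keyed on pointer size rather than on the operating system could look roughly like the sketch below. The helper names is_32bit_build() and skip_on_32bit() are hypothetical and are not part of DelayedMatrixStats or testthat; the sketch assumes that .Machine$sizeof.pointer is 4 on 32-bit builds of R and that testthat is installed wherever the skip is actually called.

```r
# Hypothetical helper: TRUE when R was built with 4-byte (32-bit) pointers.
is_32bit_build <- function() {
  .Machine$sizeof.pointer == 4L
}

# Hypothetical helper: skip the calling test on any 32-bit build,
# regardless of operating system. Needs testthat at call time.
skip_on_32bit <- function() {
  testthat::skip_if(
    is_32bit_build(),
    "Large dense allocation is not expected to succeed on 32-bit builds"
  )
}

# Possible use inside a test file (not run here):
# test_that("Issue 54 is fixed", {
#   skip_on_32bit()
#   ...
# })
```

Keying the skip on pointer size would cover 32-bit Windows and ppc32 alike while leaving 64-bit Windows unskipped, though whether that is the right policy for this package is a separate question.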

Could you please try the following code (modified from the test suite, with the testthat-related parts removed) and post the output, along with the output of sessionInfo() on your system?

suppressPackageStartupMessages(library(Matrix))
suppressPackageStartupMessages(library(DelayedArray))

# Small normal matrix
m1 <- DelayedArray(as.matrix(iris[, 1:4]))
all.equal(rowsum(as.matrix(m1), iris$Species), rowsum(m1, iris$Species))
#> [1] TRUE

# Large sparse matrix
x <- Matrix::rsparsematrix(800000, ncol = 50, density = 0.1)

# Large normal matrix
m2 <- DelayedArray(as.matrix(x))
S <- sample(1:1000, nrow(m2), replace = TRUE)
all.equal(rowsum(as.matrix(m2), S), rowsum(m2, S))
#> [1] TRUE

# dgCMatrix
m4 <- DelayedArray(x)
S <- sample(1:1000, nrow(m4), replace = TRUE)
all.equal(rowsum(as.matrix(m4), S), rowsum(m4, S))
#> [1] TRUE

# RleMatrix
# NOTE: This test fails on 32-bit Windows because it can't allocate a ~150 Mb
#       vector.
m3 <- as(m2, "RleMatrix")
S <- sample(1:1000, nrow(m3), replace = TRUE)
# Suspect this line will cause error on ppc32.
m4 <- as.matrix(m3)
a <- rowsum(m4, S)
b <- rowsum(m3, S)
all.equal(a, b)
#> [1] TRUE

Created on 2023-03-30 with reprex v2.0.2
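As an aside, the as.matrix() step is the one that asks for the whole dense matrix at once. A comparison done over blocks of columns would only need one block in dense form at a time. The sketch below shows the idea in base R only, with an ordinary matrix standing in for the RleMatrix; rowsum_by_col_block() and its block_size argument are hypothetical names, and it has not been tried on a DelayedMatrix or on ppc32.

```r
# Hypothetical helper (not part of DelayedArray): rowsum() over column blocks.
rowsum_by_col_block <- function(x, group, block_size = 10L) {
  starts <- seq(1L, ncol(x), by = block_size)
  blocks <- lapply(starts, function(s) {
    idx <- s:min(s + block_size - 1L, ncol(x))
    # Only this block of columns is turned into an ordinary dense matrix.
    rowsum(as.matrix(x[, idx, drop = FALSE]), group)
  })
  # Column sums per group are independent, so the blocks can be glued back.
  do.call(cbind, blocks)
}

# Small stand-in data; the real test uses 800000 rows and 50 columns.
set.seed(1)
x <- matrix(rnorm(2000 * 50), nrow = 2000, ncol = 50)
S <- sample(1:100, nrow(x), replace = TRUE)

res <- all.equal(rowsum(x, S), rowsum_by_col_block(x, S))
print(res)
```

With block_size = 10 and 800000 rows, each dense block would be roughly a fifth of the full matrix, at the cost of several rowsum() calls.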

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R Under development (unstable) (2023-02-13 r83829)
#>  os       macOS Ventura 13.2.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Australia/Melbourne
#>  date     2023-03-30
#>  pandoc   2.19.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package        * version date (UTC) lib source
#>  BiocGenerics   * 0.45.2  2023-03-15 [1] Bioconductor
#>  cli              3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
#>  DelayedArray   * 0.25.0  2022-12-20 [1] Bioconductor
#>  digest           0.6.31  2022-12-11 [1] CRAN (R 4.3.0)
#>  evaluate         0.20    2023-01-17 [1] CRAN (R 4.3.0)
#>  fastmap          1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
#>  fs               1.6.1   2023-02-06 [1] CRAN (R 4.3.0)
#>  glue             1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
#>  htmltools        0.5.5   2023-03-23 [1] CRAN (R 4.3.0)
#>  IRanges        * 2.33.0  2022-12-20 [1] Bioconductor
#>  knitr            1.42    2023-01-25 [1] CRAN (R 4.3.0)
#>  lattice          0.20-45 2021-09-22 [1] CRAN (R 4.3.0)
#>  lifecycle        1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
#>  Matrix         * 1.5-3   2022-11-11 [1] CRAN (R 4.3.0)
#>  MatrixGenerics * 1.11.0  2022-12-20 [1] Bioconductor
#>  matrixStats    * 0.63.0  2022-11-18 [1] CRAN (R 4.3.0)
#>  reprex           2.0.2   2022-08-17 [1] CRAN (R 4.3.0)
#>  rlang            1.1.0   2023-03-14 [1] CRAN (R 4.3.0)
#>  rmarkdown        2.21    2023-03-26 [1] CRAN (R 4.3.0)
#>  rstudioapi       0.14    2022-08-22 [1] CRAN (R 4.3.0)
#>  S4Vectors      * 0.37.4  2023-02-26 [1] Bioconductor
#>  sessioninfo      1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
#>  withr            2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
#>  xfun             0.38    2023-03-24 [1] CRAN (R 4.3.0)
#>  yaml             2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
#>
#>  [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

barracuda156 commented 1 year ago

@PeteHaitch Indeed, it fails at that spot:

10:~ svacchanda$ r

R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: powerpc-apple-darwin10.8.0 (32-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> suppressPackageStartupMessages(library(Matrix))
> suppressPackageStartupMessages(library(DelayedArray))
> # Small normal matrix
> m1 <- DelayedArray(as.matrix(iris[, 1:4]))
> all.equal(rowsum(as.matrix(m1), iris$Species), rowsum(m1, iris$Species))
[1] TRUE
> #> [1] TRUE
> # Large sparse matrix
> x <- Matrix::rsparsematrix(800000, ncol = 50, density = 0.1)
> # Large normal matrix
> m2 <- DelayedArray(as.matrix(x))
> S <- sample(1:1000, nrow(m2), replace = TRUE)
> all.equal(rowsum(as.matrix(m2), S), rowsum(m2, S))
[1] TRUE
> #> [1] TRUE
> # dgCMatrix
> m4 <- DelayedArray(x)
> S <- sample(1:1000, nrow(m4), replace = TRUE)
> all.equal(rowsum(as.matrix(m4), S), rowsum(m4, S))
[1] TRUE
> #> [1] TRUE
> # RleMatrix
> # NOTE: This test fails on 32-bit Windows because it can't allocate a ~150 Mb
> #       vector.
> m3 <- as(m2, "RleMatrix")
S <- sample(1:1000, nrow(m3), replace = TRUE)
> S <- sample(1:1000, nrow(m3), replace = TRUE)
> # Suspect this line will cause error on ppc32.
> m4 <- as.matrix(m3)
a <- rowsum(m4, S)
b <- rowsum(m3, S)
all.equal(a, b)
#> [1] TRUE
R(99267) malloc: *** mmap(size=536875008) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(99267) malloc: *** mmap(size=536875008) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Error: cannot allocate vector of size 512.0 Mb
> a <- rowsum(m4, S)
> b <- rowsum(m3, S)
> all.equal(a, b)
[1] TRUE
> #> [1] TRUE
> sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: powerpc-apple-darwin10.8.0 (32-bit)
Running under: OS X Snow Leopard 10.6.8

Matrix products: default
BLAS:   /opt/local/Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.dylib
LAPACK: /opt/local/Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] DelayedArray_0.24.0   IRanges_2.32.0        S4Vectors_0.36.2     
[4] MatrixGenerics_1.10.0 matrixStats_0.63.0    BiocGenerics_0.44.0  
[7] Matrix_1.5-3         

loaded via a namespace (and not attached):
[1] compiler_4.2.3  grid_4.2.3      lattice_0.20-45
> 
PeteHaitch commented 1 year ago

Thanks for confirming. I'm seeking clarification on Bioconductor's commitment to 32-bit OSs/systems (on the Bioconductor Slack if you are a member: https://community-bioc.slack.com/archives/CLUJWDQF4/p1680126767525049; my query is copied below).

What's BioC's commitment to 32-bit OSs/systems? The 3.16 release notes (https://bioconductor.org/news/bioc_3_16_release/) say "Bioconductor 3.16 is compatible with R 4.2, and is supported on Linux, 64-bit Windows, and Intel 64-bit macOS 10.13 (High Sierra) or higher". It seems clear what that means for Windows and macOS, but is that an implicit commitment to 32-bit Linux systems?

I'm about to go on leave until the end of April and don't have resources to investigate this further until I'm back.

barracuda156 commented 1 year ago

@PeteHaitch Thank you. I am afraid we may not find much active support from Bioconductor, but if they support 32-bit Windows, there is no reason not to support other 32-bit systems, be it macOS, *BSD or Linux. We do support ppc and i386 in MacPorts.

P. S. Just for perspective: pretty much everything builds and passes tests on macOS ppc32. Out of about 1400 R packages that I have brought into MacPorts so far, 3–4 remain in a fixme state, and maybe a dozen have some failing tests. Most packages need no fixes and work fine as-is.

kasperdanielhansen commented 1 year ago

Could this be caused by endianness? I believe ppc32 is big endian while x86 is little endian.

barracuda156 commented 1 year ago

Could this be caused by endianness? I believe ppc32 is big endian while x86 is little endian.

@kasperdanielhansen Yes, ppc32 is always Big-endian (ppc64 can be either).

P. S. If this is endianness-related rather than bitness-related, there is perhaps a stronger case to have it fixed.

PeteHaitch commented 1 year ago

Based on the errors (Error: cannot allocate vector of size 305.2 Mb and Error: cannot allocate vector of size 512.0 Mb) and the previous failure on 32-bit Windows being the same thing, I think it's bitness-related.
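The sizes in those messages can be checked with a little arithmetic. The sketch below assumes 8 bytes per double and that R's "Mb" here means 1024^2 bytes; the variable names are only for illustration. It shows that a dense 800000 x 50 numeric matrix accounts for the 305.2 Mb request, and that both mmap() sizes in the logs are the requested vector size plus 4096 bytes.

```r
# Dimensions of the test matrix, taken from the reprex in this thread.
n_row <- 800000
n_col <- 50
bytes_per_double <- 8  # assumption: standard 8-byte doubles

# Size of the dense matrix payload in bytes and in units of 1024^2 bytes.
dense_bytes <- n_row * n_col * bytes_per_double
dense_mb <- dense_bytes / 1024^2
print(round(dense_mb, 1))  # 305.2

# Difference between each mmap() size in the logs and the requested vector.
extra_first  <- 320004096 - dense_bytes   # first report
extra_second <- 536875008 - 512 * 1024^2  # second report (512.0 Mb)
print(c(extra_first, extra_second))       # 4096 4096
```

A 32-bit process can address at most 2^32 bytes (4 GiB) in total, and such a vector needs one contiguous region, so requests of this size can plausibly fail once the address space is fragmented by the other large objects already held in the session.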

Also, for what it's worth, the example in https://github.com/PeteHaitch/DelayedMatrixStats/issues/90#issuecomment-1489364360 doesn't actually involve DelayedMatrixStats (it does involve DelayedArray); the initial report arose because DelayedMatrixStats tickles the error in its unit tests.

barracuda156 commented 1 year ago

@PeteHaitch I agree, this looks like a 32-bit-related error.