PeteHaitch / DelayedMatrixStats

A port of the matrixStats API to work with DelayedMatrix objects from the DelayedArray package
Other
15 stars 7 forks source link

new rowsum() on RleMatrix and dgCMatrix produces errors? #54

Closed MalteThodberg closed 5 years ago

MalteThodberg commented 6 years ago

I'm getting some strange error messages when trying the new rowsum() method on large RleMatrix-objects or dgCMatrix-objects. I think it's related the the block processing, here's how to reproduce the errors:

library(DelayedArray)
library(DelayedMatrixStats)

# Small normal matrix
m1 <- DelayedArray(as.matrix(iris[,1:4]))
rowsum(m1, iris$Species, force_block_processing = FALSE)
rowsum(m1, iris$Species, force_block_processing = TRUE)

# Large sparse matrix
x <- rsparsematrix(800000, ncol=50, density=0.1)

# Large normal matrix
m2 <- DelayedArray(as.matrix(x))
S <- sample(1:1000, nrow(m2), replace=TRUE)
rowsum(m2, S, force_block_processing = FALSE) 
rowsum(m2, S, force_block_processing = TRUE) # Error

# RleMatrix
m3 <- as(m2, "RleMatrix")
S <- sample(1:1000, nrow(m3), replace=TRUE)
rowsum(m3, S, force_block_processing = FALSE) # Error
rowsum(m3, S, force_block_processing = TRUE) # Error

# dgCMatrix
m4 <- DelayedArray(x)
S <- sample(1:1000, nrow(m4), replace=TRUE)
rowsum(m4, S, force_block_processing = FALSE) # Error
rowsum(m4, S, force_block_processing = TRUE) # Error
PeteHaitch commented 6 years ago

Thanks, Malte. I can confirm the error and suspect I know the source. I'll take a look this week.

PeteHaitch commented 6 years ago

Thanks again for the clear bug report, @MalteThodberg. This is fixed in v1.3.9 (available in the devel branch of Bioconductor in a few days) and I've added your example as a unit test.

PeteHaitch commented 6 years ago

Hmm, there looks to be an issue with the fix on Windows (https://www.bioconductor.org/checkResults/3.8/bioc-LATEST/DelayedMatrixStats/tokay1-checksrc.html) I don't have ready access to a Windows machine to test this on. By chance are you on Windows, @MalteThodberg, and able to test?

MalteThodberg commented 6 years ago

Unfortunately not, I'm on macOS/linux.

The current GitHub version works for me now. I'm doing some quick benchmarks to compare the speed of DelayedMatrixStats::rowsum to Matrix.utils::aggregate.Matrix.