Bioconductor / DelayedArray

A unified framework for working transparently with on-disk and in-memory array-like datasets
https://bioconductor.org/packages/DelayedArray

Division by a vector does not preserve sparsity #73

Closed LTLA closed 3 years ago

LTLA commented 3 years ago

Correct behavior should mimic a raw sparse matrix:

```r
library(DelayedArray)
library(Matrix)
mat <- rsparsematrix(1000, 1000, 0.01)
div <- runif(nrow(mat))
is_sparse(mat / div)
## [1] TRUE

out <- DelayedArray(mat)
is_sparse(out)
## [1] TRUE

out2 <- out / div
is_sparse(out2)
## [1] FALSE
```

No problem for scalars, though.
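A possible workaround in the meantime, sketched on the raw Matrix side: dividing by a vector recycles it down each column, i.e. it scales row i by 1/div[i], and the same scaling via a sparse diagonal matrix stays sparse by construction. This assumes `div` contains no zeros.

```r
library(Matrix)

mat <- rsparsematrix(1000, 1000, 0.01)
div <- runif(nrow(mat))

# mat / div recycles div down each column, scaling row i by 1 / div[i].
# Left-multiplying by a sparse diagonal matrix of reciprocals does the
# same scaling while keeping a sparse representation.
scaled <- Diagonal(x = 1 / div) %*% mat
is(scaled, "sparseMatrix")
## [1] TRUE

# Same values as the plain division, up to floating-point error.
max(abs(scaled - mat / div))
```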

Session information

```
R version 4.0.0 Patched (2020-04-28 r78328)
Platform: x86_64-apple-darwin17.7.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS:   /Users/luna/Software/R/R-4-0-branch-dev/lib/libRblas.dylib
LAPACK: /Users/luna/Software/R/R-4-0-branch-dev/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] DelayedArray_0.15.16 IRanges_2.23.10      S4Vectors_0.27.14
[4] MatrixGenerics_1.1.6 matrixStats_0.57.0   BiocGenerics_0.35.4
[7] Matrix_1.2-18

loaded via a namespace (and not attached):
[1] BiocManager_1.30.10 compiler_4.0.0      tools_4.0.0
[4] grid_4.0.0          lattice_0.20-41
```
hpages commented 3 years ago

Makes sense. I have a long comment about that: https://github.com/Bioconductor/DelayedArray/blob/06bb47a343abfb83c7fe948e199a993adeb8639b/R/DelayedOp-class.R#L653-L666 You're even mentioned there ;-)

I'll try to find some time for this but it's going to be difficult before the release...

LTLA commented 3 years ago

Preserving sparsity with log2(x + 1) seems tough. Anyway, I just changed some functions in scuttle to use log1p instead, which should make life easier for DelayedArray; it already does the right thing there in terms of preserving sparsity, so it's just the division that's the problem.
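For reference, `log1p` maps 0 to 0, so it can propagate sparsity directly; a quick check on a raw sparse matrix (Matrix implements the `Math` group generics and keeps the result sparse for functions with f(0) == 0):

```r
library(Matrix)

m <- rsparsematrix(100, 100, 0.05)

# log1p(0) == 0, so structural zeros stay zeros and the result is sparse.
is(log1p(m), "sparseMatrix")
## [1] TRUE

# By contrast, log(m + 1) adds a scalar first, which densifies the matrix
# before the log is taken.
```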

hpages commented 3 years ago

There is a lot of room to make propagation of sparsity through delayed operations smarter. Things like log(x+1) and 1/log(log(x+1)) already work, but there is a broad class of operations that are not "smart" yet. So far there was little incentive for me to work on this, but now that sparse DelayedArray objects are becoming more common, things are different.
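One simple rule that could drive smarter propagation: a unary operation preserves structural sparsity exactly when it maps 0 to 0. A hypothetical helper (not part of the DelayedArray API, just an illustration of the rule):

```r
# Hypothetical predicate: does f map 0 to 0, so that structural zeros
# stay zeros and sparsity can be propagated? Not part of DelayedArray.
preserves_sparsity <- function(f) isTRUE(all.equal(f(0), 0))

preserves_sparsity(log1p)                    # TRUE: log1p(0) == 0
preserves_sparsity(function(x) x / 2)        # TRUE: 0 / 2 == 0
preserves_sparsity(function(x) log2(x + 1))  # TRUE: log2(0 + 1) == 0
preserves_sparsity(exp)                      # FALSE: exp(0) == 1
```

Division by a vector fits this rule too (0 / div[i] == 0 for nonzero div[i]); the hard part is teaching the delayed-op machinery to recognize such cases.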