Bioconductor / SparseArray

High-performance sparse data representation and manipulation in R

Request: simple wrapper to generate empty SparseArray #15

Closed: taylorpetty closed this issue 4 months ago

taylorpetty commented 4 months ago

Requesting a simple emptySparseArray(dim = c(x_1, ..., x_n), type = desired_type) function. It could be a straightforward wrapper around a function that already exists in the package.

I need to pre-allocate an enormous, empty 4D array that would be too large to fit in RAM in dense form, so I can't use sparse matrix constructors (they only support 2 dimensions) or dense array constructors (too big for RAM). I need the storage type to be integer for memory efficiency.

If I use x <- poissonSparseArray(..., lambda=0), then type(x) returns "integer", but the same is not true for randomSparseArray(..., density=0).

It is counterintuitive not to be able to initialize an "empty" (all-zero) array, and the easiest solution would be a wrapper around poissonSparseArray that creates empty sparse arrays of arbitrary dimension. There is likely an even simpler solution as well.

For now I am going to use poissonSparseArray(..., lambda=0). If this is suboptimal or if I am missing something, then I apologize, but I have hunted through the vignettes and manual for quite some time.
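On SparseArray versions that predate a dedicated constructor, the two workarounds described above might be sketched as follows. This assumes the type() getter and the type<- setter are available for SparseArray objects (see ?SparseArray); the small dimensions are only for illustration and are not the ones from the report.

```r
library(SparseArray)

# Small dimensions for illustration only.
dims <- c(80L, 25L, 30L, 5L)

# Workaround 1: a Poisson array with lambda = 0 contains no nonzero values
# and, per the report above, is of type "integer".
x1 <- poissonSparseArray(dim = dims, lambda = 0)
type(x1)

# Workaround 2: randomSparseArray() with density = 0 is also all zeros,
# but its type is not integer, so coerce it with the type<- setter.
x2 <- randomSparseArray(dim = dims, density = 0)
type(x2) <- "integer"
type(x2)
```

Either way the result is an all-zero array; the second form just makes the integer storage type explicit instead of relying on the distribution's output type.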

hpages commented 4 months ago

This functionality was added last week (in SparseArray 1.5.15):

> library(SparseArray)
> SVT_SparseArray(dim=c(8000, 2500, 300, 5), type="integer")
<8000 x 2500 x 300 x 5 SparseArray> of type "integer" [nzcount=0 (0%)]:
,,1,1
           [,1]    [,2]    [,3]    [,4] ... [,2497] [,2498] [,2499] [,2500]
   [1,]       0       0       0       0   .       0       0       0       0
   [2,]       0       0       0       0   .       0       0       0       0
    ...       .       .       .       .   .       .       .       .       .
[7999,]       0       0       0       0   .       0       0       0       0
[8000,]       0       0       0       0   .       0       0       0       0

...

,,300,5
           [,1]    [,2]    [,3]    [,4] ... [,2497] [,2498] [,2499] [,2500]
   [1,]       0       0       0       0   .       0       0       0       0
   [2,]       0       0       0       0   .       0       0       0       0
    ...       .       .       .       .   .       .       .       .       .
[7999,]       0       0       0       0   .       0       0       0       0
[8000,]       0       0       0       0   .       0       0       0       0

See ?SVT_SparseArray for more information.
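With that constructor, the emptySparseArray() interface from the original request becomes a one-line wrapper. The sketch below is hypothetical: emptySparseArray() is not part of SparseArray, and its default type of "double" is an arbitrary choice. It only relies on SVT_SparseArray(dim=, type=) as shown above (SparseArray >= 1.5.15).

```r
library(SparseArray)  # needs SparseArray >= 1.5.15 (Bioconductor 3.20)

# Hypothetical convenience wrapper matching the requested
# emptySparseArray(dim, type) interface; it is NOT part of SparseArray.
emptySparseArray <- function(dim, type = "double") {
    # SVT_SparseArray() called with only 'dim' and 'type' returns an
    # all-zero sparse array of the requested storage type.
    SVT_SparseArray(dim = dim, type = type)
}

x <- emptySparseArray(dim = c(8000, 2500, 300, 5), type = "integer")
type(x)     # storage type of the new array
nzcount(x)  # number of nonzero elements
```

Because no nonzero values are stored, the object stays small regardless of the nominal dimensions.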

You'll need Bioconductor 3.20 (current devel) for that. Use BiocManager::install(version="devel") to upgrade your installation to BioC devel.

H.