Open ekageyama opened 2 weeks ago
One way to do this at the moment is to go through the COO_SparseArray representation, i.e. to do as(as(<DelayedArray>, "COO_SparseArray"), "SVT_SparseArray"). The first coercion uses block processing, so it won't necessarily be very efficient. The second coercion (from COO_SparseArray to SVT_SparseArray) should be quite efficient though.
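For illustration, here is a minimal sketch of that two-step route on a toy object. It assumes the SparseArray and DelayedArray packages are installed, and that a small in-memory SVT_SparseMatrix wrapped in DelayedArray() is an acceptable stand-in for a real sparse DelayedArray. The matrix contents and the variable names (m, M, coo, svt) are made up for the example.

```r
library(SparseArray)
library(DelayedArray)

# Toy integer matrix with a handful of nonzero values (hypothetical data).
m <- matrix(0L, nrow=6, ncol=4)
m[cbind(c(1, 3, 6, 2), c(1, 2, 2, 4))] <- c(5L, 7L, 1L, 9L)

# Wrap a sparse in-memory seed in a DelayedArray object.
M <- DelayedArray(as(m, "SVT_SparseMatrix"))

# Step 1: DelayedArray -> COO_SparseArray (goes through block processing).
coo <- as(M, "COO_SparseArray")

# Step 2: COO_SparseArray -> SVT_SparseArray (the cheap step).
svt <- as(coo, "SVT_SparseArray")

svt
```

On an object this small the cost of the first step is negligible. With a large on-disk dataset it is the step that dominates the running time.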
But yeah, we should be able to just do as(<DelayedArray>, "SVT_SparseArray") or realize(<DelayedArray>) (the latter will soon be modified to return an SVT_SparseArray when the DelayedArray object is sparse). This is on my TODO list.
FYI I recently added specialized coercion methods to go from TENxMatrix, H5ADMatrix, H5SparseMatrix, TENxMatrixSeed, CSC_H5ADMatrixSeed, and CSC_H5SparseMatrixSeed, to SVT_SparseMatrix. These are quite efficient. Also they can handle big sparse datasets (i.e. datasets with more than 2^31-1 nonzero values) like the "1.3 Million Brain Cell Dataset" from 10x Genomics, as long as your machine has enough RAM:
library(HDF5Array)
library(ExperimentHub)
hub <- ExperimentHub()
fname <- hub[["EH1039"]]
oneM <- TENxMatrix(fname, group="mm10")
svt <- as(oneM, "SVT_SparseMatrix")  # takes about 1.5 min and consumes about 22 GB of RAM
This is with HDF5Array 1.33.3 (latest devel version).
Currently it is tricky to convert a DelayedArray object to SVT_SparseArray format, since there is no default coercion method for this.