Direct converters from H5SparseMatrixSeed-containing DelayedMatrix to dgCMatrix #45

Open LTLA opened 2 years ago

LTLA commented 2 years ago

I'm trying to convert an H5SparseMatrixSeed-containing DelayedMatrix to a dgCMatrix. This calls:

selectMethod(coerce, c("DelayedMatrix", "dgCMatrix") )
## Method Definition:
## function (from, to = "dgCMatrix", strict = TRUE)
## as(as(from, "SparseArraySeed"), "dgCMatrix")
## <bytecode: 0x55555e41dde0>
## <environment: namespace:DelayedArray>
## Signatures:
##         from            to
## target  "DelayedMatrix" "dgCMatrix"
## defined "Array"         "dgCMatrix"

Well, okay. But then this calls:

selectMethod(coerce, c("DelayedMatrix", "SparseArraySeed") )
## Method Definition:
## function (from, to = "SparseArraySeed", strict = TRUE)
## .BLOCK_dense2sparse(from)
## <bytecode: 0x55555e4180f0>
## <environment: namespace:DelayedArray>
## Signatures:
##         from            to
## target  "DelayedMatrix" "SparseArraySeed"
## defined "DelayedArray"  "SparseArraySeed"

Directly coercing H5SparseMatrixSeeds to SparseArraySeeds isn't much better:

selectMethod(coerce, c("H5SparseMatrixSeed", "SparseArraySeed") )
## Method Definition:
## function (from, to = "SparseArraySeed", strict = TRUE)
## dense2sparse(from)
## <bytecode: 0x55555e420978>
## <environment: namespace:DelayedArray>
## Signatures:
##         from                 to
## target  "H5SparseMatrixSeed" "SparseArraySeed"
## defined "ANY"                "SparseArraySeed"

All in all, every route ends up in a dense2sparse fallback, which makes it painfully slow to load an H5SparseMatrix into memory after applying some operations to it (e.g., slapping on dimnames, which is commonly what we get out of SummarizedExperiment::assay()).

The solution is to simply define the missing sparse-to-sparse methods - something like the below might work:

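# setAs() is the documented way to define coerce methods; it supplies the
# 'to' and 'strict' arguments that the coerce generic expects.
setAs("H5SparseMatrixSeed", "SparseArraySeed", function(from) {
    extract_sparse_array(from, list(NULL, NULL))
})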

setAs("DelayedMatrix", "SparseArraySeed", function(from) {
    if (is_sparse(from)) {
        extract_sparse_array(from, list(NULL, NULL))
    } else {
        # Assumed fallback: keep the existing block-processing behavior.
        .BLOCK_dense2sparse(from)
    }
})

# Ideally we should be able to do something like this, to bypass the intermediate
# SparseArraySeed object and directly return a dgCMatrix. This would enable
# very efficient conversions from CSC_H5SparseMatrixSeed representations, 
# while also adjusting for the common occurrence of assay() adding dimnames.
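setAs("DelayedMatrix", "dgCMatrix", function(from) {
    if (isPristine(from, ignore.dimnames=TRUE)) {
        # seed() returns the underlying seed even when a dimnames-setting
        # operation sits on top of it; from@seed would return that operation.
        output <- as(seed(from), "dgCMatrix")
        dimnames(output) <- dimnames(from)
        output  # return the matrix, not the value of the dimnames<- call
    } else {
        as(as(from, "SparseArraySeed"), "dgCMatrix")
    }
})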
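
To check whether methods like these make a difference, a small reproduction along the following lines might help. It is only a sketch: the matrix size, the density, the temporary file, the group name "matrix" and the gene/cell names are all made up for illustration, and it assumes that Matrix, DelayedArray and HDF5Array are installed. It writes a random sparse matrix in 10x-style format, reads it back as an H5SparseMatrix, adds dimnames the way assay() typically would, and times the coercion to dgCMatrix.

```r
# Sketch only: sizes, names and the HDF5 group are illustrative assumptions.
library(Matrix)
library(DelayedArray)
library(HDF5Array)

set.seed(45)
x <- rsparsematrix(5000, 200, density = 0.01)  # random dgCMatrix

# Write in the 10x-style compressed sparse column layout.
path <- tempfile(fileext = ".h5")
writeTENxMatrix(x, path, group = "matrix")

# Read it back as a file-backed sparse matrix.
mat <- H5SparseMatrix(path, "matrix")
pristine_before <- isPristine(mat)

# Mimic assay() attaching dimnames, which adds a delayed operation.
dimnames(mat) <- list(
    paste0("gene", seq_len(nrow(mat))),
    paste0("cell", seq_len(ncol(mat)))
)

pristine_after <- isPristine(mat)
pristine_ignoring_dimnames <- isPristine(mat, ignore.dimnames = TRUE)
still_sparse <- is_sparse(mat)

# Time the coercion that this issue is about.
timing <- system.time(out <- as(mat, "dgCMatrix"))
print(timing)
```

Running this before and after defining the proposed methods should show whether the sparse-to-sparse path is actually being taken; the flags computed above indicate whether the dimnames-only fast path would apply to a given object.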
