Bioconductor / DelayedArray

A unified framework for working transparently with on-disk and in-memory array-like datasets
https://bioconductor.org/packages/DelayedArray
24 stars 9 forks source link

extract_sparse_array on DelayedAbind with sparse backends is not yet implement #80

Closed LTLA closed 3 years ago

LTLA commented 3 years ago

For example:

library(Matrix)
x <- rsparsematrix(10, 20, 0.1)
y <- rsparsematrix(10, 30, 0.1)

library(DelayedArray)
x0 <- DelayedArray(x)
y0 <- DelayedArray(y)

Z <- cbind(x0, y0)
is_sparse(Z) ## TRUE

colSums(Z) # or really, any operation on this.
## Error in extract_sparse_array(x@seed, index) : NOT IMPLEMENTED YET!

An immediate hack would be to treat all DelayedAbind as being non-sparse so that the above code works. Though maybe it might be better to just implement it, something like:

.extract_sparse_array_from_DelayedAbind <- function(x, index)
{
    i <- index[[x@along]]

    if (is.null(i)) {
        ## This is the easy situation.
        tmp <- lapply(x@seeds, extract_sparse_array, index)

        last <- 0L
        for (a in seq_along(tmp)) {
            tmp[[a]]@nzindex[,x@along] <- tmp[[a]]@nzindex[,x@along] + last
            last <- last + dim(tmp[[a]])[x@along]
        }

        all.nzindex <- do.call(rbind, lapply(tmp, nzindex))
        all.nzdata <- unlist(lapply(tmp, nzdata))
        all.dims <- dim(tmp[[1]])
        all.dims[x@along] <- last

        # TODO: deal with the dimnames.
        return(SparseArraySeed(all.dims, nzindex=all.nzindex, nzdata=all.nzdata))
    }

    ## From now on 'i' is a vector of positive integers.
    dims <- get_dims_to_bind(x@seeds, x@along)
    breakpoints <- cumsum(dims[x@along, ])
    part_idx <- get_part_index(i, breakpoints)
    split_part_idx <- split_part_index(part_idx, length(breakpoints))
    FUN <- function(s) {
        index[[x@along]] <- split_part_idx[[s]]
        extract_sparse_array(x@seeds[[s]], index)
    }
    tmp <- lapply(seq_along(x@seeds), FUN)

    ## NOTE: this part is missing something to do with get_rev_index.
    last <- 0L
    for (a in seq_along(tmp)) {
        tmp[[a]]@nzindex[,x@along] <- tmp[[a]]@nzindex[,x@along] + last
        last <- last + dim(tmp[[a]])[x@along]
    }

    all.nzindex <- do.call(rbind, lapply(tmp, nzindex))
    all.nzdata <- unlist(lapply(tmp, nzdata))
    all.dims <- dim(tmp[[1]])
    all.dims[x@along] <- last

    # TODO: deal with the dimnames.
    return(SparseArraySeed(all.dims, nzindex=all.nzindex, nzdata=all.nzdata))
}

setMethod("extract_sparse_array", "DelayedAbind", .extract_sparse_array_from_DelayedAbind)
hpages commented 3 years ago

Done in DelayedArray 0.17.5 (commit d03a6beafabb39d9a6bee33ba2004c931fecaf61).