bnprks / BPCells

Scaling Single Cell Analysis to Millions of Cells
https://bnprks.github.io/BPCells

`transpose_storage_order` uses a lot of memory for small data on a transposed `TransformScaleShift` object #71

Closed by Yunuuuu 9 months ago

Yunuuuu commented 10 months ago

A small example: my computer has 128 GB of memory, but this small matrix (200 * 200) causes huge memory usage and a system crash.

```r
library(BPCells)

# Simulate a small negative-binomial count matrix (genes in rows, cells in columns)
mock_matrix <- function(ngenes, ncells) {
    cell.means <- 2^stats::runif(ngenes, 2, 10)
    cell.disp <- 100 / cell.means + 0.5
    cell.data <- matrix(stats::rnbinom(ngenes * ncells,
        mu = cell.means,
        size = 1 / cell.disp
    ), ncol = ncells)
    # Zero-padded row and column names, e.g. "Gene_0001" and "Cell_001"
    rownames(cell.data) <- sprintf("Gene_%s", formatC(seq_len(ngenes),
        width = 4, flag = "0"
    ))
    colnames(cell.data) <- sprintf("Cell_%s", formatC(seq_len(ncells),
        width = 3, flag = "0"
    ))
    cell.data
}
mat <- mock_matrix(200, 200)
path <- normalizePath(tempfile(tmpdir = tempdir()), mustWork = FALSE)
# Write the matrix to disk, then add a scalar to get the transformed object
obj <- BPCells::write_matrix_dir(mat = as(mat, "dgCMatrix"), dir = path)
obj <- obj + 1
# Switching the storage order of the transposed object triggers the memory blow-up
BPCells::transpose_storage_order(t(obj))
```

bnprks commented 9 months ago

Thanks for reporting this as well. It should be fixed in the main branch as of my most recent commit. (As usual, reopen or comment if I haven't properly fixed this.)

Sorry for all the bugs you've been hitting -- I guess the BPCellsArray wrapper ends up using BPCells in ways that I haven't adequately tested. I appreciate you reporting everything, though. Please keep submitting bugs as you find them and I'll try to fix them quickly :)