bnprks / BPCells

Scaling Single Cell Analysis to Millions of Cells
https://bnprks.github.io/BPCells
Other
166 stars 17 forks source link

wrong scale results after shifting #72

Closed Yunuuuu closed 9 months ago

Yunuuuu commented 9 months ago

A small examples:

mock_matrix <- function(ngenes, ncells) {
    cell.means <- 2^stats::runif(ngenes, 2, 10)
    cell.disp <- 100 / cell.means + 0.5
    cell.data <- matrix(stats::rnbinom(ngenes * ncells,
        mu = cell.means,
        size = 1 / cell.disp
    ), ncol = ncells)
    rownames(cell.data) <- sprintf("Gene_%s", formatC(seq_len(ngenes),
        width = 4, flag = 0
    ))
    colnames(cell.data) <- sprintf("Cell_%s", formatC(seq_len(ncells),
        width = 3, flag = 0
    ))
    cell.data
}
mat <- mock_matrix(30, 20)
path <- normalizePath(tempfile(tmpdir = tempdir()), mustWork = FALSE)
obj <- BPCells::write_matrix_dir(mat = as(mat, "dgCMatrix"), dir = path)
#> Warning: Matrix compression performs poorly with non-integers.
#> • Consider calling convert_matrix_type if a compressed integer matrix is intended.
#> This message is displayed once every 8 hours.
testthat::expect_equal(mat, as.matrix(obj))
#> Warning: Converting to a dense matrix may use excessive memory
#> This message is displayed once every 8 hours.
testthat::expect_equal(
    (mat + seq_len(nrow(mat))) * 10,
    as.matrix((obj + seq_len(nrow(obj))) * 10)
)
#> Error: (mat + seq_len(nrow(mat))) * 10 not equal to as.matrix((obj + seq_len(nrow(obj))) * 10).
#> 600/600 mismatches (average diff: 140)
#> [1]  380 -  371 ==  9
#> [2]   20 -    2 == 18
#> [3] 7460 - 7433 == 27
#> [4]  810 -  774 == 36
#> [5] 1940 - 1895 == 45
#> [6] 2450 - 2396 == 54
#> [7] 9260 - 9197 == 63
#> [8] 2960 - 2888 == 72
#> [9]   90 -    9 == 81
#> ...

Created on 2024-01-31 with reprex v2.0.2

bnprks commented 9 months ago

Thanks for reporting this! I'm honestly quite surprised this bug slipped through -- I think I hadn't sufficiently tested the case of multiplying by a single number rather than a vector. I've added some randomized tests that hopefully cover this and all related issues I haven't thought of.

The core challenge here is tricky logic to combine multiplications and additions into as few operations as possible, hence the possibility of bugs.

I think this should be fixed with my most recent commit, but feel free to reopen/comment if you still have problems