bnprks / BPCells

Scaling Single Cell Analysis to Millions of Cells
https://bnprks.github.io/BPCells
Other
166 stars 17 forks source link

`[<-` method removing names #67

Closed Yunuuuu closed 9 months ago

Yunuuuu commented 9 months ago
mock_matrix <- function(ngenes, ncells) {
    cell.means <- 2^stats::runif(ngenes, 2, 10)
    cell.disp <- 100 / cell.means + 0.5
    cell.data <- matrix(stats::rnbinom(ngenes * ncells,
        mu = cell.means,
        size = 1 / cell.disp
    ), ncol = ncells)
    rownames(cell.data) <- sprintf("Gene_%s", formatC(seq_len(ngenes),
        width = 4, flag = 0
    ))
    colnames(cell.data) <- sprintf("Cell_%s", formatC(seq_len(ncells),
        width = 3, flag = 0
    ))
    cell.data
}
mat <- mock_matrix(2000, 200)
path <- normalizePath(tempfile(tmpdir = tempdir()), mustWork = FALSE)
obj <- BPCells::write_matrix_dir(mat = as(mat, "dgCMatrix"), dir = path)
#> Warning: Matrix compression performs poorly with non-integers.
#> • Consider calling convert_matrix_type if a compressed integer matrix is intended.
#> This message is displayed once every 8 hours.
value <- matrix(sample(mat, length(mat)), nrow = nrow(mat))
obj[1:10, ] <- value[1:10, ]
mat[1:10, ] <- value[1:10, ]
testthat::expect_equal(as.matrix(obj), mat)
#> Warning: Converting to a dense matrix may use excessive memory
#> This message is displayed once every 8 hours.
obj[, 1:10] <- value[, 1:10]
mat[, 1:10] <- value[, 1:10]
testthat::expect_equal(as.matrix(obj), mat)
obj[1:10, 1:10] <- value[1:10, 1:10]
#> Warning: cbind: column names presenent on some but not all matrices. Setting
#> missing names to ""
#> Warning: rbind: column names are mismatched. Setting names to match first
#> matrix
mat[1:10, 1:10] <- value[1:10, 1:10]
testthat::expect_equal(as.matrix(obj), mat)
#> Error: as.matrix(obj) not equal to `mat`.
#> Attributes: < Component "dimnames": Component 2: 190 string mismatches >

Created on 2024-01-28 with reprex v2.0.2 ~

Yunuuuu commented 9 months ago

I have tried to fix this by following codes, all classes work well except ConvertMatrixType, any opinions for these?

methods::setMethod(
    "[<-", c("IterableMatrix", "ANY", "ANY", "ANY"),
    function(x, i, j, ..., value) {
        value <- as(value, "dgCMatrix")
        methods::callGeneric()
    }
)

methods::setMethod(
    "[<-", c("IterableMatrix", "ANY", "ANY", "dgCMatrix"),
    function(x, i, j, ..., value) {
        if (x@transpose) {
            value <- t(methods::as(t(value), "IterableMatrix"))
        } else {
            value <- methods::as(value, "IterableMatrix")
        }
        methods::callGeneric()
    }
)

methods::setMethod(
    "[<-", c("IterableMatrix", "ANY", "ANY", "IterableMatrix"),
    function(x, i, j, ..., value) {
        i <- selection_index(i, nrow(x), rownames(x))
        ni <- if (length(i) > 0) seq_len(nrow(x))[-i] else seq_len(nrow(x))
        x_i <- x[i, ]
        x_ni <- x[ni, ]
        # dispatch the "IterableMatrix", "missing", "ANY", "IterableMatrix" method
        x_i <- methods::callGeneric(x = x_i, j = j, value = value)
        rbind(x_i, x_ni)[order(c(i, ni)), ]
    }
)

methods::setMethod(
    "[<-", c("IterableMatrix", "ANY", "missing", "IterableMatrix"),
    function(x, i, j, ..., value) {
        i <- selection_index(i, nrow(x), rownames(x))
        ni <- if (length(i) > 0) seq_len(nrow(x))[-i] else seq_len(nrow(x))

        x_i <- x[i, ]
        x_ni <- x[ni, ]
        if (any(dim(x_i) != dim(value))) {
            stop("Mismatched dimensions in assignment to subset")
        }
        rownames(value) <- rownames(x_i)
        colnames(value) <- colnames(x_i)
        rbind(value, x_ni)[order(c(i, ni)), ]
    }
)

methods::setMethod(
    "[<-", c("IterableMatrix", "missing", "ANY", "IterableMatrix"),
    function(x, i, j, ..., value) {
        x <- t(x)
        x[j, , ...] <- t(value)
        t(x)
    }
)

methods::setMethod(
    "[<-", c("IterableMatrix", "missing", "missing", "IterableMatrix"),
    function(x, i, j, ..., value) {
        if (any(dim(x) != dim(value))) {
            stop("Mismatched dimensions in assignment to subset")
        }
        rownames(value) <- rownames(x)
        colnames(value) <- colnames(x)
        value
    }
)
Yunuuuu commented 9 months ago

The original [<- method also gave wrong values for ConvertMatrixType object, see https://github.com/bnprks/BPCells/issues/68 for small examples.

bnprks commented 9 months ago

This should be fixed by my most recent commit 450c2b9. Please comment/reopen if you spot remaining issues.

Thanks again for finding this problem and providing such clear examples to reproduce. Sorry to not go with your suggested solution due to my code-style preferences, but I appreciate all the bugs you've been identifying

Yunuuuu commented 9 months ago

Hi, @bnprks , thanks for the fast fix, but it remains not perfect. For some situations, [<- will also removing names, here are some examples:

library(BPCells)
mock_matrix <- function(ngenes, ncells) {
    cell.means <- 2^stats::runif(ngenes, 2, 10)
    cell.disp <- 100 / cell.means + 0.5
    cell.data <- matrix(stats::rnbinom(ngenes * ncells,
        mu = cell.means,
        size = 1 / cell.disp
    ), ncol = ncells)
    rownames(cell.data) <- sprintf("Gene_%s", formatC(seq_len(ngenes),
        width = 4, flag = 0
    ))
    colnames(cell.data) <- sprintf("Cell_%s", formatC(seq_len(ncells),
        width = 3, flag = 0
    ))
    cell.data
}
mat <- mock_matrix(200, 200)
path <- normalizePath(tempfile(tmpdir = tempdir()), mustWork = FALSE)
obj <- BPCells::write_matrix_dir(mat = as(mat, "dgCMatrix"), dir = path)
#> Warning: Matrix compression performs poorly with non-integers.
#> • Consider calling convert_matrix_type if a compressed integer matrix is intended.
#> This message is displayed once every 8 hours.
obj <- BPCells:::rank_transform(obj, "col")
obj <- BPCells::convert_matrix_type(obj, "double")
mat <- as.matrix(obj)
#> Warning: Converting to a dense matrix may use excessive memory
#> This message is displayed once every 8 hours.
value <- matrix(sample(mat, length(mat)), nrow = nrow(mat))
obj[1:10, 1:10] <- value[1:10, 1:10]
#> Warning: cbind: column names present on some but not all matrices. Setting
#> missing names to ""
#> Warning: rbind: column names are mismatched. Setting names to match first
#> matrix
mat[1:10, 1:10] <- value[1:10, 1:10]
testthat::expect_equal(as.matrix(obj), mat)
#> Error: as.matrix(obj) not equal to `mat`.
#> Attributes: < Component "dimnames": Component 2: 190 string mismatches >

Created on 2024-02-04 with reprex v2.0.2

Session info ``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.3.1 (2023-06-16) #> os Ubuntu 22.04.3 LTS #> system x86_64, linux-gnu #> ui X11 #> language en #> collate C.UTF-8 #> ctype C.UTF-8 #> tz Asia/Shanghai #> date 2024-02-04 #> pandoc 2.9.2.1 @ /usr/bin/ (via rmarkdown) #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date (UTC) lib source #> BiocGenerics 0.46.0 2023-04-25 [1] Bioconductor #> bitops 1.0-7 2021-04-24 [1] CRAN (R 4.3.1) #> BPCells * 0.1.0 2024-02-02 [1] Github (bnprks/BPCells@ff0c56f) #> brio 1.1.4 2023-12-10 [1] CRAN (R 4.3.1) #> cli 3.6.2 2023-12-11 [1] CRAN (R 4.3.1) #> desc 1.4.3 2023-12-10 [1] CRAN (R 4.3.1) #> digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.1) #> evaluate 0.23 2023-11-01 [1] CRAN (R 4.3.1) #> fansi 1.0.6 2023-12-08 [1] CRAN (R 4.3.1) #> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.1) #> fs 1.6.3 2023-07-20 [1] CRAN (R 4.3.1) #> GenomeInfoDb 1.36.1 2023-06-21 [1] Bioconductor #> GenomeInfoDbData 1.2.10 2023-08-08 [1] Bioconductor #> GenomicRanges 1.52.0 2023-04-25 [1] Bioconductor #> glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.1) #> htmltools 0.5.7 2023-11-03 [1] CRAN (R 4.3.1) #> IRanges 2.34.1 2023-06-22 [1] Bioconductor #> knitr 1.45 2023-10-30 [1] CRAN (R 4.3.1) #> lattice 0.22-5 2023-10-24 [1] CRAN (R 4.3.1) #> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.1) #> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.1) #> Matrix 1.6-4 2023-11-30 [1] CRAN (R 4.3.1) #> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.1) #> pkgload 1.3.3 2023-09-22 [1] CRAN (R 4.3.1) #> purrr 1.0.2 2023-08-10 [1] CRAN (R 4.3.1) #> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.3.1) #> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.3.1) #> R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.3.1) #> R.utils 2.12.2 2022-11-11 [1] CRAN (R 4.3.1) #> R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.1) #> Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.3.1) #> RCurl 1.98-1.12 2023-03-27 [1] CRAN (R 4.3.1) #> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.3.1) #> rlang 1.1.2 2023-11-04 [1] CRAN (R 4.3.1) #> rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1) #> rprojroot 2.0.4 2023-11-05 [1] CRAN (R 4.3.1) #> RSpectra 0.16-1 2022-04-24 [1] CRAN (R 4.3.1) #> S4Vectors 0.38.1 2023-05-02 [1] Bioconductor #> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.1) #> styler 1.10.2 2024-01-13 [1] Github (r-lib/styler@17f91fc) #> testthat 3.2.1.9000 2024-01-19 [1] Github (r-lib/testthat@fe50a22) #> utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.1) #> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.1) #> withr 2.5.2 2023-10-30 [1] CRAN (R 4.3.1) #> xfun 0.41 2023-11-01 [1] CRAN (R 4.3.1) #> XVector 0.40.0 2023-04-25 [1] Bioconductor #> yaml 2.3.8 2023-12-11 [1] CRAN (R 4.3.1) #> zlibbioc 1.46.0 2023-04-25 [1] Bioconductor #> #> [1] /home/yun/Rlibrary/4.3 #> [2] /usr/local/lib/R/site-library #> [3] /usr/lib/R/site-library #> [4] /usr/lib/R/library #> #> ────────────────────────────────────────────────────────────────────────────── ```
bnprks commented 9 months ago

Thanks for following up on this. Should be fixed in commit 9e9dd2e7c

Interestingly this bug originated in some relatively new changes in [, not in [<-.

Hopefully this won't continue to be an unending source of bugs -- at the very least we get a new correctness test each time you find a new issue. I'll try to at least be prompt with fixes if you keep finding more problems 😅