bnprks / BPCells

Scaling Single Cell Analysis to Millions of Cells
https://bnprks.github.io/BPCells
Other
150 stars 17 forks source link

`float` matrix type will cause precision loss for large values #73

Closed Yunuuuu closed 7 months ago

Yunuuuu commented 7 months ago

Although float matrix and double both are equal, some large values in float matrix become Inf after subset assign, I don't know if this is a issue, maybe we should recomment the usage of double matrix type?

library(BPCells)
mock_matrix <- function(ngenes, ncells) {
    cell.means <- 2^stats::runif(ngenes, 2, 10)
    cell.disp <- 100 / cell.means + 0.5
    cell.data <- matrix(stats::rnbinom(ngenes * ncells,
        mu = cell.means,
        size = 1 / cell.disp
    ), ncol = ncells)
    rownames(cell.data) <- sprintf("Gene_%s", formatC(seq_len(ngenes),
        width = 4, flag = 0
    ))
    colnames(cell.data) <- sprintf("Cell_%s", formatC(seq_len(ncells),
        width = 3, flag = 0
    ))
    cell.data
}
mat <- mock_matrix(20, 30)
path <- normalizePath(tempfile(tmpdir = tempdir()), mustWork = FALSE)
obj <- BPCells::write_matrix_dir(mat = as(mat, "dgCMatrix"), dir = path)
#> Warning: Matrix compression performs poorly with non-integers.
#> • Consider calling convert_matrix_type if a compressed integer matrix is intended.
#> This message is displayed once every 8 hours.
obj <- BPCells::expm1_slow(obj)
mat <- as.matrix(obj)
#> Warning: Converting to a dense matrix may use excessive memory
#> This message is displayed once every 8 hours.
value <- matrix(sample(mat, length(mat)), nrow = nrow(mat))
waldo::compare(
    as.matrix(BPCells::convert_matrix_type(obj, "float")),
    as.matrix(obj)
)
#> ✔ No differences
obj <- BPCells::convert_matrix_type(obj, "float")
obj[1:5, 1:5] <- value[1:5, 1:5]
#> Warning: Converting input matrix type to match destination
#> • input type: double
#> • destination type: float
mat[1:5, 1:5] <- value[1:5, 1:5]
waldo::compare(as.matrix(obj), mat)
#> old vs new
#>                     [, 1]         [, 2]         [, 3]         [, 4]         [, 5]         [, 6]         [, 7]         [, 8]         [, 9]         [,10]         [,11]        [,12]         [,13]         [,14]         [,15]         [,16]         [,17]         [,18]         [,19]         [,20]         [,21]         [,22]         [,23]         [,24]         [,25]         [,26]         [,27]         [,28]         [,29]         [,30]
#> - old[1, ]            Inf           Inf  1.718282e+00  6.389056e+00  5.834617e+14  4.424124e+05  1.718282e+00  1.142007e+26  1.908554e+01  9.496120e+19  6.389056e+00 6.389056e+00  6.389056e+00           Inf  5.359815e+01  2.415495e+07  8.102084e+03  1.202603e+06           Inf  5.359815e+01  1.068647e+13  8.102084e+03  8.102084e+03  0.000000e+00  5.359815e+01  4.424124e+05  3.931334e+12           Inf  0.000000e+00           Inf
#> + new[1, ]  4.137619e+263 3.557678e+128  1.718282e+00  6.389056e+00  5.834617e+14  4.424124e+05  1.718282e+00  1.142007e+26  1.908554e+01  9.496119e+19  6.389056e+00 6.389056e+00  6.389056e+00  1.270899e+62  5.359815e+01  2.415495e+07  8.102084e+03  1.202603e+06  2.991508e+79  5.359815e+01  1.068647e+13  8.102084e+03  8.102084e+03  0.000000e+00  5.359815e+01  4.424124e+05  3.931334e+12 8.468222e+102  0.000000e+00 2.608817e+248
#> - old[2, ]   0.000000e+00  5.359815e+01  9.496120e+19  1.474132e+02  2.979958e+03  1.957296e+11  0.000000e+00  5.359815e+01  0.000000e+00  2.202546e+04  1.651636e+38 1.908554e+01  0.000000e+00  2.979958e+03  1.718282e+00  1.506097e+35  3.269016e+06  6.389056e+00  1.474132e+02  3.584913e+09  0.000000e+00  3.269016e+06           Inf  1.718282e+00  1.908554e+01  5.359815e+01           Inf  0.000000e+00  0.000000e+00  1.506097e+35
#> + new[2, ]   0.000000e+00  5.359815e+01  9.496119e+19  1.474132e+02  2.979958e+03  1.957296e+11  0.000000e+00  5.359815e+01  0.000000e+00  2.202547e+04  1.651636e+38 1.908554e+01  0.000000e+00  2.979958e+03  1.718282e+00  1.506097e+35  3.269016e+06  6.389056e+00  1.474132e+02  3.584913e+09  0.000000e+00  3.269016e+06  2.178204e+47  1.718282e+00  1.908554e+01  5.359815e+01  9.017628e+39  0.000000e+00  0.000000e+00  1.506097e+35
#> - old[3, ]   1.718282e+00           Inf  2.979958e+03  2.515439e+30  3.104298e+26           Inf  1.171914e+16           Inf           Inf           Inf           Inf          Inf           Inf           Inf  6.565997e+07           Inf           Inf           Inf           Inf           Inf           Inf           Inf           Inf           Inf  6.837671e+30  3.269016e+06           Inf           Inf           Inf           Inf
#> + new[3, ]   1.718282e+00 4.778215e+246  2.979958e+03  2.515439e+30  3.104298e+26  7.120586e+53  1.171914e+16 1.242849e+148  1.935576e+54 1.682013e+147 1.378244e+108 7.225974e+86  6.165958e+70 2.292499e+298  6.565997e+07 4.572186e+147 1.942426e+130  2.658287e+86  5.622626e+67 2.301901e+103 7.836307e+132  1.886181e+64 1.579535e+210  8.013164e+46  6.837671e+30  3.269016e+06  1.029820e+66  6.761794e+73 2.243158e+113  3.505791e+95
#> - old[4, ]            Inf  0.000000e+00  1.908554e+01  0.000000e+00  1.068647e+13  1.718282e+00  1.409349e+22  7.896296e+13  4.024288e+02  5.987314e+04  7.498417e+33 0.000000e+00  4.024288e+02  1.041376e+23  6.076030e+37  1.718282e+00  5.359815e+01  0.000000e+00  3.831008e+22  8.102084e+03           Inf  7.694785e+23  8.886110e+06  3.104298e+26  3.185593e+16           Inf  0.000000e+00  1.718282e+00           Inf  1.718282e+00
#> + new[4, ]   3.454661e+62  0.000000e+00  1.908554e+01  0.000000e+00  1.068647e+13  1.718282e+00  1.409349e+22  7.896296e+13  4.024288e+02  5.987314e+04  7.498417e+33 0.000000e+00  4.024288e+02  1.041376e+23  6.076030e+37  1.718282e+00  5.359815e+01  0.000000e+00  3.831008e+22  8.102084e+03  7.307060e+43  7.694785e+23  8.886110e+06  3.104298e+26  3.185593e+16  2.688117e+43  0.000000e+00  1.718282e+00  4.154590e+68  1.718282e+00
#> - old[5, ]            Inf           Inf  3.584913e+09  1.784823e+08           Inf  1.718282e+00           Inf  1.651636e+38           Inf  1.784823e+08           Inf 1.907347e+21  7.200490e+10  7.200490e+10  4.424124e+05  1.739275e+18  7.498417e+33  5.184705e+21  5.359815e+01  0.000000e+00  3.104298e+26  8.659340e+16  1.506097e+35  5.359815e+01           Inf  8.659340e+16  1.474132e+02  7.498417e+33           Inf  1.252363e+29
#> + new[5, ]  8.069985e+198 5.498530e+153  3.584913e+09  1.784823e+08           Inf  1.718282e+00 1.700888e+104  1.651636e+38 1.551009e+101  1.784823e+08  3.366499e+72 1.907347e+21  7.200490e+10  7.200490e+10  4.424124e+05  1.739275e+18  7.498417e+33  5.184706e+21  5.359815e+01  0.000000e+00  3.104298e+26  8.659340e+16  1.506097e+35  5.359815e+01  9.390741e+62  8.659340e+16  1.474132e+02  7.498417e+33  6.663176e+40  1.252363e+29
#> - old[6, ]   0.000000e+00  6.389056e+00  2.979958e+03  1.318816e+09  1.694889e+28  0.000000e+00  1.627538e+05  0.000000e+00  0.000000e+00  2.202546e+04  2.979958e+03 1.908554e+01  4.424124e+05  0.000000e+00  0.000000e+00  0.000000e+00  1.068647e+13  1.446257e+12           Inf  1.095633e+03           Inf  0.000000e+00  8.659340e+16  0.000000e+00  1.068647e+13  0.000000e+00  0.000000e+00  0.000000e+00  8.659340e+16  0.000000e+00
#> + new[6, ]   0.000000e+00  6.389056e+00  2.979958e+03  1.318816e+09  1.694889e+28  0.000000e+00  1.627538e+05  0.000000e+00  0.000000e+00  2.202547e+04  2.979958e+03 1.908554e+01  4.424124e+05  0.000000e+00  0.000000e+00  0.000000e+00  1.068647e+13  1.446257e+12  1.100514e+79  1.095633e+03 1.942426e+130  0.000000e+00  8.659340e+16  0.000000e+00  1.068647e+13  0.000000e+00  0.000000e+00  0.000000e+00  8.659340e+16  0.000000e+00
#> - old[7, ]   1.095633e+03  1.041376e+23  1.718282e+00  2.202546e+04           Inf  3.584913e+09  2.581313e+20  1.718282e+00  0.000000e+00           Inf  6.389056e+00 4.424124e+05  3.104298e+26  6.389056e+00  1.627538e+05  2.979958e+03  2.648912e+10  1.545539e+25  1.694889e+28  9.744804e+09  0.000000e+00  2.515439e+30  0.000000e+00  6.389056e+00  0.000000e+00  1.718282e+00  1.627538e+05  0.000000e+00  0.000000e+00  7.896296e+13
#> + new[7, ]   1.095633e+03  1.041376e+23  1.718282e+00  2.202547e+04  1.289708e+95  3.584913e+09  2.581313e+20  1.718282e+00  0.000000e+00  1.189259e+49  6.389056e+00 4.424124e+05  3.104298e+26  6.389056e+00  1.627538e+05  2.979958e+03  2.648912e+10  1.545539e+25  1.694889e+28  9.744803e+09  0.000000e+00  2.515439e+30  0.000000e+00  6.389056e+00  0.000000e+00  1.718282e+00  1.627538e+05  0.000000e+00  0.000000e+00  7.896296e+13
#> - old[8, ]            Inf           Inf  1.041376e+23           Inf           Inf  4.424124e+05  5.184705e+21           Inf           Inf           Inf           Inf 2.581313e+20           Inf  1.908554e+01           Inf           Inf  3.584913e+09  1.784823e+08  1.718282e+00  1.285160e+19           Inf  4.424124e+05  5.359815e+01  4.201210e+25           Inf  2.581313e+20  6.389056e+00  4.424124e+05           Inf  0.000000e+00
#> + new[8, ]   7.307060e+43 5.298749e+206  1.041376e+23 5.941927e+123 8.069985e+198  4.424124e+05  5.184706e+21  1.633308e+81  1.414337e+98 1.163011e+135 2.979289e+274 2.581313e+20 6.612556e+159  1.908554e+01  3.317400e+39 5.726037e+176  3.584913e+09  1.784823e+08  1.718282e+00  1.285160e+19  1.935576e+54  4.424124e+05  5.359815e+01  4.201210e+25 2.737557e+152  2.581313e+20  6.389056e+00  4.424124e+05  4.439792e+81  0.000000e+00
#>   old[9, ]   2.353853e+17  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  1.908554e+01  6.389056e+00  0.000000e+00  0.000000e+00  0.000000e+00  5.834617e+14 8.886110e+06  0.000000e+00  0.000000e+00  0.000000e+00  1.718282e+00  0.000000e+00  0.000000e+00  1.908554e+01  6.389056e+00  0.000000e+00  0.000000e+00  0.000000e+00  1.718282e+00  0.000000e+00  3.831008e+22  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
#> - old[10, ]           Inf           Inf           Inf           Inf           Inf           Inf           Inf           Inf           Inf           Inf           Inf 5.052394e+31           Inf           Inf           Inf           Inf           Inf           Inf           Inf           Inf           Inf           Inf           Inf           Inf           Inf           Inf           Inf  7.694785e+23           Inf           Inf
#> + new[10, ]           Inf 1.146049e+102  2.268329e+70 4.708527e+213 2.037140e+305 2.137668e+209           Inf 1.511428e+111           Inf 3.746455e+108           Inf 5.052394e+31           Inf 6.785725e+149 6.562320e+278           Inf           Inf 4.308817e+286 6.562320e+278  3.454661e+62 1.279910e+214  3.505791e+95 5.088220e+183           Inf           Inf  4.048566e+78 1.693940e+299  7.694785e+23 1.201931e+277 4.521448e+190
#> and 10 more ...

Created on 2024-02-10 with reprex v2.0.2

Session info ``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.3.1 (2023-06-16) #> os Ubuntu 22.04.3 LTS #> system x86_64, linux-gnu #> ui X11 #> language en #> collate C.UTF-8 #> ctype C.UTF-8 #> tz Asia/Shanghai #> date 2024-02-10 #> pandoc 2.9.2.1 @ /usr/bin/ (via rmarkdown) #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date (UTC) lib source #> BiocGenerics 0.46.0 2023-04-25 [1] Bioconductor #> bitops 1.0-7 2021-04-24 [1] CRAN (R 4.3.1) #> BPCells * 0.1.0 2024-02-06 [1] Github (bnprks/BPCells@919983d) #> cli 3.6.2 2023-12-11 [1] CRAN (R 4.3.1) #> crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.1) #> diffobj 0.3.5 2021-10-05 [1] CRAN (R 4.3.1) #> digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.1) #> evaluate 0.23 2023-11-01 [1] CRAN (R 4.3.1) #> fansi 1.0.6 2023-12-08 [1] CRAN (R 4.3.1) #> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.1) #> fs 1.6.3 2023-07-20 [1] CRAN (R 4.3.1) #> GenomeInfoDb 1.36.1 2023-06-21 [1] Bioconductor #> GenomeInfoDbData 1.2.10 2023-08-08 [1] Bioconductor #> GenomicRanges 1.52.0 2023-04-25 [1] Bioconductor #> glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.1) #> htmltools 0.5.7 2023-11-03 [1] CRAN (R 4.3.1) #> IRanges 2.34.1 2023-06-22 [1] Bioconductor #> knitr 1.45 2023-10-30 [1] CRAN (R 4.3.1) #> lattice 0.22-5 2023-10-24 [1] CRAN (R 4.3.1) #> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.1) #> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.1) #> Matrix 1.6-4 2023-11-30 [1] CRAN (R 4.3.1) #> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.1) #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.1) #> purrr 1.0.2 2023-08-10 [1] CRAN (R 4.3.1) #> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.3.1) #> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.3.1) #> R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.3.1) #> R.utils 2.12.2 2022-11-11 [1] CRAN (R 4.3.1) #> Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.3.1) #> RCurl 1.98-1.12 2023-03-27 [1] CRAN (R 4.3.1) #> rematch2 2.1.2 2020-05-01 [1] CRAN (R 4.3.1) #> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.3.1) #> rlang 1.1.2 2023-11-04 [1] CRAN (R 4.3.1) #> rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1) #> RSpectra 0.16-1 2022-04-24 [1] CRAN (R 4.3.1) #> S4Vectors 0.38.1 2023-05-02 [1] Bioconductor #> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.1) #> styler 1.10.2 2024-01-13 [1] Github (r-lib/styler@17f91fc) #> tibble 3.2.1 2023-03-20 [1] CRAN (R 4.3.1) #> utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.1) #> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.1) #> waldo 0.5.2 2023-11-02 [1] CRAN (R 4.3.1) #> withr 2.5.2 2023-10-30 [1] CRAN (R 4.3.1) #> xfun 0.41 2023-11-01 [1] CRAN (R 4.3.1) #> XVector 0.40.0 2023-04-25 [1] Bioconductor #> yaml 2.3.8 2023-12-11 [1] CRAN (R 4.3.1) #> zlibbioc 1.46.0 2023-04-25 [1] Bioconductor #> #> [1] /home/yun/Rlibrary/4.3 #> [2] /usr/local/lib/R/site-library #> [3] /usr/lib/R/site-library #> [4] /usr/lib/R/library #> #> ────────────────────────────────────────────────────────────────────────────── ```
bnprks commented 7 months ago

This is expected behavior, inherent in the precision limitations of using a float (aka float32) vs a double (aka float64) -- for instance this wikipedia table lists the precision limitations, among them the fact that floats cannot represent numbers larger than 3.4e38.

BPCells does in general default to using double precision math for its transformations. If users explicitly convert to floats, it's their responsibility to ensure their inputs fall within the representable range for floats. (One slight tripping hazard in this respect is that the default log1p implementation uses single-precision math in intermediate stages because the 2x speed boost of floats seems worth the tradeoff -- a log1p_slow function is available that operates fully in double precision)

I appreciate the detailed checks, but I think for this one I'm considering it within the expected behavior. Though if you find a case where it converts a number to Inf that should be representable as a float then do let me know.