Bioconductor / MatrixGenerics

S4 Generic Summary Statistic Functions that Operate on Matrix-Like Objects
https://bioconductor.org/packages/MatrixGenerics
12 stars 1 forks source link

Check and normalize 'center' argument before passing down to matrixStats #38

Closed hpages closed 5 months ago

hpages commented 5 months ago

I see no reason why this shouldn't work anymore:

library(MatrixGenerics)
m <- matrix(101:128, ncol=4)

MatrixGenerics::colVars(m, center=0)
# Error: [matrixStats (>= 0.58.0)] Argument 'center' should be of the same length as number of columns of 'x'.
# Use of a scalar value is defunct: 1 != 4 (See also ?matrixStats::matrixStats.options)

The proposed patch makes it work again:

MatrixGenerics::colVars(m, center=0)
# [1] 12623.33 14379.17 16249.33 18233.83

and so is consistent with SparseArray:

library(SparseArray)
MatrixGenerics::colVars(SparseArray(m), center=0)
# [1] 12623.33 14379.17 16249.33 18233.83

and also consistent with the treatment of center in DelayedArray.

It also fixes some loopholes in matrixStats check of user input. For example:

## Error message misses the point:
matrixStats::colVars(m, center=ls)
# Error in validateScalarCenter(center, ncol(x), "columns") : 
#   [matrixStats (>= 0.58.0)] Argument 'center' should be of the same length as number of columns of 'x'.
# Use of a scalar value is defunct: 1 != 4 (See also ?matrixStats::matrixStats.options)

## Error message is somewhat cryptic:
matrixStats::colVars(m, center=letters[1:4])
# Error in x[, cc] - center[cc] : non-numeric argument to binary operator

## Silently works but fails to cacth what's probably a user error:
matrixStats::colVars(m, center=c(TRUE, FALSE, TRUE, TRUE))
# [1] 12381.83 14379.17 15975.17 17943.33

## What's going on??!
matrixStats::colVars(m, center=factor(11:14))
# [1] NA NA NA NA
# Warning messages:
# 1: In Ops.factor(x[, cc], center[cc]) : ‘-’ not meaningful for factors
# 2: In Ops.factor(x[, cc], center[cc]) : ‘-’ not meaningful for factors
# 3: In Ops.factor(x[, cc], center[cc]) : ‘-’ not meaningful for factors
# 4: In Ops.factor(x[, cc], center[cc]) : ‘-’ not meaningful for factors

In comparison going thru MatrixGenerics::normarg_center() guarantees a sane center value, and, if not, it produces a straightforward error message:

MatrixGenerics::colVars(m, center=ls)
# Error in normarg_center(center, ncol(x), "ncol(x)") : 
#   is.numeric(center) is not TRUE

MatrixGenerics::colVars(m, center=letters[1:4])
# Error in normarg_center(center, ncol(x), "ncol(x)") : 
#   is.numeric(center) is not TRUE

MatrixGenerics::colVars(m, center=c(TRUE, FALSE, TRUE, TRUE))
# Error in normarg_center(center, ncol(x), "ncol(x)") : 
#   is.numeric(center) is not TRUE

MatrixGenerics::colVars(m, center=factor(11:14))
# Error in normarg_center(center, ncol(x), "ncol(x)") : 
#   is.numeric(center) is not TRUE

Note that the reason this patch exports normarg_center() is because I want to reuse it in a PR that I'm about to submit for DelayedMatrixStats (see https://github.com/Bioconductor/DelayedArray/issues/115). I will also probably start using it in SparseArray and DelayedArray for consistent argument checking.

H.

P.S.: @const-ae It seems that you've recently aligned treatment of center in sparseMatrixStats with matrixStats. With the latest sparseMatrixStats (version 1.15.1, in BioC 3.19):

> MatrixGenerics::colVars(as(m, "dgCMatrix"), center=0)
Error in .local(x, rows, cols, na.rm, center, ..., useNames) : 
  Argument 'center' must be the same length as number of columns of 'x': 1!=4

2 problems with this change:

const-ae commented 5 months ago

The reason why I decided to skip the deprecation process and immediately align with sparseMatrixStats with matrixStats is that center argument of the wrong length never worked in sparseMatrixStats:

# Code using an old sparseMatrixStats version
library(Matrix)
mat <- matrix(1:12, nrow = 4)
sp_mat <- as(mat, "dgCMatrix")
matrixStats::colVars(mat, center = 5)
#> Warning: Argument 'center' should be of the same length as number of columns of
#> 'x'. Use of a scalar value is deprecated: 1 != 3
#> [1] 10.000000  4.666667 42.000000
sparseMatrixStats::colVars(sp_mat, center= 5)
#> [1]  10.0000  58.0000 148.6667

Created on 2024-04-18 with reprex v2.0.2

Session info ``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.2.3 (2023-03-15) #> os macOS Big Sur ... 10.16 #> system x86_64, darwin17.0 #> ui X11 #> language (EN) #> collate en_IE.UTF-8 #> ctype en_IE.UTF-8 #> tz Europe/Berlin #> date 2024-04-18 #> pandoc 3.1.6.1 @ /usr/local/bin/ (via rmarkdown) #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date (UTC) lib source #> cli 3.6.0 2023-01-09 [2] CRAN (R 4.2.0) #> digest 0.6.31 2022-12-11 [2] CRAN (R 4.2.0) #> evaluate 0.20 2023-01-17 [2] CRAN (R 4.2.0) #> fastmap 1.1.1 2023-02-24 [2] CRAN (R 4.2.0) #> fs 1.6.1 2023-02-06 [2] CRAN (R 4.2.0) #> glue 1.6.2 2022-02-24 [2] CRAN (R 4.2.0) #> htmltools 0.5.4 2022-12-07 [2] CRAN (R 4.2.0) #> knitr 1.42 2023-01-25 [2] CRAN (R 4.2.0) #> lattice 0.20-45 2021-09-22 [2] CRAN (R 4.2.3) #> lifecycle 1.0.3 2022-10-07 [2] CRAN (R 4.2.0) #> magrittr 2.0.3 2022-03-30 [2] CRAN (R 4.2.0) #> Matrix * 1.5-3 2022-11-11 [2] CRAN (R 4.2.3) #> MatrixGenerics 1.10.0 2022-11-01 [2] Bioconductor #> matrixStats 0.63.0 2022-11-18 [2] CRAN (R 4.2.0) #> purrr 1.0.1 2023-01-10 [2] CRAN (R 4.2.0) #> R.cache 0.16.0 2022-07-21 [2] CRAN (R 4.2.0) #> R.methodsS3 1.8.2 2022-06-13 [2] CRAN (R 4.2.0) #> R.oo 1.25.0 2022-06-12 [2] CRAN (R 4.2.0) #> R.utils 2.12.2 2022-11-11 [2] CRAN (R 4.2.0) #> Rcpp 1.0.10 2023-01-22 [2] CRAN (R 4.2.0) #> reprex 2.0.2 2022-08-17 [2] CRAN (R 4.2.0) #> rlang 1.0.6 2022-09-24 [2] CRAN (R 4.2.0) #> rmarkdown 2.20 2023-01-19 [2] CRAN (R 4.2.0) #> sessioninfo 1.2.2 2021-12-06 [2] CRAN (R 4.2.0) #> sparseMatrixStats 1.10.0 2022-11-01 [2] Bioconductor #> styler 1.9.0 2023-01-15 [2] CRAN (R 4.2.0) #> vctrs 0.5.2 2023-01-23 [2] CRAN (R 4.2.0) #> withr 2.5.0 2022-03-03 [2] CRAN (R 4.2.0) #> xfun 0.37 2023-01-31 [2] CRAN (R 4.2.0) #> yaml 2.3.7 2023-01-23 [2] CRAN (R 4.2.0) #> #> [1] /Users/ahlmanne/Library/R/x86_64/4.2/library #> [2] /Library/Frameworks/R.framework/Versions/4.2/Resources/library #> #> ────────────────────────────────────────────────────────────────────────────── ```

The original code simply never checked that the center vector had the right length. I considered this a bug and decided it was easiest to just throw an error if the argument lengths didn't match.


To your main point:

I see no reason why this shouldn't work anymore: ... MatrixGenerics::colVars(m, center=0)

I think it is important to keep MatrixGenerics aligned with matrixStats to avoid user confusion. And as it was me who decided to copy Henrik's API, I will follow his decisions on how to evolve the API. For MatrixGenerics, I think that it should try to smooth over conflicts that arise due to the different release models of CRAN and Bioconductor, but should otherwise stay aligned with matrixStats.

hpages commented 5 months ago

The center=0 case is a little bit special though. It has been used for a while in DelayedArray and the package has been relying on it to be supported by sparseMatrixStats. Which it did! And with no deprecation warning:

sparseMatrixStats::colVars(as(mat, "dgCMatrix"), center=0)
# [1]  10.0000  58.0000 148.6667

If handling of scalar center in the != 0 case was broken, it should be fixed. But again, it's not a good idea to make this kind of changes without going to a deprecation/defunct cycle first. We're also very close to a new BioC release which makes things even worse.