Bioconductor / DelayedArray

A unified framework for working transparently with on-disk and in-memory array-like datasets
https://bioconductor.org/packages/DelayedArray
24 stars 9 forks source link

colMaxs/rowMaxs and colMins/rowMins useNames behavior incorrect in RELEASE_3_17 #107

Closed PeteHaitch closed 1 year ago

PeteHaitch commented 1 year ago

Hi @hpages,

Similar to #106, these methods for DelayedMatrix objects are defined in DelayedArray rather than DelayedMatrixStats and the default useNames behaviour is incorrect in RELEASE_3_17.

suppressPackageStartupMessages(library(DelayedMatrixStats))

x <- matrix(
  -5:6,
  nrow = 3,
  ncol = 4,
  dimnames = list(c("R1", "R2", "R3"), c("C1", "C2", "C3", "C4")))
X <- DelayedArray(x)

f <- c("colMaxs", "colMins", "rowMaxs", "rowMins")

for (ff in f) {
  print(ff)
  # foo,matrix-method (no names, consistent with matrixStats < v1.0.0)
  print(get(ff)(x))
  # foo,DelayedMatrix-method (has names but shouldn't)
  print(get(ff)(X))
}
#> [1] "colMaxs"
#> [1] -3  0  3  6
#> C1 C2 C3 C4 
#> -3  0  3  6 
#> [1] "colMins"
#> [1] -5 -2  1  4
#> C1 C2 C3 C4 
#> -5 -2  1  4 
#> [1] "rowMaxs"
#> [1] 4 5 6
#> R1 R2 R3 
#>  4  5  6 
#> [1] "rowMins"
#> [1] -5 -4 -3
#> R1 R2 R3 
#> -5 -4 -3

BiocManager::version()
#> [1] '3.17'
BiocManager::valid()
#> 'getOption("repos")' replaces Bioconductor standard repositories, see
#> 'help("repositories", package = "BiocManager")' for details.
#> Replacement repositories:
#>     CRAN: https://cloud.r-project.org
#> [1] TRUE

Created on 2023-08-08 with reprex v2.0.2

hpages commented 1 year ago

This was a concious decision I made a long time ago, that all row/col summarization methods defined in DelayedArray would follow the lead of rowSums/colSums and rowMeans/colMeans in base R by propagating the rownames/colnames, independently of what matrixStats was doing. So at the time I chose consistency with base R over consistency with matrixStats.

Note that I made the same choice for SparseMatrix objects (but those are much more recent). With BioC 3.17:

library(SparseArray)
rowMaxs(SparseArray(x))
# R1 R2 R3 
#  4  5  6 

I'm not too keen to change this just in BioC 3.17. rowMaxs/colMaxs and rowMins/colMins have propagated the rownames/colnames forever on a DelayedMatrix object, and, starting with BioC 3.18, this behavior is now consistent with matrixStats and MatrixGenerics. So making an exception for BioC 3.17 doesn't sound like a good idea. Plus this would mean changing the behavior of code that was released more than 3 months ago.

Hope that makes sense?

H.

PeteHaitch commented 1 year ago

That all makes sense, thanks for the explanation. I'll just tweak DelayedMatrixStats tests in RELEASE_3_17 to accommodate that.