Closed const-ae closed 1 year ago
the new default useNames = TRUE
The new behaviour is more inline with colSums()
and colMeans()
propagating the names in base R. Never understood the purpose of the useNames=NA
thing...
I wonder if we should patch the current release version to use useNames = FALSE ...
I guess that would work.
I'd be more confortable though if we didn't touch useNames=NA
in the generics and only made the following simple change in the methods:
.matrixStats_rowVars <- function(x, rows=NULL, cols=NULL, na.rm=FALSE, center=NULL, dim.=dim(x), ...,
useNames=NA) # <-- keep this
{
matrixStats::rowVars(x, rows=rows, cols=cols, na.rm=na.rm, center=center, dim.=dim., ...,
useNames=isTRUE(useNames)) # <-- proposed change
}
That way the argument list of the generics doesn't change and remains in sync with the argument list of all the methods in sparseMatrixStats, SparseArray, DelayedArray, and DelayedMatrixStats.
H.
I'd be more confortable though if we didn't touch useNames=NA in the generics and only made the following simple change in the methods:
matrixStats::rowVars(x, rows=rows, cols=cols, na.rm=na.rm, center=center, dim.=dim., ..., useNames=isTRUE(useNames)) # <-- proposed change
I like that idea.
I have already started to implement the new default in sparseMatrixStats
(https://github.com/const-ae/sparseMatrixStats/commit/5441917d2e8e158e86ff8dbb75f58d061440b2bc), but then I would only merge this in the devel branch.
Never understood the purpose of the
useNames=NA
thing...
In the beginning, matrixStats didn't control for names attributes, so we got whatever the implementation happened to output (yet there were packages depending on it). In 2021, we ran a GSoC project for adding support for useNames = FALSE
and useNames = TRUE
. We decided to keep the old, undefined behavior via useNames = NA
until we settled on what the new default should be, which has now been done. This will allow us to defunct useNames = NA
in a near future.
There is no need for you to keep supporting useNames = NA
. Actually, the sooner you can drop support for that, the better; it'll probably help speeding up its removal from matrixStats too.
I see. That means that in order to prevent disruption in release, we can't just blindly replace useNames=useNames
with useNames=isTRUE(useNames)
everywhere like I was proposing we do for matrixStats::rowVars
above. We'll need to figure out what the method was doing for useNames=NA
, and replace with useNames=isTRUE(useNames)
if it was not propagating the names, and with useNames=!isFALSE(useNames)
if it was propagating the names.
@const-ae I will give this a try unless you already started to work on it?
Go ahead, I am currently just barely staying afloat as I am finishing up my PhD thesis 😅
Exciting times! I suppose you don't need this kind of distraction to keep you entertained :wink:
ok so it looks like most functions did NOT propagate the names in matrixStats 0.63.0
Keeping track of this here for reference:
### With matrixStats 0.63.0 (useNames=NA)
library(matrixStats)
stopifnot(packageVersion("matrixStats") == '0.63.0')
# 36 row* + 36 col* generics in MatrixGenerics.
# All of them have the 'useNames' arg, except rowAvgsPerColSet() and
# colAvgsPerRowSet().
# Also MatrixGenerics does not define generics for the *AnyMissings() functions.
all_matrixStats_symbols <- ls('package:matrixStats')
rowfuns <- setdiff(grep('^row', all_matrixStats_symbols, value=TRUE),
c("rowAvgsPerColSet", "rowAnyMissings"))
colfuns <- setdiff(grep('^col', all_matrixStats_symbols, value=TRUE),
c("colAvgsPerRowSet", "colAnyMissings"))
### Figuring out whether they propagate the names or not:
m <- matrix(11:22, ncol=3, dimnames=list(letters[1:4], LETTERS[1:3]))
rowfun_propagates_the_names <- function(fun, useNames=NA)
{
FUN <- match.fun(fun)
if (fun == "rowCollapse") {
idxs <- sample(ncol(m), nrow(m), replace=TRUE)
res <- FUN(m, idxs=idxs, useNames=useNames)
} else if (fun == "rowOrderStats") {
res <- FUN(m, which=1L, useNames=useNames)
} else {
res <- FUN(m, useNames=useNames)
}
names_on_res <- if (is.matrix(res)) rownames(res) else names(res)
identical(names_on_res, rownames(m))
}
colfun_propagates_the_names <- function(fun, useNames=NA)
{
FUN <- match.fun(fun)
if (fun == "colCollapse") {
idxs <- sample(nrow(m), ncol(m), replace=TRUE)
res <- FUN(m, idxs=idxs, useNames=useNames)
} else if (fun == "colOrderStats") {
res <- FUN(m, which=1L, useNames=useNames)
} else {
res <- FUN(m, useNames=useNames)
}
if (fun %in% c("colQuantiles", "colRanges", "colRanks", "colTabulates")) {
names_on_res <- rownames(res)
} else {
names_on_res <- if (is.matrix(res)) colnames(res) else names(res)
}
identical(names_on_res, colnames(m))
}
yes_or_no <- sapply(rowfuns, rowfun_propagates_the_names)
table(yes_or_no)
# FALSE TRUE
# 25 10
### row* functions that propagate the names in matrixStats 0.63.0:
rowfuns[yes_or_no]
# [1] "rowIQRDiffs" "rowLogSumExps" "rowMadDiffs" "rowQuantiles"
# [5] "rowSdDiffs" "rowVarDiffs" "rowWeightedMads" "rowWeightedMeans"
# [9] "rowWeightedSds" "rowWeightedVars"
yes_or_no <- sapply(colfuns, colfun_propagates_the_names)
table(yes_or_no)
# FALSE TRUE
# 25 10
### col* functions that propagate the names in matrixStats 0.63.0:
colfuns[yes_or_no]
# [1] "colIQRDiffs" "colLogSumExps" "colMadDiffs" "colQuantiles"
# [5] "colSdDiffs" "colVarDiffs" "colWeightedMads" "colWeightedMeans"
# [9] "colWeightedSds" "colWeightedVars"
@PeteHaitch @const-ae If you guys want to take a look at this before I merge: #32 This is a PR against the RELEASE_3_17
branch.
I'll start working on a PR against the devel
branch today. If you're ok with this, my plan is to just switch the useNames
default used in the argument lists of the generics from NA
to TRUE
, like in matrixStats. These changes will actually be simpler than the changes in release. However, unlike the latter, they won't be backward compatible.
And here is the proposed changes for the devel version: #33
MatrixGenerics (along with DelayedMatrixStats and sparseMatrixStats) has now been updated in both the release (3.17) and devel (3.18) branches of Bioconductor to handle this change.
matrixStats
v1.0.0 was just published on CRAN (https://github.com/HenrikBengtsson/matrixStats/issues/232). The main change is that the old defaultuseNames = NA
is now deprecated and the new defaultuseNames = TRUE
.I wonder if we should patch the current release version to use
useNames = FALSE
because that avoids deprecation warnings and won't change the behavior all of a sudden. And then with the next Bioconductor release change the default touseNames = TRUE