Open pneuvial opened 5 years ago
Good point. Hmm... could be a mistake or it could be that rank()
has been updated since rowRanks()
was introduced.
Nah... the reason is probably that it was only ties.method = "max"
that was supported in the early days:
Version: 0.7.0 [2013-01-14]
NEW FEATURES:
Version: 0.6.4 [2013-01-13]
NEW FEATURES:
Version: 0.4.0 [2011-11-11]
[...]
NEW FEATURES:
In any case I'm glad that ties.method = "average"
is also supported: it's the one used in wilcox.test()
, so I've been able to make a rowWilcoxonTests()
function using rowRanks(ties.method = "average")
.
Cool. Is your rowWilcoxonTests()
available online?
Subscribed to notifications of matrixStats and found this. Just wanted to add that I also used rowRanks
a few times when implementing very similar things - row_kruskalwallis()
So this is mainly a praise of matrixStats
- it is extremely useful for speeding up multiple statistical tests on genomic-like datasets.
And by coincidence "row_wilcoxon" is next on the list of things that will get included in that package.
Interesting! I did not know about the matrixTests package, it's very nice!
My rowWilcoxonTests() is available in the sansSouci package: https://github.com/pneuvial/sanssouci/blob/develop/R/rowWilcoxonTests.R
Of course it would make more sense to have this or a version of it in a package specifically dedicated to tests on the rows of a matrix!
NB: as noted in the Details of the man page:
The p-values are computed using the normal approximation as described in the \code{\link{wilcox.test}} function. The exact p-values (which can be useful for small samples with no ties) are not implemented (yet).
FYI Karolis: I also have rowWelchTests()
and rowBinomialTests()
in the same package.
Thank you a lot for the sharing the reference to your package Pierre! Will definitely take a look.
From time to time I search for packages that implement similar things and everything I find I add to the references in the README under "See Also" section. Added yours now too. I probably missed it because it wasn't on CRAN nor BioConductor.
Also think yours is the first time I see Wilcoxon Test implemented in such a way. Most others typically implement fast versions of t-tests and are done.
The p-values are computed using the normal approximation as described in the \code{\link{wilcox.test}} function. The exact p-values (which can be useful for small samples with no ties) are not implemented (yet).
Yes, this is the main thing that held me back from implementing Wilcoxon test in matrixTests. I try to maintain high fidelity with tests that are already in R, so all these approximations and when they are used sometimes takes time to get "right". Especially when they depend on sample size and there are different number of missing NA
values in different rows.
I submitted a PR that addresses this and adds "first", "last", "random", and "dense" ties.methods.
Thank again for adding this (PR #146).
We are going to need several release cycles to change the default for argument ties.methods
without (silently) messing up the results for users/pipelines that rely on rowRanks()
. This will probably involve something like:
Release 1: If missing(ties.method) == TRUE
, then produce an informative warning saying this argument needs to be set to ties.method = "max"
to avoid the warning. Here we could possibly hijack .Deprecated()
warnings, which with some luck will be reported by R CMD check
(some updates have been made toward this direction).
Release 2: If missing(ties.method) == TRUE
, turn above warning into an error. Here we can also change the default to be ties.method = "average"
.
Release 3: Allow for missing(ties.method) == TRUE
again.
To speed up the above, I could run reverse dependency checks with missing(ties.method) == TRUE
producing an error to catch obvious candidates and report to their maintainers.
I've scanned all 347 reverse dependencies on CRAN and Bioconductor 3.13 (R 4.1.0) for those that use colRanks()
or rowRanks()
and I identified the following packages:
rowRanks()
: [6] EventPointer, fishpond, MatrixGenerics, matrixTests, nparMD, STROMA4colRanks()
: [7] clustifyr, HDSpatialScan, MatrixGenerics, MetaNeighbor, reconsi, singscore, WGCNAThus, as of 2021-06-22, there are 12 packages in total that rely on these matrixStats functions:
Bioconductor:
ties.method
)R_MATRIXSTATS_TIES_METHOD_MISSING=defunct
=> 'OK'ties.method
)R_MATRIXSTATS_TIES_METHOD_MISSING=defunct
=> 'OK'ties.method
)R_MATRIXSTATS_TIES_METHOD_MISSING=defunct
=> R CMD check [ERROR](https://github.com/HenrikBengtsson/matrixStats/blob/feature/default-rank-ties.method/revdepR_MATRIXSTATS_TIES_METHOD_MISSING=defunct
=> R CMD check ERRORR_MATRIXSTATS_TIES_METHOD_MISSING=defunct
=> 'OK'ties.method
)R_MATRIXSTATS_TIES_METHOD_MISSING=defunct
=> 'OK'ties.method
)R_MATRIXSTATS_TIES_METHOD_MISSING=defunct
=> ??? (there are other ERRORs)R/random.ranks.R: matrixStats::rowRanks(datrank.rand.col)
ties.method
),R_MATRIXSTATS_TIES_METHOD_MISSING=defunct
=> R CMD check ERRORties.method
),R_MATRIXSTATS_TIES_METHOD_MISSING=defunct
=> R CMD check ERRORties.method
),R_MATRIXSTATS_TIES_METHOD_MISSING=defunct
=> R CMD check ERRORCRAN:
ties.method
)R_MATRIXSTATS_TIES_METHOD_MISSING=defunct
=> 'OK'ties.method
)R_MATRIXSTATS_TIES_METHOD_MISSING=defunct
=> 'OK'ties.method
)R_MATRIXSTATS_TIES_METHOD_MISSING=defunct
=> 'OK'ties.method
)R_MATRIXSTATS_TIES_METHOD_MISSING=defunct
=> 'OK'These are the packages that we must make sure don't make assumptions about the default ties.method
value.
UPDATE: I guess my "scan" script produces false negatives, because I'd expect DelayMatrixStats and sparseMatrixStats to also show up above, but they don't.
UPDATE 2: It might be that DelayMatrixStats and sparseMatrixStats only imports from matrixStats in their package tests, which I did not scan.
I ran another scan for packages not specifying the ties.method
argument and found the following.
CRAN:
Bioconductor:
This won't become defunct anytime soon, but I will make it deprecated (i.e. produce warnings) starting with R (>= 4.4.0) [2024-04-24] and therefore also Bioconductor (>= 3.19) [2024-05-01].
This won't become defunct anytime soon, but I will make it deprecated (i.e. produce warnings) starting with R (>= 4.4.0) [2024-04-24] and therefore also Bioconductor (>= 3.19) [2024-05-01].
This is implemented in matrixStats 1.3.0, which is now on CRAN.
By default
rank()
uses the 'average' method to break ties, whilerowRanks()
uses the 'max' method:Shouldn't the default argument be the same in both functions?