Closed hpages closed 3 years ago
If nobody objects, I'll merge early next week. Thanks!
Skimming over the code it looks good (I'll try to take a deeper look over the weekend, but can't promise anything). Don't forget to add yourself to the authors list :)
ok, let's proceed
Hmm, it's not quite working yet. Compare these 3 examples; depending on which sparseMatrixStats function is called first it either works or errors and subsequent functions may also be affected.
colMads()
errors but colMedians()
workssuppressPackageStartupMessages(library(MatrixGenerics))
library(Matrix)
set.seed(666)
x <- matrix(rpois(100, 3), ncol = 4)
X <- as(x, "dgCMatrix")
colMads(x) # Works
#> [1] 1.4826 1.4826 2.9652 1.4826
colMads(X) # Errors
#> Error in MatrixGenerics:::.load_next_suggested_package_to_search(x): Failed to find a colMads() method for dgCMatrix objects.
# Other functions work
colMedians(X)
#> [1] 3 3 3 3
colMaxs(X)
#> [1] 7 8 6 7
# But first function still fails
colMads(X)
#> Error in MatrixGenerics:::.load_next_suggested_package_to_search(x): Failed to find a colMads() method for dgCMatrix objects.
Created on 2020-10-13 by the reprex package (v0.3.0)
colMedians()
errors but colMads()
workssuppressPackageStartupMessages(library(MatrixGenerics))
library(Matrix)
set.seed(666)
x <- matrix(rpois(100, 3), ncol = 4)
X <- as(x, "dgCMatrix")
colMedians(x) # Works
#> [1] 3 3 3 3
colMedians(X) # Errors
#> Error in MatrixGenerics:::.load_next_suggested_package_to_search(x): Failed to find a colMedians() method for dgCMatrix objects.
# Other functions work
colMads(X)
#> [1] 1.4826 1.4826 2.9652 1.4826
colMaxs(X)
#> [1] 7 8 6 7
# But first function still fails
colMedians(X)
#> Error in MatrixGenerics:::.load_next_suggested_package_to_search(x): Failed to find a colMedians() method for dgCMatrix objects.
Created on 2020-10-13 by the reprex package (v0.3.0)
colMedians()
and colMads()
both work (everything works?)suppressPackageStartupMessages(library(MatrixGenerics))
library(Matrix)
set.seed(666)
x <- matrix(rpois(100, 3), ncol = 4)
X <- as(x, "dgCMatrix")
colMaxs(x) # Works
#> [1] 7 8 6 7
colMaxs(X) # Works
#> [1] 7 8 6 7
# Other functions work
colMads(X)
#> [1] 1.4826 1.4826 2.9652 1.4826
colMedians(X)
#> [1] 3 3 3 3
Created on 2020-10-13 by the reprex package (v0.3.0)
Oh no!
Method dispatch uses a cache mechanism to speedup method lookup. The 1st time colMads()
was called the dispatch mechanism didn't find a method for dgCMatrix objects (sparseMatrixStats was not loaded yet) so it used the ANY
method instead, and the association between dgCMatrix objects and the ANY method got stored in the cache. So from now on, no matter what, calling colMads()
on a dgCMatrix will always go for the ANY
method, even if sparseMatrixStats got loaded and there is now a dgCMatrix method in the method table.
Step by step:
suppressPackageStartupMessages(library(MatrixGenerics))
library(Matrix)
set.seed(666)
x <- matrix(rpois(100, 3), ncol = 4)
X <- as(x, "dgCMatrix")
This selects the ANY method, as expected:
selectMethod("colMads", "dgCMatrix")
# Method Definition:
#
# function (x, rows = NULL, cols = NULL, center = NULL, constant = 1.4826,
# na.rm = FALSE, ...)
# {
# MatrixGenerics:::.load_next_suggested_package_to_search(x)
# callGeneric()
# }
# <bytecode: 0x469af48>
#
# Signatures:
# x
# target "dgCMatrix"
# defined "ANY"
showMethods("colMads")
# Function: colMads (package MatrixGenerics)
# x="ANY"
# x="matrix_OR_array_OR_table_OR_numeric"
After calling colMads(X)
, selectMethod()
finds the dgCMatrix method:
colMads(X) # Errors
selectMethod("colMads", "dgCMatrix")
# Method Definition:
#
# function (x, rows = NULL, cols = NULL, center = NULL, constant = 1.4826,
# na.rm = FALSE, ...)
# {
# .local <- function (x, rows = NULL, cols = NULL, constant = 1.4826,
# na.rm = FALSE)
# {
# if (!is.null(rows)) {
# x <- x[rows, , drop = FALSE]
# }
# if (!is.null(cols)) {
# x <- x[, cols, drop = FALSE]
# }
# dgCMatrix_colMads(x, na_rm = na.rm, scale_factor = constant)
# }
# .local(x, rows, cols, constant, na.rm, ...)
# }
# <bytecode: 0x8896e30>
# <environment: namespace:sparseMatrixStats>
#
# Signatures:
# x
# target "dgCMatrix"
# defined "dgCMatrix"
But showMethods()
fails to see it (like method dispatch):
showMethods("colMads")
# Function: colMads (package MatrixGenerics)
# x="ANY"
# x="dgCMatrix"
# (inherited from: x="ANY")
# x="matrix_OR_array_OR_table_OR_numeric"
Pretty nasty!
What's even more crazy is that trying to debug actually fixes the problem:
debug(colMads, signature="dgCMatrix")
# Tracing specified method for function "colMads" in environment
# <namespace:MatrixGenerics>
colMads(X)
# Tracing function ".local" in package "sparseMatrixStats"
# Tracing .local(x, rows, cols, constant, na.rm, ...) step 2
# Called from: eval(expr, p)
# Browse[1]>
# debug: if (!is.null(rows)) {
# x <- x[rows, , drop = FALSE]
# }
# Browse[2]>
# debug: if (!is.null(cols)) {
# x <- x[, cols, drop = FALSE]
# }
# Browse[2]>
# debug: dgCMatrix_colMads(x, na_rm = na.rm, scale_factor = constant)
# Browse[2]>
# [1] 1.4826 1.4826 2.9652 1.4826
undebug(colMads, signature="dgCMatrix")
# Untracing specified method for function "colMads" in environment
# <namespace:MatrixGenerics>
Now showMethods()
sees the dgCMatrix method:
showMethods("colMads")
Function: colMads (package MatrixGenerics)
x="ANY"
x="dgCMatrix"
x="matrix_OR_array_OR_table_OR_numeric"
## And from now on, colMads(X) works:
colMads(X)
# [1] 1.4826 1.4826 2.9652 1.4826
So calling colMads(X)
inside a debug session somehow seems to flush the cache.
I'll try to come up with a fix. It's probably not gonna be pretty!
Thanks Pete!
Done: 7f900765e56d43e7257cf943ca2d21f66f3eb654 It's not too ugly.
I'll try to report the method dispatch bug later this week.
I'm not too keen on your suggestion to include the list of packages to install in the call to BiocManager::install()
displayed in the error message Pete because in most cases people will probably not need to install all the suggested packages, only a few selected ones. For example they wouldn't need to install DelayedMatrixStats if they've never heard of DelayedArray objects and are only dealing with dgCMatrix objects.
H.
Gees, I'm glad I didn't have to deal with that, it's nasty.
I'm not too keen on your suggestion to include the list of packages to install in the call to BiocManager::install() displayed in the error message Pete because in most cases people will probably not need to install all the suggested packages, only a few selected ones. For example they wouldn't need to install DelayedMatrixStats if they've never heard of DelayedArray objects and are only dealing with dgCMatrix objects.
Ah I see. My read of the code was that it would only report the exactly relevant package but I see you're saying it may report a 'false positive' as well as the relevant package?
BTW any idea why colMaxs()
worked immediately?
BTW any idea why colMaxs() worked immediately?
Great observation. That's going to be useful when I try to come up with a useful bug report.
I'm not sure why. The only difference with colMads()
is where the methods are defined:
colMads()
colMaxs()
When colMads()
or colMaxs()
is called, MatrixGenerics:::.load_next_suggested_package_to_search()
tries to load the suggested packages in order until dispatch finds an appropriate method. If the supplied object is a dgCMatrix, loading sparseMatrixStats should be enough. However, before the 7f900765e56d43e7257cf943ca2d21f66f3eb654 fix, calling colMads()
or colMaxs()
was triggering the loading of both packages, sparseMatrixStats and DelayedMatrixStats, because the method dispatch bug would prevent dispatch from finding the method after sparseMatrixStats gets loaded. The difference I see between colMads()
and colMaxs()
though is that for the latter, the loading of DelayedMatrixStats puts the dgCMatrix method back in the output of showMethods("dgCMatrix")
. It does this with colMaxs()
but not with colMads()
. Probably related to the fact that the DelayedMatrixStats package defines a method for the latter but not for the former. Even though I don't have a clear picture of what's going on exactly.
Anyway, this is an heisenbug, the first one I ever have to deal with in R. They are the worst!
On the BiocManager::install()
command that gets displayed when a method is not found:
After giving it a 2nd thought, when the list of packages to install is reduced to one package, it makes sense to insert the name of the package in the call to BiocManager::install()
displayed in the error message. There is no ambiguity about what package to install in that case. So with e988bc31cc114171376396878fa214555f60ad7b we get:
> colMads(X)
Error in MatrixGenerics:::.load_next_suggested_package_to_search(x) :
Failed to find a colMads() method for dgCMatrix objects.
However, the following packages are likely to contain the missing method
but are not installed: sparseMatrixStats, DelayedMatrixStats.
Please install them (with 'BiocManager::install(...)') and try again.
Alternatively, if you know where the missing method is defined, install
only that package.
if sparseMatrixStats and DelayedMatrixStats are not installed, and:
> colMads(X)
Error in MatrixGenerics:::.load_next_suggested_package_to_search(x) :
Failed to find a colMads() method for dgCMatrix objects.
However, the following package is likely to contain the missing method
but is not installed: sparseMatrixStats.
Please install it (with 'BiocManager::install("sparseMatrixStats")')
and try again.
if only sparseMatrixStats is not installed.
huh, looks like MatrixGenerics is getting a little download boost... and that's only the beginning
This extends the fallback mechanism proposed here to all the generic functions defined in MatrixGenerics.