Bioconductor / MatrixGenerics

S4 Generic Summary Statistic Functions that Operate on Matrix-Like Objects
https://bioconductor.org/packages/MatrixGenerics
12 stars 1 forks source link

Extend rowWeightedVars() to handle a matrix of weights #37

Closed GabrielHoffman closed 9 months ago

GabrielHoffman commented 9 months ago

rowWeightedVars(X,w) currently handles a matrix X, the weights w must be a vector. Is it possible to extend this function handle a weights matrix W the same dimension as X?

I have an application in single cell genomics where X and W are both DelayedArray objects.

I posed a similar question under matrixStats, but I think your package is the right place for it

Best, Gabriel

GabrielHoffman commented 9 months ago

I wrote some code the address this:

# Compared weighted variance along *rows* of X
# where each element of X has its own positive weight
rowWeightedVarsMatrix = function(X, W){

    stopifnot(dim(X) == dim(W))

    if( ! is.matrix(X) ) X <- as.matrix(X)
    if( ! is.matrix(W) ) W <- as.matrix(W)

    # scale weights to have a mean of 1
    Ws <- W / rowMeans2(W, useNames=FALSE)

    # weighted mean
    mu <- rowMeans2(X * Ws)

    # weighted deviation
    A <- sqrt(Ws)*(X - mu)

    # sum of squares, divided by n-1
    rowSums2(A ** 2, useNames=FALSE) / (ncol(X)-1)
}

# testing code
library(modi)
library(matrixStats)
set.seed(1)
n = 100
p = 200
X = matrix(rnorm(n*p), n, p)
W = matrix(rgamma(n*p, 14, 1), n, p)

res1 <- sapply(seq(n), function(i) weighted.var(X[i,], W[i,]))

res2 <- rowWeightedVarsMatrix(X, W)

# compare difference
range(res1 - res2)
PeteHaitch commented 9 months ago

Hi Gabriel,

Some background: MatrixGenerics was developed to provide an S4-based set of generics that mimics the matrixStats API. MatrixGenerics further provides some 'basic' methods (i.e.., those for matrix objects) but it does so simply by calling the corresponding matrixStats function. Other specialised methods (e.g., for DelayedMatrix and dgCMatrix objects) are provided in add-on packages (e.g., DelayedArray, DelayedMatrixStats, sparseMatrixStats).

So to address your feature request, I think you would need to first convince Henrik that your proposed extension to rowWeightedVars() (and presumably also colWeightedVars()?) would make a worthwhile addition to matrixStats. Then, we could mimic that in MatrixGenerics by making any necessary changes to the generic (and documentation), defer to the corresponding matrixStats method for matrix objects, and notify the authors of add-on packages that they may want to consider a corresponding specialised method in their package.

Does that make sense? Thanks for your suggestion and I hope that my response doesn't seem discouraging.

Cheers, Pete

GabrielHoffman commented 9 months ago

Thanks for the detailed reply. I'd always wondered about MatrixGenerics vs matrixStats, so it's good to have a better understanding.

The code I provided above is vectorized pretty well and uses matrixStats already, so I'm set for my purposes.

Best, Gabriel