Requesting versions of colMeans and colMedians allowing unique masks/weights per column

I've needed to compute the means and medians of subsets of matrix columns, with a different subset per column, in order to compute many random split-half reliability values quickly. I've written Rcpp functions that do this here, but I figure it would fit matrixStats better, especially since my Rcpp implementations are still slow.

colMeansMasked(x = cbind(1:5, 5:9), mask = cbind(c(TRUE, FALSE, FALSE, FALSE, FALSE), TRUE))

which should give: 1 7

My example uses logical input (equivalently, 0s and 1s), but integer weights would behave the same way and be more flexible.
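To make the intended semantics concrete, here is a minimal base-R sketch of per-column weighted means. The name `colMeansWeightedByCol` is hypothetical (it is not part of matrixStats), and the sketch assumes `x` and `w` are matrices of the same dimensions with no missing values.

```r
# Hypothetical helper, not a matrixStats function: weighted mean of each
# column of x, using the matching column of w as weights.
colMeansWeightedByCol <- function(x, w) {
  stopifnot(is.matrix(x), is.matrix(w), identical(dim(x), dim(w)))
  # Elementwise product, then per-column sums; a logical mask acts as 0/1.
  colSums(x * w) / colSums(w)
}

x <- cbind(1:5, 5:9)
mask <- cbind(c(TRUE, FALSE, FALSE, FALSE, FALSE), TRUE)
colMeansWeightedByCol(x, mask)
#> [1] 1 7
```

This allocates a temporary matrix for `x * w`, so it is meant as a reference for the expected results rather than a fast implementation. A column whose weights sum to zero yields NaN.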

This differs from the existing colWeightedMeans() in that the weights can be a matrix and hence differ from column to column.

A related application was to compute 1000s of means of the same vector but with different weights or subsets applied.

Example:

meansByWeight(x=1:5,weights=cbind(rep(1,5),c(1,1,1,0,0),5:1))

Which should give: 3.000000 2.000000 2.333333
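For reference, the expected values above can be reproduced in base R with a single matrix product. This is only a sketch of the requested behaviour: `meansByWeight` is the hypothetical function name from this request, and the sketch assumes `weights` has one row per element of `x` and no missing values.

```r
# Sketch of the requested function (hypothetical, not in matrixStats):
# one weighted mean of x per column of the weight matrix.
meansByWeight <- function(x, weights) {
  stopifnot(is.matrix(weights), length(x) == nrow(weights))
  # crossprod(weights, x) computes t(weights) %*% x, i.e. sum(w * x) per column.
  drop(crossprod(weights, x)) / colSums(weights)
}

meansByWeight(x = 1:5, weights = cbind(rep(1, 5), c(1, 1, 1, 0, 0), 5:1))
#> [1] 3.000000 2.000000 2.333333
```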

Thanks.

FWIW, one can achieve something similar with matrixStats very efficiently using:

colMeansMasked <- function(x, mask, ...) {
  vapply(seq_len(ncol(x)), FUN.VALUE = NA_real_, FUN = function(kk) {
    matrixStats::colMeans2(x, cols = kk, rows = mask[, kk])
  })
}

x <- cbind(1:5, 5:9)
mask <- cbind(c(TRUE, FALSE, FALSE, FALSE, FALSE), TRUE)
y <- colMeansMasked(x, mask = mask)
y
[1] 1 7

It's admittedly verbose, but still very efficient: no unnecessary memory allocations take place. The one exception is the subsetting of the mask matrix (mask[, kk]), though R may be clever enough these days to avoid that copy (I'm not sure).
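The same pattern should carry over to medians. The following is a sketch under the assumption that matrixStats::colMedians() accepts the same rows and cols arguments as colMeans2(). The name `colMediansMasked` is hypothetical.

```r
# Sketch: masked column medians, mirroring colMeansMasked() above.
# Assumes matrixStats is installed; colMediansMasked is a hypothetical name.
colMediansMasked <- function(x, mask, ...) {
  vapply(seq_len(ncol(x)), FUN.VALUE = NA_real_, FUN = function(kk) {
    # Median of column kk, restricted to the rows selected by mask[, kk].
    matrixStats::colMedians(x, cols = kk, rows = mask[, kk])
  })
}

x <- cbind(1:5, 5:9)
mask <- cbind(c(TRUE, FALSE, FALSE, FALSE, FALSE), TRUE)
colMediansMasked(x, mask = mask)
#> [1] 1 7
```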

FWIW, there's been at least one other feature request to support weight matrices, cf. https://github.com/HenrikBengtsson/matrixStats/issues/245. What I can say right now is that we are working toward a 100% stable version of the current API as a major milestone. This will be published. After that, we will look into expanding the API and adding more features.

HenrikBengtsson / matrixStats

Requesting versions of colMeans and colMedians allowing unique masks/weights per column #263