HenrikBengtsson / matrixStats

R package: Methods that Apply to Rows and Columns of Matrices (and to Vectors)
https://cran.r-project.org/package=matrixStats

Is there space in matrixStats for rowScale, colScale? #255

Open karoliskoncevicius opened 3 months ago

karoliskoncevicius commented 3 months ago

This issue asks whether rowScale and colScale belong in matrixStats.

Base R has scale(), but it is not convenient:

  1. It works on columns only, so scaling rows requires transposing the matrix first (and transposing the result back).
  2. It is quite slow, as this timing shows:

X <- matrix(rnorm(100000000), ncol=1000)

system.time(scale(X))
  user  system elapsed
17.748   2.032  19.825

These days I often do the scaling like this instead:

system.time((X - colMeans(X)[col(X)]) / matrixStats::colSds(X)[col(X)])
 user  system elapsed
5.057   0.878   5.949

But that is awkward to write and still not efficient: the column means are computed twice (once by colMeans and again inside colSds), and col(X) is evaluated twice.
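
To make the request concrete, here is a minimal sketch of what such functions could look like, written in base R only. The names colScale and rowScale are hypothetical and not part of matrixStats; the sketch assumes a numeric matrix with no missing values and at least two observations per row or column. The means are computed once and the standard deviations are derived from the already centred values.

```r
# Hypothetical helpers for illustration; NOT part of matrixStats.
# Assumes x is a numeric matrix without NA values.

colScale <- function(x) {
  n  <- nrow(x)
  mu <- colMeans(x)                     # column means, computed once
  xc <- x - rep(mu, each = n)           # centre every column
  s  <- sqrt(colSums(xc^2) / (n - 1))   # sample sd from the centred values
  xc / rep(s, each = n)                 # scale every column
}

rowScale <- function(x) {
  n  <- ncol(x)
  mu <- rowMeans(x)                     # row means, computed once
  xc <- x - mu                          # recycling runs down columns, so mu[i] meets row i
  s  <- sqrt(rowSums(xc^2) / (n - 1))   # sample sd from the centred values
  xc / s                                # no transpose needed
}

# Small usage example
X <- matrix(rnorm(20), ncol = 4)
colScale(X)
rowScale(X)
```

This still allocates several full-size intermediate matrices, so it only illustrates the intended semantics; a native implementation inside the package could presumably avoid those copies.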