HenrikBengtsson / matrixStats

R package: Methods that Apply to Rows and Columns of Matrices (and to Vectors)
https://cran.r-project.org/package=matrixStats

rowMaxs() et al. can be improved #200

Open HenrikBengtsson opened 3 years ago

HenrikBengtsson commented 3 years ago

The underlying implementation of rowRanges(), rowMins(), and rowMaxs() can probably be improved. The benchmarks against Rfast below certainly suggest so: Rfast::rowMaxs() has roughly half the median run time of matrixStats::rowMaxs() on a 1000-by-1000 matrix.

> m <- matrix(rnorm(1000*1000), nrow=1000, ncol=1000)

> byrow <- microbenchmark::microbenchmark(matrixStats = matrixStats::rowMaxs(m), Rfast = Rfast::rowMaxs(m, value=TRUE), apply = apply(m, MARGIN=1, FUN=max))
> byrow
Unit: microseconds
        expr       min         lq      mean     median        uq       max neval
 matrixStats  1957.475  2089.3485  2304.821  2228.9490  2395.270  4203.469   100
       Rfast   693.778   866.4185  1032.467   935.8735  1045.914  2446.249   100
       apply 12781.334 18051.5830 21182.688 20964.8795 23095.604 70352.344   100

Note that the colNnn() implementation is already optimized and performs on par with Rfast:

> bycol <- microbenchmark::microbenchmark(matrixStats = matrixStats::colMaxs(m), Rfast = Rfast::colMaxs(m, value=TRUE), apply = apply(m, MARGIN=2, FUN=max))

> bycol
Unit: milliseconds
        expr      min       lq      mean    median        uq       max neval
 matrixStats 1.204109 1.366232  1.503899  1.440197  1.572250  2.750286   100
       Rfast 1.287229 1.390556  1.553244  1.491345  1.631343  2.800478   100
       apply 8.864002 9.924433 12.371923 12.143643 13.839154 24.526761   100

The row versions are most likely slower because the implementation tries to reuse the same code/macro base for both rows and columns, and this does not work all the way. See, for example, https://github.com/HenrikBengtsson/matrixStats/blob/483be545e75e4a0b5e9bb97691d04179a206bbb8/src/rowRanges_lowlevel_template.h#L100-L130
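
To illustrate the kind of difference a dedicated row implementation could make, here is a minimal sketch in C. It is not the matrixStats code: the function names `row_maxs_strided()` and `row_maxs_colsweep()` are hypothetical, and the sketch assumes a column-major double matrix (R's layout) with at least one column and no NA/NaN handling. It also assumes, without having verified it against the linked template, that strided memory access is part of the slowdown. The first function walks each row with a stride of `nrow`; the second sweeps the data column by column, so memory is read contiguously while the per-row maxima are updated in place.

```c
#include <stddef.h>

/* Hypothetical helper: row maxima computed one row at a time.
   Consecutive reads are nrow elements apart in memory (strided access).
   Assumes ncol >= 1 and no missing values. */
void row_maxs_strided(const double *x, size_t nrow, size_t ncol, double *ans) {
    for (size_t i = 0; i < nrow; i++) {
        double best = x[i];                      /* element (i, 0) */
        for (size_t j = 1; j < ncol; j++) {
            double v = x[i + j * nrow];          /* element (i, j) */
            if (v > best) best = v;
        }
        ans[i] = best;
    }
}

/* Hypothetical helper: row maxima computed by sweeping columns.
   Each column is read contiguously; ans[] holds the running maxima.
   Assumes ncol >= 1 and no missing values. */
void row_maxs_colsweep(const double *x, size_t nrow, size_t ncol, double *ans) {
    /* Seed the running maxima with the first column. */
    for (size_t i = 0; i < nrow; i++) ans[i] = x[i];

    /* Fold in the remaining columns one at a time. */
    for (size_t j = 1; j < ncol; j++) {
        const double *col = x + j * nrow;        /* start of column j */
        for (size_t i = 0; i < nrow; i++) {
            if (col[i] > ans[i]) ans[i] = col[i];
        }
    }
}
```

Both functions return the same result; only the traversal order differs. Whether the column-sweeping order actually closes the gap to Rfast would have to be confirmed by benchmarking.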