HenrikBengtsson / matrixStats

R package: Methods that Apply to Rows and Columns of Matrices (and to Vectors)
https://cran.r-project.org/package=matrixStats
202 stars 33 forks source link

Restructuring of NA checking code #229

Closed yaccos closed 1 year ago

yaccos commented 1 year ago

This pull request is a spin-off of PR #217 and seeks to resolve issues #212 and #220 once and for all. Compared to PR #217, I have restructured the code to use nested if-statements instead of conditionals while still minimizing unnecessary checks for NA index values. Even though the code does logically the same as in PR #217, the if-statements seem to play nicer with compiler optimizations and from my benchmarks, the performance is on par with version 0.60.0 published before removing the convoluted macro system.

yaccos commented 1 year ago

And for the benchmarks (running under R version, Ubuntu 22.04.2 LTS, and gcc 11.3.0) compiled with -O2:

library(matrixStats)
library(bench)

nr <- 1000
nc <- 20

set.seed(1)
x <- matrix(rnorm(nr*nc), nrow = nr, ncol = nc)
dim(x)

br <- mark(rowSums = rowSums(x),
           rowSums2 = rowSums2(x),
           rowSums2_noNames = rowSums2(x, useNames = F))
br[, 1:7]

# 0.60.0
  expression            min   median `itr/sec` mem_alloc `gc/sec` n_itr
  <bch:expr>       <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int>
1 rowSums            31.2µs   32.7µs    29122.    7.86KB     11.7  9996
2 rowSums2             23µs   24.6µs    36942.   22.49KB     14.8  9996
3 rowSums2_noNames   23.4µs   25.1µs    37839.    8.07KB     11.4  9997

# 0.60.1
  expression            min   median `itr/sec` mem_alloc `gc/sec` n_itr
  <bch:expr>       <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int>
1 rowSums            31.2µs   36.2µs    26825.    7.86KB     2.68  9999
2 rowSums2           38.1µs   43.7µs    21522.   11.71KB     2.15  9999
3 rowSums2_noNames   37.8µs   42.7µs    22650.    7.86KB     2.27  9999

# This PR
expression            min   median `itr/sec` mem_alloc `gc/sec` n_itr
  <bch:expr>       <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int>
1 rowSums            31.3µs   33.3µs    29423.    7.86KB     5.89  9998
2 rowSums2           23.3µs   23.6µs    39340.   16.48KB     3.93  9999
3 rowSums2_noNames   23.3µs   23.6µs    39782.    7.86KB     7.96  9998

# PR 217
  expression            min   median `itr/sec` mem_alloc `gc/sec` n_itr
  <bch:expr>       <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int>
1 rowSums            31.2µs   33.3µs    28778.    7.86KB     5.76  9998
2 rowSums2           45.3µs   46.1µs    20278.   16.48KB     2.03  9999
3 rowSums2_noNames     42µs   46.1µs    21077.    7.86KB     4.22  9998
HenrikBengtsson commented 1 year ago

This is great. Thank you! Merged into the develop branch now.