Example of faster rowVar

ChristophH commented 4 years ago

Hi Constantin, super useful package - would have needed this 2 years ago ;)

This isn't really an issue, just sharing my observations. A while back I ended up implementing some of the row* functions myself for sctransform and it turns out that sctransform:::row_var is about 40% faster than sparseMatrixStats::rowVars in my test.

I only tested one dataset (36601 x 10194 sparse dgCMatrix from 10x Genomics

bench::mark(
  sparseMatrixStats=sparseMatrixStats::rowVars(sc_data),
  sct=sctransform:::row_var(sc_data),
  check = FALSE,
  min_iterations = 19
)

# A tibble: 2 x 13
  expression          min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory time 
  <bch:expr>        <bch> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list> <lis>
1 sparseMatrixStats 333ms  345ms      2.84     574KB        0    19     0      6.68s <NULL> <Rpro… <bch…
2 sct               240ms  253ms      3.97    1003KB        0    19     0      4.79s <NULL> <Rpro… <bch…
# … with 1 more variable: gc <list>

Maybe my version if faster because I don't transpose the matrix, but have a 'native' row-wise implementation. Edit: note that I cannot handle NAs.

Cheers, C

PS my implementation: R code C code

const-ae commented 4 years ago

Hi Christoph,

thanks for reaching out and providing me with this challenge :D

I found the source why sparseMatrixStats was slower.

bench::mark(
  sparseMatrixStats = sparseMatrixStats::rowVars(mat),
  sct = sctransform:::row_var(mat),
  check = TRUE,
  min_iterations = 19
)
#> # A tibble: 2 x 6
#>   expression             min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>        <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 sparseMatrixStats    130ms    136ms      7.38    12.8MB        0
#> 2 sct                  185ms    190ms      5.28    12.6MB        0

^{Created on 2020-09-22 by the reprex package (v0.3.0)}

In version 1.1.2 (https://github.com/const-ae/sparseMatrixStats/commit/9bd4d7def7f1fa3a971fd7940a030272be916696) I had already implemented a version of rowVars() that doesn't transpose the input. However, the problem was that in the hot inner-loop I repeatedly compared val_iter != values.end(), because I assumed that clang would optimize away the repeated method call. Well turns out it does not. In https://github.com/const-ae/sparseMatrixStats/commit/92f13460ed363fe8994658f746f49406142d7d1b I fixed the problem and explictly cache values.end() which gives a sizeable performance boost :)

ChristophH commented 4 years ago

Glad I could be of service :)

Edit: More than two months later I realize I had the same performance issue in my code 🤦

const-ae / sparseMatrixStats

Example of faster rowVar #5