HenrikBengtsson / matrixStats

R package: Methods that Apply to Rows and Columns of Matrices (and to Vectors)
https://cran.r-project.org/package=matrixStats
202 stars 33 forks source link

TASK: Investigate options for performance degradation since matrixStats 0.60.1 [2021-08-22] #220

Closed HenrikBengtsson closed 1 year ago

HenrikBengtsson commented 2 years ago

Background

In matrixStats 0.59.0 [2021-05-31] names attributes were not handled. in matrixStats 0.60.0 [2021-07-26], names attributes were handled and the new argument useNames was added. The handling of names attributes were done at the R level. In matrixStats 0.60.1 [2021-08-22], the handling of names attributes was implemented in the C level.

Now, in order to simplify the implementation of names attributes in C, other changes were made to matrixStats 0.60.1, which affected the performance. Specifically, the complex setup of macro at the C level previously used, was replaced with a run-time approach. To do this, the code was rewritten so that index ('idxs', 'rows', and 'cols') arguments of type integer (= 'R_len_t') are always coerced to long integers (= 'R_xlen_t'). This coercing introduces new overhead in form of processing time and having to allocate and deallocate extra memory. Basic benchmarking (Issue #212) shows that this impact is greater than (at least I) anticipated.

Question

Can we avoid the significant overhead introduced in matrixStats 0.60.1? If not, is it worth reverting the changes done at the C level and go back to handling names attributes at the R level (as in matrixStats 0.60.0)?

To have a better picture of our options, I think we should do thorough benchmarking of the following three version:

  1. matrixStats 0.59.0 [2021-05-31] - no names attributes
  2. matrixStats 0.60.0 [2021-07-26] - names attributes implement at the R level
  3. matrixStats 0.60.1 [2021-08-22] - names attributes implement at the C level
  4. matrixStats 0.62.9900 (branch pr127) - names attributes implement at the C level with performance tweaks

If the performance of matrixStats 0.60.0 is better than matrixStats 0.60.1 and matrixStats 0.62.9900, we might want to roll back to handling names attribute at the R level. If not, then we ... oh well, might have to re-implement everything using the macro approach.

yaccos commented 2 years ago

Some clearifications are necessary. I suggest we schedule a meeting for this issue.

HenrikBengtsson commented 1 year ago

Closing. @yaccos PR#229 made a big difference. Let's re-open if additional degradation observations are reported.