In matrixStats 0.59.0 [2021-05-31] names attributes were not handled. in matrixStats 0.60.0 [2021-07-26], names attributes were handled and the new argument useNames was added. The handling of names attributes were done at the R level. In matrixStats 0.60.1 [2021-08-22], the handling of names attributes was implemented in the C level.
Now, in order to simplify the implementation of names attributes in C, other changes were made to matrixStats 0.60.1, which affected the performance. Specifically, the complex setup of macro at the C level previously used, was replaced with a run-time approach. To do this, the code was rewritten so that index ('idxs', 'rows', and 'cols') arguments of type integer (= 'R_len_t') are always coerced to long integers (= 'R_xlen_t'). This coercing introduces new overhead in form of processing time and having to allocate and deallocate extra memory. Basic benchmarking (Issue #212) shows that this impact is greater than (at least I) anticipated.
Question
Can we avoid the significant overhead introduced in matrixStats 0.60.1? If not, is it worth reverting the changes done at the C level and go back to handling names attributes at the R level (as in matrixStats 0.60.0)?
To have a better picture of our options, I think we should do thorough benchmarking of the following three version:
matrixStats 0.59.0 [2021-05-31] - no names attributes
matrixStats 0.60.0 [2021-07-26] - names attributes implement at the R level
matrixStats 0.60.1 [2021-08-22] - names attributes implement at the C level
matrixStats 0.62.9900 (branch pr127) - names attributes implement at the C level with performance tweaks
If the performance of matrixStats 0.60.0 is better than matrixStats 0.60.1 and matrixStats 0.62.9900, we might want to roll back to handling names attribute at the R level. If not, then we ... oh well, might have to re-implement everything using the macro approach.
Background
In matrixStats 0.59.0 [2021-05-31] names attributes were not handled. in matrixStats 0.60.0 [2021-07-26], names attributes were handled and the new argument
useNames
was added. The handling of names attributes were done at the R level. In matrixStats 0.60.1 [2021-08-22], the handling of names attributes was implemented in the C level.Now, in order to simplify the implementation of names attributes in C, other changes were made to matrixStats 0.60.1, which affected the performance. Specifically, the complex setup of macro at the C level previously used, was replaced with a run-time approach. To do this, the code was rewritten so that index ('idxs', 'rows', and 'cols') arguments of type integer (= 'R_len_t') are always coerced to long integers (= 'R_xlen_t'). This coercing introduces new overhead in form of processing time and having to allocate and deallocate extra memory. Basic benchmarking (Issue #212) shows that this impact is greater than (at least I) anticipated.
Question
Can we avoid the significant overhead introduced in matrixStats 0.60.1? If not, is it worth reverting the changes done at the C level and go back to handling names attributes at the R level (as in matrixStats 0.60.0)?
To have a better picture of our options, I think we should do thorough benchmarking of the following three version:
If the performance of matrixStats 0.60.0 is better than matrixStats 0.60.1 and matrixStats 0.62.9900, we might want to roll back to handling names attribute at the R level. If not, then we ... oh well, might have to re-implement everything using the macro approach.