Good points. To clarify for others, and to make sure I got this correct, in matrixStats (<= 0.60.0), we had:
whereas in matrixStats (>= 0.60.1), we now have:
In other words, the code previously did not check for missing values when not subsetting by `rows` or `cols`, whereas now `R_INDEX_OP` and `R_INDEX_GET` always check for missing values.
So, yes, I think this explains the performance degradation going from matrixStats 0.60.0 to 0.60.1.
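To make the cost difference concrete, here is a minimal sketch of a checked versus an unchecked index operation. The names `NA_IDX`, `idx_add_checked` and `idx_add_unchecked` are hypothetical stand-ins for illustration; they are not the actual definitions of `R_INDEX_OP` or `R_INDEX_GET` in matrixStats.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel for a missing index (an assumption, not the
   value matrixStats uses). */
#define NA_IDX PTRDIFF_MIN

/* Checked variant: propagates NA if either operand is NA.
   Costs two extra comparisons on every call. */
static ptrdiff_t idx_add_checked(ptrdiff_t a, ptrdiff_t b) {
    return (a == NA_IDX || b == NA_IDX) ? NA_IDX : a + b;
}

/* Unchecked variant: only safe when the caller already knows that
   neither operand can be NA, e.g. when no subsetting is requested. */
static ptrdiff_t idx_add_unchecked(ptrdiff_t a, ptrdiff_t b) {
    return a + b;
}
```

When such an operation sits in the innermost loop over all matrix elements, the extra comparisons in the checked variant are paid once per element, which would be consistent with the slowdown discussed here.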
Sorry for my sloppy coding style, I have now taken care of that.
Thanks. Since this type of code has to be added to a lot of functions, I have a feeling that we might end up introducing a generic C macro for this, maybe something where one passes `norows` and `nocols` to the macro. Let's see what happens.
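One possible shape for such a macro is sketched below. Everything here is an assumption made for illustration: `IDX_ADD`, `NA_IDX` and `element_offset` are hypothetical names, and the macro takes a single flag that the caller would compute as `norows && nocols`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel for a missing index. */
#define NA_IDX PTRDIFF_MIN

/* Hypothetical macro: 'no_na' is nonzero when the caller guarantees that
   neither operand can be NA. Arguments are evaluated more than once, so
   they should be side-effect free. */
#define IDX_ADD(a, b, no_na)                                        \
    ((no_na) ? ((a) + (b))                                          \
             : (((a) == NA_IDX || (b) == NA_IDX) ? NA_IDX : ((a) + (b))))

/* Offset of element (i, j) in a column-major matrix with nrow rows.
   rows/cols may be NULL, meaning "no subsetting". */
static ptrdiff_t element_offset(const ptrdiff_t *rows, const ptrdiff_t *cols,
                                ptrdiff_t i, ptrdiff_t j, ptrdiff_t nrow) {
    int norows = (rows == NULL);
    int nocols = (cols == NULL);
    ptrdiff_t row = norows ? i : rows[i];
    ptrdiff_t col_offset =
        nocols ? j * nrow
               : (cols[j] == NA_IDX ? NA_IDX : cols[j] * nrow);
    /* The NA check is skipped only when no subsetting takes place. */
    return IDX_ADD(col_offset, row, norows && nocols);
}
```

Because `norows` and `nocols` are loop-invariant, a compiler can usually hoist the branch out of the inner loop, but that depends on the compiler and optimization level.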
I also imagine that we can check user-supplied indices for NA values in `validateIndicies()` and generalize the solution to have flags such as `rowsHaveNA` and `colsHaveNA`. In this case, `rows == NULL` should result in `rowsHaveNA == 0`.
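A rough sketch of how such a flag could be computed during validation follows. The helper `indices_have_na` and the sentinel `NA_IDX` are hypothetical; this is not the existing validation code, only an illustration of the `rows == NULL` implies `rowsHaveNA == 0` rule.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel for a missing index. */
#define NA_IDX PTRDIFF_MIN

/* Hypothetical helper: scans the user-supplied indices once and reports
   whether any of them is NA. A NULL pointer means "no subsetting", so
   the flag is 0 without any scan. */
static int indices_have_na(const ptrdiff_t *idxs, size_t n) {
    if (idxs == NULL) return 0;
    for (size_t k = 0; k < n; k++) {
        if (idxs[k] == NA_IDX) return 1;
    }
    return 0;
}
```

With `rowsHaveNA` and `colsHaveNA` computed this way, the per-element NA checks could also be skipped for user-supplied subsets that happen to contain no NA, at the cost of one linear scan per index vector.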
From the definitions of `R_INDEX_OP` and `R_INDEX_GET`, I see that these macros check both arguments for NA values. In cases where we know that neither operand is NA, we can avoid wasting CPU cycles on this check. Here, I have modified `rowSums2()` to skip checking for index NA values when neither row nor column subsets are provided. In my own tests I saw a performance boost, and I would like you to try it out as well. If implemented consistently, this could probably mitigate most of issue #212.
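The idea can be sketched as a row-sum routine with a fast path for the no-subsetting case. This is an illustrative stand-in and not the actual `rowSums2()` implementation: `row_sums_sketch` and `NA_IDX` are hypothetical names, and `NAN` is used here as a simple placeholder for the missing-value result.

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel for a missing index. */
#define NA_IDX PTRDIFF_MIN

/* Row sums of a column-major matrix x with nrow rows.
   rows/cols are optional subsets (NULL = use all, in order);
   nrows_out/ncols_out give the number of rows/columns to process. */
static void row_sums_sketch(const double *x, ptrdiff_t nrow,
                            const ptrdiff_t *rows, ptrdiff_t nrows_out,
                            const ptrdiff_t *cols, ptrdiff_t ncols_out,
                            double *out) {
    if (rows == NULL && cols == NULL) {
        /* Fast path: no subsetting, so no index can be NA. */
        for (ptrdiff_t i = 0; i < nrows_out; i++) {
            double sum = 0.0;
            for (ptrdiff_t j = 0; j < ncols_out; j++) sum += x[j * nrow + i];
            out[i] = sum;
        }
        return;
    }
    /* General path: every index is checked for NA before use. */
    for (ptrdiff_t i = 0; i < nrows_out; i++) {
        ptrdiff_t r = rows ? rows[i] : i;
        double sum = 0.0;
        for (ptrdiff_t j = 0; j < ncols_out; j++) {
            ptrdiff_t c = cols ? cols[j] : j;
            if (r == NA_IDX || c == NA_IDX) { sum = NAN; break; }
            sum += x[c * nrow + r];
        }
        out[i] = sum;
    }
}
```

The two paths return the same sums whenever no index is NA, so the fast path changes only the amount of work done, not the result.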