HenrikBengtsson / matrixStats

R package: Methods that Apply to Rows and Columns of Matrices (and to Vectors)
https://cran.r-project.org/package=matrixStats
202 stars 33 forks source link

Unnecesary checking for `NA` indicies eliminated #217

Closed yaccos closed 2 years ago

yaccos commented 2 years ago

In this pull request, I have generalized Pull Request #216 and made the optimization for the remaining functions in the package. Instead of adding if-statements throughout the package, I have modified the macros R_INDEX_OP and R_INDEX_GET by adding flags which tell whether checking for NA indicies should be conducted. These flags are added in the lowlevel functions as rowsHasNA and colsHasNA and are constant throughout the execution of the function. Hence, the compiler will (hopefully) optimize out checking for NA indices before iterating, if possible. These flags are set by validateIndiciesCheckNA() at the time of index validation. In the earlier versions of the package, the index validation framework did check if any indices were NA, but the result was thrown away.

This approch eliminates checking for NA indicies both when no indicies are supplied (as the situation was in matrixStats 0.60.0 and earlier) and also when the user-specified indicies do not have any NA values (which is a new feature). Still, there is some NA checking for indices which does not use R_INDEX_OP nor R_INDEX_GET (for instance in the naming framework), but I do not consider it critical for real-world performance.

HenrikBengtsson commented 2 years ago

Thank you so much. I'll try to get to this as soon as possible.

HenrikBengtsson commented 2 years ago

UPDATE: I've started revdep checks for 392 packages toward this PR. Hopefully, it goes well.

yaccos commented 2 years ago

UPDATE: I've started revdep checks for 392 packages toward this PR. Hopefully, it goes well.

It should do because the change will not provide any user-visible differences, it doesn't even touch any R code. Better safe than sorry though.

HenrikBengtsson commented 2 years ago

UPDATE: I've started revdep checks for 392 packages toward this PR. Hopefully, it goes well.

It should do because the change will not provide any user-visible differences, it doesn't even touch any R code. Better safe than sorry though.

Yup. Submitting such a big package to CRAN relying only on "should" is a big risk. Even the smallest change can break something somewhere, which can stir up a bit of chaos on CRAN. And there's also the possibility for our own mistakes. Also, if there's something showing up in the future, it's nice to have a recent revdepcheck baseline to compare to.

... which leads to: revdep checks spotted the following regressions:

Running those tests manually revealed the following:

https://github.com/HenrikBengtsson/matrixStats/blob/74ab1bfe4038693f8b42b576dfd82dc01598c7ea/src/rowMads.c#L27-L28

and

https://github.com/HenrikBengtsson/matrixStats/blob/74ab1bfe4038693f8b42b576dfd82dc01598c7ea/src/rowMads.c#L47-L48

I'll fix.

yaccos commented 2 years ago

Thank you. I always find it embarassing to forget debugging print statements left in the code. By the way, did you actually benchmark the fix?

HenrikBengtsson commented 2 years ago

No worries. No I didn't benchmark. Do you think you can benchmark some of them?