const-ae / sparseMatrixStats

Implementation of the matrixStats API for sparse matrices
Other
54 stars 3 forks source link

HEADS UP: New default for `ties.method` of {col,row}Ranks() #20

Open HenrikBengtsson opened 3 years ago

HenrikBengtsson commented 3 years ago

Background

matrixStats uses ties.method = "max" as the default for colRanks() and rowRanks() for legacy reasons, but we want eventually update to ties.method = "average" to align it with base::rank(), cf. https://github.com/HenrikBengtsson/matrixStats/issues/142.

The process for this migration with be:

  1. Give a deprecation warning if ties.method is not explicitly specified (long time; several releases)
  2. Give a defunct error if ties.method is not explicitly specified (long time; several releases)
  3. Switch the new default to ties.method = "average"

This will have to take a long time in order to make sure end-users out there will notice this and update their code. I hope this will minimize the risk for existing code all of a sudden start producing different results.

Issue

sparseMatrixStats gives an ERROR when I revdep check asserting !isTRUE(missing(ties.method)), cf. https://github.com/HenrikBengtsson/matrixStats/blob/feature/default-rank-ties.method/revdep/R_MATRIXSTATS_TIES_METHOD_MISSING%3Ddefunct/problems.md#sparsematrixstats

const-ae commented 3 years ago

Hi Henrik, thanks for the heads-up.

Could you say a bit more what is the advantage of using ties.method = "average" as the default and from now on forcing people to explicitly specify ties.method? I often use the ranks function without explicitly setting ties.method, because I don't really care what happens with the ties or know that I don't have any. And only if I realize that the ties are important, I specifically choose theties.method that matches my needs.

And I assume, that I will also need to duplicate the efforts to warn people about not specifying ties.method to stay aligned with matrixStats and avoid suddenly breaking code when you do make the switch in step 3, is that correct?

HenrikBengtsson commented 3 years ago

Could you say a bit more what is the advantage of using ties.method = "average" ...

Only to align with base::rank(). The existing discrepancy in matrixStats may result in incorrect results due to assumptions about the default.

... from now on forcing people to explicitly specify ties.method?

Only until we get to Step 3, when ties.method = "average" (= formals(base::rank)$ties.method) becomes the new default. After that, we'll drop the requirement of having to specify ties.method.

A more abrupt approach would be to just switch the default, but I'm too conservative for such a silent change.

And I assume, that I will also need to duplicate the efforts to warn people about not specifying ties.method to stay aligned with matrixStats and avoid suddenly breaking code when you do make the switch in step 3, is that correct?

Yes, I think so. You could keep your current default, if you'd like, or take the same deprecate-defunct-change_default approach as I do above.

Note that this is an extremely rare event, but I think it's worth fixing it, especially for future generations. FWIW, in hindsight, we should have made ties.method = "average" the default when we started out many years ago and just have given an error "ties.method = "average' is not yet supported". Then we wouldn't have to do this complicated update process.

Clear as mud?