HenrikBengtsson / matrixStats

R package: Methods that Apply to Rows and Columns of Matrices (and to Vectors)
https://cran.r-project.org/package=matrixStats
203 stars 33 forks source link

Extra argument in binCounts #78

Closed daoud-sie closed 8 years ago

daoud-sie commented 8 years ago

An extra argument is added in binCounts which will break code where no explicit argument names are used in calling the binCounts function. Some of my code assume "bx" to be the second argument. I can fix this in my code, but should we also fix it here?

HenrikBengtsson commented 8 years ago

Hi Daoud, thanks for the report. Your observations are correct; some function calls to matrixStats may break because they assume a certain order of arguments. The solution is to use named arguments. I should have added a note about this to the NEWS file, but too late now.

The background

In matrixStats (>= 0.50.0), we introduced optional arguments idxs, rows, and cols for optimized subsetted matrixStats calculations.

When adding these arguments we debated whether to add them (i) at the very end after all existing arguments, or (ii) after the arguments that should be subsetted by them. We went with the latter intentionally.

For example, with binCounts() we used to have:

> args(binCounts)
function (x, bx, right = FALSE, ...)

but now we have

> args(binCounts)
function (x, idxs = NULL, bx, right = FALSE, ...)

Problem

Thus, if binCounts() was called as:

binCounts(x, bx)

then, yes, it will break, because it effectively becomes:

binCounts(x, idxs=bx)

and not

binCounts(x, bx=bx)

as intended.

Solution

Update the code to use named arguments as far as possible in all matrixStats calls. The exception is the first argument, which does not have to be named. This also applies to the second argument if that specify weights, e.g.

> args(weightedMean)
function (x, w = NULL, idxs = NULL, na.rm = FALSE, refine = FALSE, ...)

What I did to prevent surprises

Before submitting to CRAN, I ran R CMD check --as-cran on all 68 packages that depends on matrixStats to make sure the updated matrixStats version does not break any of the packages relying on it. This of course assumes that the package tests of those packages actually test all of its code that calls matrixStats.

I have now scanned the code of those 68 packages using static code inspection (partly by looking manually) and I only discovered one package that have this type of problem, namely (our) QDNAseq package, where the internal .binReadCountsPerSample() function calls:

counts <- binCounts(hits[[chromosome]], chromosomeBreaks)

which if matrixStats (>= 0.50.0) is installed, equals:

counts <- binCounts(hits[[chromosome]], idxs=chromosomeBreaks)

I assume you are referring to this one.

Of course, we cannot check R scripts that are not part of CRAN or Bioconductor packages.

daoud-sie commented 8 years ago

Thanks for the background info. I will use named arguments.