Bioconductor / MatrixGenerics

S4 Generic Summary Statistic Functions that Operate on Matrix-Like Objects
https://bioconductor.org/packages/MatrixGenerics

What is the contract for the return value of these generics? #1

Open PeteHaitch opened 6 years ago

PeteHaitch commented 6 years ago

Broadly, these matrix summary functions perform two types of operation:

  1. matrix-to-vector
  2. matrix-to-matrix

To take two examples from matrixStats:

  1. matrix-to-vector: val <- colMedians(x) results in length(val) == ncol(x)
  2. matrix-to-matrix: val <- colCummaxs(x) results in dim(val) == dim(x)
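A minimal illustration of the two shape contracts, assuming an ordinary in-memory matrix. To keep the snippet dependency-free it uses base-R apply() as a stand-in; for this numeric, NA-free input matrixStats::colMedians() and matrixStats::colCummaxs() should give the same values.

```r
# A 2 x 3 ordinary matrix, filled column by column: (3, 1), (4, 1), (5, 9)
x <- matrix(c(3, 1, 4, 1, 5, 9), nrow = 2)

# matrix-to-vector: one summary value per column (like colMedians(x))
val1 <- apply(x, 2, median)   # c(2, 2.5, 7)
stopifnot(length(val1) == ncol(x))

# matrix-to-matrix: cumulative maximum down each column (like colCummaxs(x))
val2 <- apply(x, 2, cummax)
stopifnot(all(dim(val2) == dim(x)))
```

Note that the matrix-to-matrix result is as large as the input, which is what makes the choice of where to realise it matter for disk-backed inputs.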

Particularly for matrix-to-matrix ops, whose result is as large as the input, it may be desirable when the input is a disk-backed object to be able to specify that the result should be realised on disk (or via some other backend) instead of as an ordinary in-memory matrix.

I see two options for the 'contract' of the generic and methods:

  1. Methods must always return an ordinary vector (matrix-to-vector) or matrix (matrix-to-matrix)
  2. Methods may return any suitable vector-like (matrix-to-vector) or matrix-like (matrix-to-matrix) object. E.g., the colCummaxs,DelayedMatrix-method might gain a BACKEND argument to specify what sort of object should be returned (with some sort of sensible default based on the class of the input).

I favour (2), although its flexibility makes for a somewhat 'loose' contract between the input and output.
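A rough sketch of what option (2) could look like, using only base R and the methods package. Everything here is hypothetical: toyColCummaxs() is not the MatrixGenerics generic, and ToyBackedMatrix is a toy in-memory class standing in for a real disk-backed one. It only shows a method whose BACKEND argument chooses the class of the result while the dimensions stay fixed.

```r
library(methods)

# Hypothetical stand-in for a disk-backed matrix class; a real backend
# would write the values to disk rather than keep them in a slot.
setClass("ToyBackedMatrix", slots = c(data = "matrix"))
setMethod("dim", "ToyBackedMatrix", function(x) dim(x@data))

# Hypothetical generic (NOT the MatrixGenerics definition). Dispatch is on
# x only; BACKEND selects the class of the returned object.
setGeneric("toyColCummaxs",
           function(x, BACKEND = NULL) standardGeneric("toyColCummaxs"),
           signature = "x")

setMethod("toyColCummaxs", "matrix", function(x, BACKEND = NULL) {
  # Column-wise cumulative maximum; matrix() restores the dimensions that
  # apply() drops when x has a single row.
  val <- matrix(apply(x, 2, cummax), nrow = nrow(x), ncol = ncol(x))
  if (is.null(BACKEND)) {
    return(val)  # option (1) behaviour: an ordinary matrix
  }
  # option (2) behaviour: the same values in the requested matrix-like class
  switch(BACKEND,
         ToyBackedMatrix = new("ToyBackedMatrix", data = val),
         stop("unsupported BACKEND: ", BACKEND))
})
```

The looseness mentioned above shows up directly: a caller of toyColCummaxs() can rely on dim() and the values, but not on the class of what comes back.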

LTLA commented 6 years ago

Another option might be to obtain the BACKEND implicitly from options(), as set by some setBackEnd() command, in case it is too unwieldy or insufficiently general to require that every method include a BACKEND= argument in its definition. I think we already have this in DelayedArray::setRealizationBackend?
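A minimal sketch of the options()-based idea, assuming a hypothetical option name "sketch.backend" and hypothetical helpers setBackendSketch() and colCummaxsSketch(); none of these are DelayedArray or MatrixGenerics functions. The method takes no BACKEND= argument and instead reads the session-wide setting when it is called.

```r
# Hypothetical analogue of the suggested setBackEnd(); returns the old value.
setBackendSketch <- function(BACKEND = NULL) {
  old <- options(sketch.backend = BACKEND)  # setting NULL removes the option
  invisible(old$sketch.backend)
}

# Hypothetical method with no BACKEND= formal: it consults options() instead.
colCummaxsSketch <- function(x) {
  val <- matrix(apply(x, 2, cummax), nrow = nrow(x), ncol = ncol(x))
  backend <- getOption("sketch.backend")
  if (is.null(backend)) {
    return(val)  # default: an ordinary matrix
  }
  # Stand-in for realising on another backend: just record which was requested
  structure(list(data = val, backend = backend), class = "sketchBackedMatrix")
}

x <- matrix(c(3, 1, 4, 1, 5, 9), nrow = 2)
is.matrix(colCummaxsSketch(x))                          # TRUE
setBackendSketch("ToyBackend")
inherits(colCummaxsSketch(x), "sketchBackedMatrix")     # TRUE
setBackendSketch(NULL)
```

The trade-off is that method signatures stay unchanged, but the class of the result now depends on global state rather than on the call itself.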

PeteHaitch commented 6 years ago

I think we have that, or something close to it, for DelayedArray, but I'm also thinking about how to support other on-disk array-like structures, such as matter, which don't use the whole DelayedArray framework.