Open PeteHaitch opened 6 years ago
Something like 3a sounds like the way to go, if the Matrix maintainers are on board. All downstream packages would strictly provide method implementations and no further generics - this would include matrixStats. Performance-wise, I can't imagine the time spent in S4 dispatch is all that much, so I can't see that switching from raw functions to S4 is a big deal.
Better to check with @HenrikBengtsson about performance issues; my guess is that he's unwilling (rationally or otherwise!) to accept the cost. Also @mmaechler will of course have a significant stake in this...
I'm not yet ready to delve into the discussion... but I tend to agree that there's a need for a "central" place for S4 generics. Are you aware of implicitGenerics (in package 'methods')? We had added quite a few there... but of course only for functions existing in base (or "base packages"). For other matrix-specific functions, I'd tend to agree it would make sense to have them in (and subsequently "from") Matrix.
Are you aware of implicitGenerics (in package 'methods')?
I'm aware, but unsure if I'm using them correctly (cf. https://github.com/Bioconductor/MatrixGenerics/issues/3)
My take on this: Overhead of method dispatching, and also ::
, is significant at some point and to some users in some use cases, e.g. millions of calculations of non-large matrices. You don't want to force unnecessary overhead on developers when it comes to time-critical processing. This is one of the objectives/roles of the matrixStats package and there is a need for its highly performance-optimized implementation. If one ignores this, someone will be waiting around the corner to implement matrixStats2 without such overhead.
The solution? Two APIs - one with minimum overhead and one that supports method dispatching. The two APIs can be separated by two different naming conventions(*). The dispatching API can default to calling the minimum-overhead one where possible (here the matrixStats API).
(*) In base R there are already a few cases of this where one functions come with minimal overhead and another with bells and whistles because developers/users need access to both, e.g. .colSums()
vs colSums()
.
matrixStats 'owns' much of the API we (DelayedMatrixStats, matter, etc.) are trying to follow/mimic. The remainder is defined in base (e.g,
colSums()
); there may be other generics we wish to add (e.g.,prcomp()
). However, for performance reasons matrixStats uses ordinary functions rather than S4 (or S3). We want generic versions of these functions so that we can write methods that operate on matrix-like data structure; within Bioconductor, this generally means S4 generics but it may be worth also supporting S3 generics.Based on the discussion in https://stat.ethz.ch/pipermail/bioc-devel/2017-November/012273.html, I think our options are: