Bioconductor / MatrixGenerics

S4 Generic Summary Statistic Functions that Operate on Matrix-Like Objects
https://bioconductor.org/packages/MatrixGenerics

CRAN instead of Bioconductor? #21

Open HenrikBengtsson opened 3 years ago

HenrikBengtsson commented 3 years ago

Have you guys considered releasing this package and matrixStats derivatives (DelayedMatrixStats(*) and sparseMatrixStats) on CRAN rather than Bioconductor? AFAIU, there's nothing specific to Bioconductor in them.

Having them on CRAN would reach a much bigger audience, give them much more exposure, increase their usage by users and packages, and improve quality in the long run. This would benefit users and developers outside of the Bioconductor sphere. I find it unfortunate to "hide them away" on Bioconductor. (I think there are other packages on Bioconductor that the same argument applies to.)

cc/ @PeteHaitch @const-ae

HenrikBengtsson commented 3 years ago

Ah... I hit submit before adding:

(*) Yes, DelayedMatrixStats depends on other Bioconductor packages to support specific data types but maybe those could be Suggest:ed dependencies?

LTLA commented 3 years ago

DMS has some pretty hard dependencies on the BioC stack; DA isn't really optional, as the whole purpose of DMS's existence is to wrap matrixStats methods for DelayedArray objects. Installing DMS without installing DA wouldn't make much sense.

HenrikBengtsson commented 3 years ago

Ok. So, if not possible for DelayedMatrixStats, the question/suggestion remains for MatrixGenerics and sparseMatrixStats.

LTLA commented 3 years ago

I'll throw in my 2c.

I had also considered this question for a few of my packages that don't really have a hard BioC dependency, e.g., basilisk, beachmat (<2.1). These are infrastructure packages that could have been generally useful on CRAN. In the end, I decided against it in favor of the BioC build system for a variety of reasons:

Of course, I appreciate the cost to users not knowing or wanting to use Bioconductor, but putting it on BioC didn't inhibit me from solving the problem that I built the package for, so I didn't worry too much about it.

hpages commented 3 years ago

Yep, for all these reasons.

Note that you could also make that case for S4Vectors, IRanges, and a few other core infrastructure packages. There's nothing specific to Bioconductor about Rle, DataFrame, Hits, IRanges, Views, or NumericList objects. It's just that maintaining such a complex class hierarchy on CRAN would not be practical. It's a lot easier with fast cycles between pushes and daily build/check reports. Also a lot better for everybody's health.

'xcuze me but I have to go take care of the big bird now... Happy Thanksgiving!

HenrikBengtsson commented 3 years ago

Happy Thanksgiving to you too.

PeteHaitch commented 3 years ago

My reasons are similar to @ltla and @hpages, although not as strongly held/felt since I don't maintain anywhere near as many packages as they (or you, @HenrikBengtsson) do!

I do still think there might be scope for MatrixGenerics (and perhaps even sparseMatrixStats) to be on CRAN, since there's nothing specific to Bioconductor about these and they are quite tightly coupled to matrixStats (even if the coupling is in spirit rather than formal dependencies). In fact, I raised this topic a couple of years ago when I first started on MatrixGenerics (https://github.com/Bioconductor/MatrixGenerics/issues/2), where we had a brief discussion. But we ultimately resolved to go via BioC for the aforementioned reasons of familiarity/intimacy with the BioC development process and expertise.

Personally, I think I'd be comfortable with a migration to CRAN for MatrixGenerics when the package/API is stable, but there's still a bit of work to do on that front (particularly with regard to DelayedMatrixStats) which may force reconsideration of some decisions in MatrixGenerics(?).

const-ae commented 3 years ago

I am just catching up again with my GitHub issues, sorry.

One of the contributing factors for me was that the MatrixGenerics repo already existed under the Bioconductor brand on GitHub, so it felt natural to put MatrixGenerics and sparseMatrixStats on Bioconductor together.

> I think I'd be comfortable with a migration to CRAN for MatrixGenerics when the package/API is stable, but there's still a bit of work to do on that front

I would also be fine with moving MatrixGenerics and sparseMatrixStats to CRAN. I maintain packages at both repositories and for me both maintenance models work.

However, I actually do wonder a bit about the potential use cases for sparseMatrixStats outside single-cell transcriptomics (but that could also just be my limited creativity). I think most people who store their data in sparse matrices want to ignore the non-explicit elements, whereas I treat them as zeros.
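To make the difference between the two conventions concrete, here is a minimal sketch in plain Python (deliberately not R, and not the sparseMatrixStats API). The toy column and the helper names `mean_as_zeros` and `mean_ignoring_missing` are made up for illustration only.

```python
from statistics import mean

# Hypothetical toy column of a sparse matrix: row index -> stored value.
# Rows 1 and 3 are not stored (the "non-explicit" elements).
n_rows = 4
stored = {0: 4.0, 2: 2.0}


def mean_as_zeros(stored, n_rows):
    """Treat every non-stored element as 0, so all rows count."""
    return sum(stored.values()) / n_rows


def mean_ignoring_missing(stored):
    """Treat non-stored elements as absent; average only stored values."""
    return mean(stored.values())


print(mean_as_zeros(stored, n_rows))   # 1.5
print(mean_ignoring_missing(stored))   # 3.0
```

The same stored data gives a column mean of 1.5 under the zeros convention and 3.0 when the non-explicit elements are ignored, so a package has to commit to one of the two.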