Bioconductor / BiocClassesWorkingGroup

Notes and Discussion concerning recommended classes for Bioconductor

Participation in working group #2

Open MalteThodberg opened 1 year ago

MalteThodberg commented 1 year ago

I would like to join the discussion as well.

I was recently developing a new class for storing GWAS summary statistics. I was frustrated to find info and guides scattered all over the place: https://github.com/MalteThodberg/S4-Bioconductor

IMO core classes should also include low-level classes such as Vector, List, DataFrame, etc., in addition to high-level classes such as SummarizedExperiment and GRanges. Importantly, it should be well documented how to extend them, how to document the extensions, and how their inheritance works. See the DelayedArray and SummarizedExperiment packages for nice examples.
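As a rough illustration of the extension pattern in question, here is a minimal sketch that uses only base R's `methods` package. The classes `BaseTable` and `GwasTable` and the generic `describe` are hypothetical stand-ins invented for this example; they are not Bioconductor classes, and a real extension of SummarizedExperiment or DataFrame would involve more than this.

```r
library(methods)

# Hypothetical stand-in for a core container (NOT a real Bioconductor class).
setClass("BaseTable",
         slots = c(values = "numeric", labels = "character"))

setValidity("BaseTable", function(object) {
  if (length(object@values) != length(object@labels))
    "'values' and 'labels' must have the same length"
  else
    TRUE
})

# Hypothetical extension: inherits all slots of BaseTable and adds one more.
setClass("GwasTable",
         contains = "BaseTable",
         slots = c(pvalues = "numeric"))

# The parent's validity check runs first, then this one.
setValidity("GwasTable", function(object) {
  if (length(object@pvalues) != length(object@values))
    "'pvalues' must match the length of 'values'"
  else if (any(object@pvalues < 0 | object@pvalues > 1))
    "'pvalues' must lie in [0, 1]"
  else
    TRUE
})

# Hypothetical generic with one method per class.
setGeneric("describe", function(x, ...) standardGeneric("describe"))

setMethod("describe", "BaseTable", function(x, ...) {
  paste0(class(x), " with ", length(x@values), " entries")
})

# The subclass method reuses the parent method via callNextMethod().
setMethod("describe", "GwasTable", function(x, ...) {
  base <- callNextMethod()
  paste0(base, ", ", sum(x@pvalues < 0.05), " with p < 0.05")
})

gw <- new("GwasTable",
          values  = c(0.2, -0.1, 0.4),
          labels  = c("rs1", "rs2", "rs3"),
          pvalues = c(0.01, 0.5, 0.03))

is(gw, "BaseTable")  # the subclass is usable wherever the parent is expected
describe(gw)
```

The points that tend to need documenting are visible even at this scale: validity checks are inherited and stacked, and methods defined for the parent keep working on the subclass unless they are overridden.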

Perhaps the deeper problem is the lack of a central repository for guidelines to S4 development in BioC and R in general. This makes it very hard for new developers joining BioC to implement and take advantage of the S4 system.

vjcitn commented 1 year ago

@vjcitn is interested

LiNk-NY commented 1 year ago

I'm also interested in participating. IMO, the product of this working group should ultimately be a bookdown book that describes best practices for the S4 system.

llrs commented 1 year ago

I disagree: the S4 system is not exclusive to Bioconductor, and this working group should not try to set best practices for use outside the Bioconductor project. There is certainly much experience with the S4 system in the Bioconductor community, but it is not the only OOP paradigm used in Bioconductor packages, and the working group needs to keep the other class systems in mind (S3, RC, R7...).

LiNk-NY commented 1 year ago

S4 is a major feature of Bioconductor and the foundation of Bioconductor infrastructure. S4 is not as well documented (in books) as other systems such as S3. The best practices would certainly be within the scope of Bioconductor.

I suspect that there are fewer infrastructure packages in Bioconductor that use R6 or S3, let alone R7 which is still a concept. The material produced can still consider other systems but S4 should be the main focus.

llrs commented 1 year ago

S4 is a feature of R not Bioconductor but I agree it is not well documented and best practices for Bioconductor are well within the scope of the working group.

But going from that to writing a book about the S4 system is, in my opinion, too much to ask of such a working group. The README explains the purpose of the working group:

The Bio Classes and Methods Working Group tackles the following questions and issues:

What are the 'official' classes?

What is the procedure to establish a new 'official' class?

To what extent should these be enforced during package review?

In general, a package will not be accepted if it does not show interoperability with the current Bioconductor ecosystem.

If not strictly enforced, should we at least require a wrapper function to convert to these?

LiNk-NY commented 1 year ago

> S4 is a feature of R not Bioconductor but I agree it is not well documented and best practices for Bioconductor are well within the scope of the working group.

Suffice it to say that much of the Bioconductor infrastructure (when it comes to classes and methods) was built on S4.

> But going from that to writing a book about the S4 system is, in my opinion, too much to ask of such a working group.

To be clear, it would be good to communicate a set of guidelines regarding S4 implementation in Bioconductor in book form, much like https://contributions.bioconductor.org. I did not mean that a textbook on the S4 system in R should be written, which is perhaps what you were alluding to.

MalteThodberg commented 1 year ago

I agree with both of you:

On one hand, I think there are already good book-type resources for general S4 in R (e.g. https://adv-r.hadley.nz/s4.html and https://link.springer.com/chapter/10.1007/978-1-4842-2919-4_6).

On the other hand, Bioconductor is by far the largest user of S4, and the only one to my knowledge that very heavily depends on inheritance between classes and packages. There are many peculiarities to this system that should be documented in something like a book or long format vignette:

Overall design:

Benefits of expanding specific classes:

llrs commented 1 year ago

It is a nice overall design, but I think there needs to be more advice for package developers; I also have some suggestions on #1. It is also not clear what a core class is, how a class can become one, or when it counts as a popular class.

And perhaps a long-term purpose of the group could be to help create new classes that several packages could benefit from. For example, if multiple packages extend SummarizedExperiment for purpose X with similar implementations, then we might all benefit from a common extension.

jorainer commented 1 year ago

I guess what would help is to first identify and list common/promoted classes, compile them in a document, and maybe also group them by use case and/or data type (two-dimensional data structure, ranges data, RNA-seq, spatial transcriptomics, mass spectrometry...). I think it would be beneficial for new developers to be able to look up what's out there before implementing their own classes.

lgatto commented 1 year ago

@jorainer - does this not address, at least in part, what you want: https://contributions.bioconductor.org/important-bioconductor-package-development-features.html#commonclass