Bioconductor / Contributions

Contribute Packages to Bioconductor
135 stars 33 forks source link

sparrow #2342

Closed lianos closed 3 years ago

lianos commented 3 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

bioc-issue-bot commented 3 years ago

Hi @lianos

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: sparrow
Type: Package
Title: Take command of set enrichment analyses through a unified interface
Version: 0.99.00
Authors@R: c(
    person(
        "Steve", "Lianoglou", role = c("aut", "cre"),
        email = "slianoglou@gmail.com",
        comment = c(ORCID = "0000-0002-0924-1754")),
    person("Denali Therapeutics", role = c("fnd"), comment = "2018+"),
    person("Genentech", role = c("fnd"), comment = "2014 - 2017"))
Description: Provides a unified interface to a variety of GSEA techniques from
    different bioconductor packages. Results are harmonized into a single object
    and can be interrogated uniformly for quick exploration and interpretation
    of results. Interactive exploration of GSEA results is enabled through
    a shiny app provided by a sparrow.shiny sibling package.
Depends:
    R (>= 4.0)
Imports:
    babelgene (>= 21.4),
    BiocGenerics,
    BiocParallel,
    checkmate,
    circlize,
    ComplexHeatmap (>= 2.0),
    data.table (>= 1.10.4),
    DelayedMatrixStats,
    edgeR (>= 3.18.1),
    ggplot2 (>= 2.2.0),
    graphics,
    grDevices,
    GSEABase,
    GSVA,
    irlba,
    limma,
    Matrix,
    methods,
    plotly (>= 4.9.0),
    stats,
    utils,
    viridis
Suggests:
    AnnotationDbi,
    BiasedUrn,
    Biobase (>= 2.24.0),
    BiocStyle,
    DESeq2,
    dplyr,
    dtplyr,
    fgsea,
    GO.db,
    goseq,
    hexbin,
    magrittr,
    matrixStats,
    msigdbr (>= 7.4.1),
    KernSmooth,
    knitr,
    PANTHER.db (>= 1.0.3),
    R.utils,
    reactome.db,
    reshape2,
    rmarkdown,
    SummarizedExperiment,
    statmod,
    stringr,
    testthat,
    webshot
biocViews: GeneSetEnrichment, Pathways
BiocType: Software
VignetteBuilder: knitr
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Collate:
    'AllClasses.R'
    'AllGenerics.R'
    'GeneSetDb-class.R'
    'GeneSetDb-methods.R'
    'SparrowResult-methods.R'
    'aaa.R'
    'bioc-accessors.R'
    'calculateIndividualLogFC.R'
    'convertIdentifiers.R'
    'validateInputs.R'
    'do.camera.R'
    'do.cameraPR.R'
    'do.fgsea.R'
    'do.fry.R'
    'do.geneSetTest.R'
    'do.goseq.R'
    'do.logFC.R'
    'do.ora.R'
    'do.roast.R'
    'do.romer.R'
    'do.svdGeneSetTest.R'
    'geneSetSummaryByGenes.R'
    'get-msigdb.R'
    'get-panther.R'
    'get-reactome.R'
    'getKeggGeneSetDb.R'
    'gsea-helpers.R'
    'package.R'
    'plots-corplot.R'
    'plots-interactive.R'
    'plots-mgheatmap.R'
    'plots-mgheatmap2.R'
    'renameCollections.R'
    'renameRows.R'
    'scale_rows.R'
    'scoreSingleSamples.R'
    'seas.R'
    'single-sample-scoring-methods.R'
    'species.R'
    'testing-helpers.R'
    'utilities.R'
    'volcano_plot.R'
    'zzz.R'
RoxygenNote: 7.1.1
Roxygen: list(markdown = TRUE)
bioc-issue-bot commented 3 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/sparrow to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 71fc95a9831d6ee22c5b323004f0962c512d8dd7

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/sparrow to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: b80ccd2508299f327055b4d0d683578d280bddb4

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/sparrow to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 05a0b8e865f936287b44b3430bb77d14a6094d52

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/sparrow to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

hpages commented 3 years ago

Hi Steve, it's been a long time!

Thanks for this new submission. You're apparently aware of the existence of the BiocSet package by @Kayla-Morrell, @mtmorgan, @kevinrue, and @llrs. From your vignette (which is very nice):

NOTE: The GeneSetDb class was initially developed to address some peculiarities of storing, querying, and manipulating gene sets using the old-school GSEABase::GeneSetCollection class. It is not meant to replace the more recently developd BiocSet class, and we plan to support and migrate to that gene set container in the future. This section provides a quick overview of some of the functionality of gene set container.

Any reason why you're not just using that? I'm worried that getting a GeneSetDb-centric sparrow into Bioconductor will make a later migration to BiocSet much harder than if the package was migrated now, before people start using it and depending on it.

Small problem with section numbers in the vignette:

Screenshot from 2021-10-05 18-14-30

Thanks, H.

llrs commented 3 years ago

Many thanks for mentioning me but I'm not an author of the package just a contributor on the discussions.

The Bioconductor common Methods and Classes page mentions GSEABase classes for gene set, if this is discouraged maybe the website might need to be updated.

Last there is also my package BaseSet on CRAN with similar functionalities. It already allows to load data from GSEABase GeneSetCollection to its own class.

lianos commented 3 years ago

Hi Herve!

Thanks for the quick feedback, and yes: has been a very long time!

Any reason why you're not just using that [a BiocSet]? I'm worried that getting a GeneSetDb-centric sparrow into Bioconductor will make a later migration to BiocSet much harder than if the package was migrated now, before people start using it and depending on it.

Yes, I was worried that you would be worried about that :-)

The reason I'm not using that at the moment is that it would take quite a while to refactor and test. There has been a lot of functionality baked into the GeneSetDb class over the years, and I just haven't had the time to check and port over to BiocSet. I do plan on taking a shot at porting it over in the future.

Why am I submitting now? It has long been on my todo list, but also there are updates in another package a colleague of mine has in Biocondcutor (gCrisprTools) that has incoming updates for next release that rely on functionality in sparrow, and was hoping to support those developments there by getting sparrow published now.

I can understand why this gives you pause, though. The good news, perhaps, is that while the GeneSetDb class plays an important role within the package, its use is quite ... uhm ... not so important "outside" of the package. I suspect users would interact very little (if at all) with the GeneSetDb, as the majority of the use cases I've found working with sparrow simply return data.frames of gene sets after analyses.

For instance, if you run the main analysis function seas() by giving it a GeneSetCollection's worth of gene sets, you can very likely never even have to know what a GeneSetDb is.

Would you be OK if, for now, I simply add the ability to provide a gene set collection as a BiocSet as inputs to the main user-facing functions (seas() being the primary one, but also probably scoreSingleSamples()) and iterate on migrating the internals of the package to BiocSet in time?

It's not often that users will extract the internal GeneSetDb class, but I can also provide the ability to have it returned as a BiocSet as well.

Would that be sufficient?

Also, thank you for pointing out the header-level issue in the vignette, I'll fix each section to use an h1 header element and readjust from there.

kevinrue commented 3 years ago

Thanks for mentioning me as well @hpages

I also haven't directly contributed to BiocSet, but I have very much appreciated the discussions that ensued from the parallel efforts on BiocSet, BaseSet, and my own unisets. (I haven't pushed unisets to any other repository than GitHub, to avoid further confusion, and leave full visibility to BiocSet moving forward).

To be honest, I would point out that even since BiocSet was released, I haven't witnessed wide community adoption of the package (yet). From a brief look at https://bioconductor.org/packages/release/bioc/html/BiocSet.html I see today 1 package that Depends on BiocSet and 1 package that Suggests it.

Obviously, Rome wasn't built in a day, and the rich-gets-richer way of software adoption means that BiocSet probably won't be appealing to many developers until more developers buy into the new infrastructure and create a critical mass of users, developers and documentation driving others to it.

While sparrow could be one of those early adopters, I wouldn't blame anyone for waiting to see how others get along with BiocSet before investing their own time into migrating their own package. Some deity somewhere probably knows what the future holds for BiocSet and whether some other infrastructure might come along and motivate yet another migration.

In many ways gene set analysis and infrastructure unfortunately very much still feels like a wild west of data structures, much in need of homoegeneisation between packages (hence the efforts invested in BiocSet, BaseSet and unisets). However, old favourites (e.g. clusterProfiler) are still implicitly supporting certain older data structures, and the amount of work involved in migrating those packages from one infrastructure to another are considerable factors, in my view, holding back the adoption of new data structures for gene sets.

I'm not the package reviewer here, but that part of @lianos 's latest message

I suspect users would interact very little (if at all) with the GeneSetDb, as the majority of the use cases I've found working with sparrow simply return data.frames of gene sets after analyses.

makes me feel that this package might not hurt any user by using the GeneSetDb internally, since it accepts a major Bioconductor data structure as input (GeneSetCollection) and produce data.frame as ouput.

Having said that, of course, using standard Bioconductor data structures internally is always encouraged for readability and contributions/maintenance by other Bioconductor developers. Packages that somehow get accepted into Bioconductor while not using standard containers are more challenging (and lonely) to maintain and more likely to fall into deprecation over time.

lianos commented 3 years ago

Thanks for your thoughtful commentary, @kevinrue.

Looking a bit more closely at BiocSet, there are areas of clear overlap with the GeneSetDb class I implemented, as well as functionality that is specific to my GeneSetDb class that sparrow relies on internally that I don't yet see has a clear analog for in BiocSet (yet :-).

Hopefully the extra functionality I rely on is useful to the design/implementation of BiocSet, and working on an internal conversion of GeneSetDb to BiocSet can help make that more clear. If the functionality is also useful to other end-users and GSEA package developers, as well, I am happy to help discuss/contribute changes to BiocSet to make that happen.

My hope is that sparrow can be approved with its internal GeneSetDb now, and we can gradually work towards the GeneSetDb -> BiocSet transition in time. 🤞

As it is, I've already added GeneSetDb <-> BiocSet conversion functions that allow users to send in BiocSet objects into sparrow functions and convert a GeneSetDb object back to a BiocSet in the event that the user wants to pull this object out of the classes in sparrow.

Once those changes work their way through my github actions to ensure all package level tests pass, I can also push back to the bioconductor git repo (or should I wait for a more formal review of the codebase as it currently is?)

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: fb5aba738efa5e1b7ef6f72257fbdd8ecb8a3f43

lianos commented 3 years ago

Just to keep everyone up to speed, I've run through the package and made more changes to promote the use of a BiocSet where GeneSetDb has been used for all the main user facing functions of this package.

This package also provides convenience functions that return GeneSetDb objects from different sources, like:

I've added sibling functions to all these methods to return a BiocSet of the same, so they can easily be used in the analyses provided by sparrow (ie. getMSigCollection(), getKeggCollection(), etc)

I also have changed documentation and the vignette to suggest a BiocSet first use of the package to end users, although the vignette itself is still using a GeneSetDb.

I can/will continue to make these modifications over time, but just want to make it abundantly clear that I'm putting in a good-faith effort to assuage any fears of not being willing to do a full BiocSet migration when it can be done down the road.

I can not completely replace the internals of sparrow to use BiocSet because there is (well tested) functionality and speed found in the GeneSetDb class and how the functions in sparrow interact with it.

Again, I just want to re-iterate that as the package now stands, users can use any/all functionality it provides by providing a BiocSet's worth of gene sets to the sparrow functions, and the use of a GeneSetDb is an implementation detail that is internal to the package. This makes it possible to carefully migrate the internals without disrupting the use of sparrow by the community since the interface won't change (and that interface now supports a BiocSet).

Thanks for your consideration.

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/sparrow to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

kevinrue commented 3 years ago

Again, I'm not the reviewer here, but as a fellow package developer, I cannot stress how much I appreciate hearing updates from someone taking BiocSet for a test run. Your willingness to do this is providing evidence that might convince/comfort/motivate others (including me) that BiocSet is a viable infrastructure ready to be re-used across the Bioconductor project, not just a concept package that just looks pretty.

From the sounds of it, you are also providing the feedback that BiocSet developers/maintainers have been needing to further consider functionality that it might miss to cater for the needs of developers holding back still.

Thanks for your work and hard effort!

lianos commented 3 years ago

Thanks again for the thoughtful comments, @kevinrue !

@hpages according to the build report, the linux buildbot "nebbiolo2" seems to be taking ~5x longer to run the code in the package than the other two, and has timed out on the R CMD check run, which has now slapped a warning label on the submission.

I'll wait to make some meatier changes primarily to the documentation to push another commit to trigger the re-build. If this "warning" label is holding up the package review, though, I can try to tickle the build bot again to see if I get some better luck on the next build.

hpages commented 3 years ago

Thanks @llrs and @kevinrue for chiming in.

@kevinrue Agree with you on the importance of exposing BiocSet thru more real world use cases and gathering feedback that will ultimately allow the container to improve and convince. Steve's work will help tremendously achieve that goal.

@lianos All is fine. I didn't realize that GeneSetDb was mostly used behind the scene and that migrating the internals of the package to use BiocSet instead of GeneSetDb wouldn't turn out to be so disruptive after all. Thanks for adding GeneSetDb <-> BiocSet conversion functions. They will definitely help interoperability and pave the way to a future migration to BiocSet. Your other changes/additions sound good too. The vignette is very nice.

Best, H.

bioc-issue-bot commented 3 years ago

Your package has been accepted. It will be added to the Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

Reviewers for Bioconductor packages are volunteers from the Bioconductor community. If you are interested in becoming a Bioconductor package reviewer, please see Reviewers Expectations.

lianos commented 3 years ago

@hpages that's great news, thanks for your feedback and taking the time to review the package!

Thank you also to @llrs and @kevinrue for sharing your thoughts, as well.

lshep commented 3 years ago

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/lianos.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("sparrow"). The package 'landing page' will be created at

https://bioconductor.org/packages/sparrow

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.