Bioconductor / Contributions

Contribute Packages to Bioconductor
131 stars 33 forks source link

GeDi #3359

Closed AnnekathrinSilvia closed 2 months ago

AnnekathrinSilvia commented 3 months ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

bioc-issue-bot commented 3 months ago

Hi @AnnekathrinSilvia

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: GeDi
Title: Defining and visualizing the distances between different genesets
Version: 0.99.0
Date: 2024-03-21
Authors@R: 
    c(
        person(
  given = "Annekathrin", family = "Nedwed", role = c("aut", "cre"),
  email = "anneludt@uni-mainz.de", comment = c(ORCID = "0000-0002-2475-4945")),
        person(
  given = "Federico", family = "Marini", role = c("aut"), 
  email = "marinif@uni-mainz.de", comment = c(ORCID = "0000-0003-3252-7758")
        )
    )
Description: The package provides different distances measurements 
    to calculate the difference between genesets. Based on these scores
    the genesets are clustered and visualized as graph. This is all
    presented in an interactive Shiny application for easy usage.
Depends:
    R (>= 4.4.0)
Imports: 
    GOSemSim,
    Matrix,
    shiny,
    shinyWidgets,
    bs4Dash,
    rintrojs,
    utils,
    DT,
    dplyr,
    shinyBS,
    STRINGdb,
    igraph,
    visNetwork,
    shinycssloaders,
    fontawesome,
    grDevices,
    parallel,
    stats,
    ggplot2,
    plotly,
    GeneTonic,
    RColorBrewer,
    scales,
    readxl,
    ggdendro,
    ComplexHeatmap,
    BiocNeighbors,
    tm,
    wordcloud2,
    tools,
    BiocParallel
Suggests: 
    knitr,
    rmarkdown,
    testthat (>= 3.0.0),
    DESeq2,
    htmltools,
    pcaExplorer,
    AnnotationDbi,
    macrophage,
    topGO,
    biomaRt,
    ReactomePA,
    clusterProfiler,
    BiocStyle,
    org.Hs.eg.db
License: MIT + file LICENSE
Encoding: UTF-8
VignetteBuilder: knitr
URL: https://github.com/AnnekathrinSilvia/GeDi
BugReports: https://github.com/AnnekathrinSilvia/GeDi/issues
RoxygenNote: 7.3.1
Roxygen: list(markdown = TRUE)
Config/testthat/edition: 3
biocViews: 
    GUI, GeneSetEnrichment, Software, Transcription, RNASeq, 
    Visualization, Clustering, Pathways, ReportWriting, GO, KEGG, 
    Reactome, ShinyApps
AnnekathrinSilvia commented 3 months ago

cc' @federicomarini

bioc-issue-bot commented 3 months ago

Your package has been added to git.bioconductor.org to continue the pre-review process. A build report will be posted shortly. Please fix any ERROR and WARNING in the build report before a reviewer is assigned or provide a justification on why you feel the ERROR or WARNING should be granted an exception.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. All changes should be pushed to git.bioconductor.org moving forward. It is required to push a version bump to git.bioconductor.org to trigger a new build report.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot commented 3 months ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.3 LTS): GeDi_0.99.0.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/GeDi to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

lshep commented 3 months ago

Please fix package Timeout/ERROR/Warnings before it moves forward in the package review process

federicomarini commented 3 months ago

Hi @lshep , we have been already working on it but currently Annekathrin is OOO (and she is listed as maintainer).

Time was cut down in significant amounts on unix machines (MacOS and Linux, see a recent output from GithubActions for a reference). Windows is still taking some time, and we were thinking of switching to mock examples for this - even if of course a "real life example" is simply better to demonstrate the package...

We will follow up on this closely, Federico

federicomarini commented 3 months ago

Ok, I should have spotted that earlier on (for Windows). We got very burned by an unfortunate mis-use of the SnowParam configuration for parallelizing operations.

Long story short: Instead of cutting down time, it just was making all much much slower. Working on a branch right now to see that everything runs as expected!

federicomarini commented 3 months ago

@lshep I managed to stay in the time frames the build system is allowing. I did push all the relevant commits, merged into devel - but I cannot trigger the bump via git push upstream devel (only Annekathrin is allowed for this, I guess?).

Could you maybe trigger this manually - version is bumped already to 0.99.1 πŸ˜‰

Thanks a lot - and apologies for not foreseeing the extra inconveniences...

lshep commented 3 months ago

I assume you got a denies for public key or access rights ERROR? Yes as the listed maintainer only Annekathrin is allowed to push changes -- with his permission I can also give you push privileges to git.bioconductor.org to trigger the new build

federicomarini commented 3 months ago

Yep!

image

As for Annekathrin: she is currently on holiday (& with limited connection), so I took over on this one πŸ˜‰

lshep commented 3 months ago

.... Since you are listed as the other author, I will grant permissions. It will need to be pushed to git.boiconductor.org as a manually build will not work if you did not push to git.bioconductor.org as it won't grab updated code. You may need to activate your BiocCredential account and/or add keys but you should have access

federicomarini commented 3 months ago
image

Done now! I think my credentials were already/still valid. Thanks for un-freezing this!

lshep commented 3 months ago

No worries. but there was no version bump to 0.99.1 so there is no build triggered

federicomarini commented 3 months ago

~~Oh wait that is odd - https://github.com/AnnekathrinSilvia/GeDi/blob/devel/DESCRIPTION does say 0.99.1 right now, and my local copy is also in sync... Shall I do an extra one, just to be safe?~~

Sorry, it was trying to push from the wrong branch...

bioc-issue-bot commented 3 months ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 4b91380dbda18a9fc30ed3f94038eb3c52deb7ed

bioc-issue-bot commented 3 months ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.3 LTS): GeDi_0.99.1.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/GeDi to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

federicomarini commented 3 months ago

Bummer, the timeout was not really expected now - I noticed a large discrepancy between our builds (locally, and also on GitHub Actions) for the longer functions - see the comparison of

image

vs

image

(don't know why it took about 6-7 minutes longer...)

As for the other BiocCheck-error: Since this does also include an app as a prominent way of handling the workflow, therefore the larger amount of screenshots included.

bioc-issue-bot commented 3 months ago

Received a valid push on git.bioconductor.org; starting a build for commit id: fe08eaa68cd6cd0fd99a66ed916d76dec151db57

bioc-issue-bot commented 3 months ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.3 LTS): GeDi_0.99.2.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/GeDi to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

federicomarini commented 3 months ago

So, it still does take a tad longer than our "expected" 8 minutes (that we get locally) but at least no timeout anymore.

@lshep The only Error and Warnings are about the size of the package - we have a tarball of ~9 Mb (and accordingly a bulky vignette, GeDi_manual.html). Is this something that can be evaluated while the reviewing process is ongoing? We can do some work on resizing the images/reducing the resolution in parallel, but I cannot promise we will be able to slim that down so much.

bioc-issue-bot commented 3 months ago

Received a valid push on git.bioconductor.org; starting a build for commit id: b6f050b22637b50099f9fb58ce2ffc69745a9561

federicomarini commented 3 months ago

We managed to shrink it quite massively AND keep a good quality of the images, thanks to pngquant (which I would likely suggest somewhere in our http://contributions.bioconductor.org/ - could be relevant to shrink many other screenshot-heavy packages).

We're still above the 5Mb limit, and a bit puzzled of the total size of the images (~1Mb) vs the size of the rendered vignette (~5.something Mb!).

bioc-issue-bot commented 3 months ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.3 LTS): GeDi_0.99.3.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/GeDi to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 2 months ago

A reviewer has been assigned to your package for an indepth review. Please respond accordingly to any further comments from the reviewer.

LiNk-NY commented 2 months ago

Hi @AnnekathrinSilvia Thank you for your submission. Please see the review below.

Best regards, Marcel


GeDi

DESCRIPTION

NAMESPACE

vignettes/

R

tests

> covr::package_coverage(type = "all")
GeDi Coverage: 96.82%
R/utils.R: 79.41%
R/download_PPI.R: 82.61%
R/go_Similarity.R: 94.12%
R/meetmin_Scoring.R: 95.00%
R/kappa_Scoring.R: 95.24%
R/jaccard_Scoring.R: 95.45%
R/sorensen_dice_Scoring.R: 95.45%
R/pMM_Scoring.R: 98.00%
R/clustering.R: 99.15%
R/gs_Graph.R: 99.32%
R/app.R: 100.00%
R/clustering_wordcloud.R: 100.00%
R/distance_dendro.R: 100.00%
R/distance_heatmap.R: 100.00%
R/gs_histogram.R: 100.00%
bioc-issue-bot commented 2 months ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 9352fd98e0924a1b4d22114725f3a2f1d2a8cd23

AnnekathrinSilvia commented 2 months ago

Hi @AnnekathrinSilvia Thank you for your submission. Please see the review below.

Best regards, Marcel

GeDi

Thank you for your timely review @LiNk-NY ! We addressed the points you raised in a separate branch, bioc_review, which we just merged into the devel branch (link to the PR for you to check all things at once: https://github.com/AnnekathrinSilvia/GeDi/pull/21).

You can find below our comments, in a point by point fashion for clarity

* Overall, the package provides _a lot_ of functionality for the user. One
  issue may be the maintenance of the package that depends on a lot of other
  packages; therefore, causing the package to be fragile to issues in any of the
  dependencies. 

Thanks for raising this comment, which kind of burdened us throughout the package development as well. We are aware that indeed GeDi tries to do a lot of things, both on the command line side as well as providing an app to use all the functionality in a seamless manner. Our way to try and keep the risk of having dependencies break our packages β€œlower” was to set up a comprehensive suite of unit testing - this was something that we anyway aimed to achieve to have a robustly working package. We tried our best to be parsimonious on choosing which dependencies to include - hopefully this is a good balance between using working routines from others vs having to reinvent the wheel ourselves.

  There is some reliance on `for` loops which may lead to
  performance issues.  Ensure that the `for` loops for the most part are using
  the allocate and fill method, where possible, rather than growing a vector.

We did check the cases where we used a for loop instead of a vectorized approach. In most cases, we do effectively use pre-allocated vectors or matrices to avoid unnecessary and unhealthy growing of these objects. In some cases, the usage of a for loop was a natural way of implementing some functions, that probably could make it even easier to read for external users. Where the for loops are still there, we did verify that the implementation via for itself is not indeed a computational bottleneck, but this is an aspect that we will actively keep monitored and we can still try to improve if needed.

DESCRIPTION

* Looks good.

NAMESPACE

* Looks good.

vignettes/

* Consider letting users specify the column names that correspond to
  "Genesets" and "Genes" as part of the main function, rather than having the
  user rename the columns in their data.

Thank you very much for pointing this out. We now changed the code and added two additional parameters to the main function. Users can now provide the names of the columns in which the geneset ids and genes are stored. The need for users to rename their data is now no longer needed.

* Note that in the `Additional Information` section, the support site should
  be used for user experience questions and GitHub for bug reports.

Thank you for pointing this out. We changed the part accordingly.

* Minor. Consider wrapping text to 80 column width for readability.

Thanks for raising the point, we did indeed spot out some lines that were very long and improved their readability by splitting them over multiple lines in a way that does not affect the functionality of the package.

R

* Use double `||` within `if` statements to ensure that a single logical is
  evaluated.

Thanks a lot for pointing this out. We went through all occurrences in the app and changed it to ||.

* Remove or update code that is commented out.

We removed all code that was commented out and no longer used.

* The work is extensive and perhaps using shiny modules would have helped in
  avoiding a monolithic `app.R` file. Consider this for future development.

Thank you for the very valuable advice. The app.R file is indeed very large and extensive as it grew with time and development of the package. I will keep modules in mind for my next package and plan the development accordingly beforehand.

* Please refer to https://contributions.bioconductor.org/data.html?q=data#exported-data-and-the-data-directory
  for examples on how to load exported data inside a function (for
  `macrophage_topGO_example`).

Thank you for pointing out the link and resource. We changed the loading of the data accordingly.

* There are potential performance improvements possible for `checkInclusion`
  e.g., by using all the combinations of comparisons (with `combn`) across all
  elements and identifying subsets that way.

Thank you for your suggestion. We implemented a version of the function that uses combn().

* Consider using caching for files e.g., in `getId` caching the `species.txt`
  file

Thank you for this advice. We now added the caching of the species.txt file.


Please let us know if we need to do some additional work. Thank you again for the feedback!

AnnekathrinSilvia commented 2 months ago

(Please note that the package is still a little larger than 5Mb, so this could very much lead to the build running into an error. Our repository has Github actions set up to ensure that the package builds correctly.)

bioc-issue-bot commented 2 months ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.3 LTS): GeDi_0.99.4.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/GeDi to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

LiNk-NY commented 2 months ago

Hi @AnnekathrinSilvia

It looks like the rendered HTML is taking up a lot of space in the tarball. I would recommend to reduce the size of the PNG files and / or remove font_awesome icons from the vignette.

Best, Marcel

federicomarini commented 2 months ago

So far I can comment that we already gave a full round on all png files with pngquant - and that worked wonderfully, to the point that I proposed a small PR to the pkgrevdocs repo, see https://github.com/Bioconductor/pkgrevdocs/pull/119.

I was not thinking of the font_awesome icons being so bulky. If we remove them, we would lose the nice in-text symbols that point the users to the widgets and icons used in the app.

image

Do you think it is "safe enough" to use text descriptions for those?

LiNk-NY commented 2 months ago

Hi Federico! @federicomarini Thanks for that recommendation. We will review it.

Based on something like https://www.debugbear.com/html-size-analyzer, it looks like the style of the text (font awesome icons) takes up about 4.1 MB. It may be better to just describe the icons or add a screen shot of the app with different color highlights corresponding to each item in the list.

One other way to reduce the size (though less favorable) is to link to URL images in the vignette. The images would be hosted in the GitHub repo but not included in the tarball (via .Rbuildignore).

Best regards, Marcel

federicomarini commented 2 months ago

Got it, and thanks for the debugbear tip, I was not aware of that. Crazy that the icons take up so much space, we will find an alternative way to point towards the required information. Probably pointing towards them with a precise-enough description is a very good compromise.

I will leave the final decision to the judgment of @AnnekathrinSilvia - excluding the hosting of images via URL as it makes sense to have "everything in one place".

Thanks again for this, I will run other html vignettes through that to see if more space can be saved. Federico

bioc-issue-bot commented 2 months ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 08ea060b5ff478cbc5d3cdeb242425700ea9e007

bioc-issue-bot commented 2 months ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.3 LTS): GeDi_0.99.5.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/GeDi to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

AnnekathrinSilvia commented 2 months ago

Hi @LiNk-NY,

Thank you for reviewing our changes. We now decided to remove the icons completely and just describe them in the text of the vignette. This should solve the size issue of the package.

Best Annekathrin

AnnekathrinSilvia commented 2 months ago

Hi @LiNk-NY, are there any other items you expect/envision us to work on, apart from the ones we already addressed? We are happy to dedicate some more time before the deadline might be too close. Thanks! Annekathrin

LiNk-NY commented 2 months ago

Hi @AnnekathrinSilvia Thank you for making those changes. Other than reducing the check time over the long run, the package looks good to go. Thank you for your submission. -Marcel

bioc-issue-bot commented 2 months ago

Your package has been accepted. It will be added to the Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

Reviewers for Bioconductor packages are volunteers from the Bioconductor community. If you are interested in becoming a Bioconductor package reviewer, please see Reviewers Expectations.

federicomarini commented 2 months ago

Thanks Marcel for the green light, that is an excellent way to start the weekend πŸŽ‰ We will keep more than an eye in the upcoming developments on the check time to make sure it does not become a critical aspect!

lshep commented 2 months ago

The default branch of your GitHub repository has been added to Bioconductor's git repository as branch devel.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/AnnekathrinSilvia.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("GeDi"). The package 'landing page' will be created at

https://bioconductor.org/packages/GeDi

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.