Bioconductor / Contributions

Contribute Packages to Bioconductor
131 stars 33 forks source link

gg4way #3129

Closed ben-laufer closed 9 months ago

ben-laufer commented 10 months ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

bioc-issue-bot commented 10 months ago

Hi @ben-laufer

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: gg4way
Title: 4way Plots of Differential Expression
Version: 0.99.0
Authors@R: c(
    person(given = "Ben",
 family = "Laufer",
 role = c("aut", "cre"),
 email = "blaufer@gmail.com",
 comment = c(ORCID = "0000-0002-7558-1335")),
    person(given = "Brad",
 family = "Friedman",
 role = c("aut")))
Description: 4way plots enable a comparison of two contrasts of differential gene expression. The gg4way package creates 4way plots using the ggplot2 framework and supports popular Bioconductor objects. The package also provides information about the correlation between contrasts and significant genes of interest.
License: MIT + file LICENSE
URL: https://github.com/ben-laufer/gg4way
BugReports: https://github.com/ben-laufer/gg4way/issues
biocViews: 
    Software,
    Visualization,
    DifferentialExpression,
    GeneExpression,
    Transcription,
    RNASeq,
    SingleCell,
    Sequencing
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
Depends:
    R (>= 4.3.0),
    ggplot2
Imports: 
    DESeq2,
    dplyr,
    ggrepel,
    glue,
    janitor,
    limma,
    magrittr,
    methods,
    purrr,
    rlang,
    scales,
    stats,
    stringr,
    tibble,
    tidyr
Suggests: 
    airway,
    BiocStyle,
    edgeR,
    knitr,
    org.Hs.eg.db,
    rmarkdown,
    testthat
VignetteBuilder: knitr
Config/testthat/edition: 3
LazyData: false
bioc-issue-bot commented 10 months ago

Your package has been added to git.bioconductor.org to continue the pre-review process. A build report will be posted shortly. Please fix any ERROR and WARNING in the build report before a reviewer is assigned or provide a justification on why you feel the ERROR or WARNING should be granted an exception.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. All changes should be pushed to git.bioconductor.org moving forward. It is required to push a version bump to git.bioconductor.org to trigger a new build report.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot commented 10 months ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Build System.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Bioconductor Build System: Linux (Ubuntu 22.04.2 LTS): gg4way_0.99.0.tar.gz macOS 12.6.5 Monterey: gg4way_0.99.0.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/gg4way to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 10 months ago

A reviewer has been assigned to your package for an indepth review. Please respond accordingly to any further comments from the reviewer.

bioc-issue-bot commented 10 months ago

Your package has been added to git.bioconductor.org to continue the pre-review process. A build report will be posted shortly. Please fix any ERROR and WARNING in the build report before a reviewer is assigned or provide a justification on why you feel the ERROR or WARNING should be granted an exception.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. All changes should be pushed to git.bioconductor.org moving forward. It is required to push a version bump to git.bioconductor.org to trigger a new build report.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot commented 10 months ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Build System.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Bioconductor Build System: Linux (Ubuntu 22.04.2 LTS): gg4way_0.99.0.tar.gz macOS 12.6.5 Monterey: gg4way_0.99.0.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/gg4way to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 10 months ago

Your package has been added to git.bioconductor.org to continue the pre-review process. A build report will be posted shortly. Please fix any ERROR and WARNING in the build report before a reviewer is assigned or provide a justification on why you feel the ERROR or WARNING should be granted an exception.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. All changes should be pushed to git.bioconductor.org moving forward. It is required to push a version bump to git.bioconductor.org to trigger a new build report.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot commented 10 months ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Build System.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Bioconductor Build System: Linux (Ubuntu 22.04.2 LTS): gg4way_0.99.0.tar.gz macOS 12.6.5 Monterey: gg4way_0.99.0.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/gg4way to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

PeteHaitch commented 9 months ago

Hi @ben-laufer,

Thank you for submitting gg4way to Bioconductor. It's a nice, simple package with a clear purpose, which made for an straightforward review. Thank you!

The package is largely fine as is, but I did want to ask whether you considered implementing gg4way() as a generic with specific methods for MArrayLM, DESeqDataSet, and (a list of) DESeqResults objects, along with a default version for the general 'tidyList' format? The Glimma package does something similar (custom plots of DE analysis results) and uses the S3 object oriented programming system to implement this. This would slightly simplify the user-interface and obviate the need to export the tidyDGE() function. To be clear, this isn't a requirement for acceptance, but I thought it worth raising in case it is helpful to you.

In my review below I have separated the issues into Required and Recommended points that I would ask you to address before the package can be accepted. Would you please provide line-by-line comments to my initial review so that I know what changes I'm looking for in my re-review.

I'll be on leave (and offline) from today until October 9. The deadline for new packages to be accepted is October 18 (see https://www.bioconductor.org/developers/release-schedule/). I think the requested changes will be fairly straightforward for you implement and I will then be able to accept the package soon after returning from leave. But if while I'm away you do have any urgent concerns or questions, please post to this issue and tag @vjcitn or @lshep to bring it to their attention.

Cheers, Pete


Required

library(gg4way)
#> Loading required package: ggplot2

data("airwayFit")
airwayFit |>
    tidyDGE() |>
    gg4way(x = "N61311 vs N052611",
           y = "N061011 vs N052611",
           # Force all genes to be 'significant'.
           FDRcutoff = 1,
           logFCcutoff = 0)
#> Warning: Removed 1 rows containing missing values (`geom_label()`).

Session info ``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.3.1 (2023-06-16) #> os macOS Ventura 13.6 #> system aarch64, darwin20 #> ui X11 #> language (EN) #> collate en_US.UTF-8 #> ctype en_US.UTF-8 #> tz Australia/Melbourne #> date 2023-09-28 #> pandoc 3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown) #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date (UTC) lib source #> abind 1.4-5 2016-07-21 [1] CRAN (R 4.3.0) #> Biobase 2.61.0 2023-04-25 [1] Bioconductor #> BiocGenerics 0.47.0 2023-04-25 [1] Bioconductor #> BiocParallel 1.35.4 2023-08-17 [1] Bioconductor #> bitops 1.0-7 2021-04-24 [1] CRAN (R 4.3.0) #> cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.0) #> codetools 0.2-19 2023-02-01 [1] CRAN (R 4.3.1) #> colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.0) #> crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.0) #> curl 5.0.2 2023-08-14 [1] CRAN (R 4.3.0) #> DelayedArray 0.27.10 2023-07-28 [1] Bioconductor #> DESeq2 1.41.11 2023-09-25 [1] Bioconductor #> digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.0) #> dplyr 1.1.3 2023-09-03 [1] CRAN (R 4.3.0) #> evaluate 0.21 2023-05-05 [1] CRAN (R 4.3.0) #> fansi 1.0.4 2023-01-22 [1] CRAN (R 4.3.0) #> farver 2.1.1 2022-07-06 [1] CRAN (R 4.3.0) #> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.0) #> fs 1.6.3 2023-07-20 [1] CRAN (R 4.3.0) #> generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0) #> GenomeInfoDb 1.37.4 2023-09-07 [1] Bioconductor #> GenomeInfoDbData 1.2.10 2023-03-26 [1] Bioconductor #> GenomicRanges 1.53.1 2023-05-04 [1] Bioconductor #> gg4way * 0.99.0 2023-09-27 [1] Bioconductor #> ggplot2 * 3.4.3 2023-08-14 [1] CRAN (R 4.3.0) #> glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.0) #> gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.1) #> htmltools 0.5.6 2023-08-10 [1] CRAN (R 4.3.0) #> IRanges 2.35.2 2023-06-23 [1] Bioconductor #> janitor 2.2.0 2023-02-02 [1] CRAN (R 4.3.0) #> knitr 1.44 2023-09-11 [1] CRAN (R 4.3.1) #> labeling 0.4.3 2023-08-29 [1] CRAN (R 4.3.0) #> lattice 0.21-8 2023-04-05 [1] CRAN (R 4.3.1) #> lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.0) #> limma 3.57.8 2023-09-24 [1] Bioconductor #> locfit 1.5-9.8 2023-06-11 [1] CRAN (R 4.3.0) #> lubridate 1.9.3 2023-09-27 [1] CRAN (R 4.3.1) #> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0) #> Matrix 1.6-1.1 2023-09-18 [1] CRAN (R 4.3.1) #> MatrixGenerics 1.13.1 2023-06-09 [1] Github (Bioconductor/MatrixGenerics@8ecd537) #> matrixStats 1.0.0 2023-06-02 [1] CRAN (R 4.3.0) #> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.3.0) #> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.0) #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0) #> purrr 1.0.2 2023-08-10 [1] CRAN (R 4.3.0) #> R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0) #> Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.3.0) #> RCurl 1.98-1.12 2023-03-27 [1] CRAN (R 4.3.0) #> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.3.0) #> rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.0) #> rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1) #> rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.0) #> S4Arrays 1.1.6 2023-08-31 [1] Bioconductor #> S4Vectors 0.39.2 2023-09-22 [1] Bioconductor #> scales 1.2.1 2022-08-20 [1] CRAN (R 4.3.0) #> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0) #> snakecase 0.11.1 2023-08-27 [1] CRAN (R 4.3.0) #> SparseArray 1.1.12 2023-08-31 [1] Bioconductor #> statmod 1.5.0 2023-01-06 [1] CRAN (R 4.3.0) #> stringi 1.7.12 2023-01-11 [1] CRAN (R 4.3.0) #> stringr 1.5.0 2022-12-02 [1] CRAN (R 4.3.0) #> SummarizedExperiment 1.31.1 2023-05-01 [1] Bioconductor #> tibble 3.2.1 2023-03-20 [1] CRAN (R 4.3.0) #> tidyr 1.3.0 2023-01-24 [1] CRAN (R 4.3.0) #> tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.0) #> timechange 0.2.0 2023-01-11 [1] CRAN (R 4.3.0) #> utf8 1.2.3 2023-01-31 [1] CRAN (R 4.3.0) #> vctrs 0.6.3 2023-06-14 [1] CRAN (R 4.3.0) #> withr 2.5.1 2023-09-26 [1] CRAN (R 4.3.1) #> xfun 0.40 2023-08-09 [1] CRAN (R 4.3.0) #> xml2 1.3.5 2023-07-06 [1] CRAN (R 4.3.0) #> XVector 0.41.1 2023-05-03 [1] Bioconductor #> yaml 2.3.7 2023-01-23 [1] CRAN (R 4.3.0) #> zlibbioc 1.47.0 2023-04-25 [1] Bioconductor #> #> [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library #> #> ────────────────────────────────────────────────────────────────────────────── ```

Recommended

[^1]: "A commonly used approach is to apply FDR and logFC cutoffs simultaneously. However this tends to favor lowly expressed genes, and also fails to control the FDR correctly.". Instead, the authors developed limma::treat() and edgeR::glmTreat(); see their documentation for further details. DESeq2 may have similar functionality but I'm not as familiar with that package.

bioc-issue-bot commented 9 months ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 33f5d1d51207ee39e1e38d869ee1062283d9d8a0

bioc-issue-bot commented 9 months ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Build System.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Bioconductor Build System: macOS 12.6.5 Monterey: gg4way_0.99.1.tar.gz Linux (Ubuntu 22.04.2 LTS): gg4way_0.99.1.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/gg4way to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

ben-laufer commented 9 months ago

Hi @PeteHaitch,

Thanks for reviewing gg4way. All the required changes have been made. See below for a point-by-point response about the improvements.

The package is largely fine as is, but I did want to ask whether you considered implementing gg4way() as a generic with specific methods for MArrayLM, DESeqDataSet, and (a list of) DESeqResults objects, along with a default version for the general 'tidyList' format? The Glimma package does something similar (custom plots of DE analysis results) and uses the S3 object oriented programming system to implement this. This would slightly simplify the user-interface and obviate the need to export the tidyDGE() function. To be clear, this isn't a requirement for acceptance, but I thought it worth raising in case it is helpful to you.

gg4way() is now a generic with specific methods for: MArrayLM, list, and DESeqDataSet. The list method can recognize lists of DGELRT, DESeqResults, and data.frame objects. The tidyDGE() function has been removed.

Required

  • [x] The man page for gg4way needs a little fleshing out. Some thoughts/questions I had when reading it are listed below. Some of these can be addressed with small changes to the current text, but I think the addition of a 'Details' section would be helpful.

The man page for gg4way now clarifies these points.

  • [x] The format of tidyList must be explicitly documented, particularly since it's not a formal class.

The tidyList argument has been renamed to DGEdata. The argument now contains details about the supported objects. The details section of the documentation also shows the format for a data.frame.

  • [x] It could be made clearer that x and y must be the actual names of the results in the tidyList object rather than some names being added just for the purposes of axis labels on the plots.

This has been clarified in the argument descriptions.

  • [x] Why are both symbol and ID required by gg4way()? It seems only symbol is used in the plot (as the optional labels).

The "other packages" section of the vignette (and function documentation) mention that the symbol column is optional and used for the plot labels. It enables cases where not every feature has a (unique) gene ID. The ID column is how unique features are defined.

  • [x] Do you have any guidance on interpreting the correlation? For example, if the 2 contrasts 'overlap' (i.e. include a common factor), as they do in the vignette, then that would seem to induce a non-zero correlation whereas if the 2 contrasts do not overlap then I imagine the correlation does not have this 'bias'?

I have added a note about this to the details section of the man page for gg4way.

  • [x] What does "sums" mean in "textSize Numeric specifying size of text with sums" and this "textOffset Numeric specifying offset of text with sums"?

The documentation for the arguments now mention that this is referring to the DEG totals for each category of gene overlap, which is shown by the numeric text in the plot.

  • [x] What's actually plotted could be made more explicit. E.g., I assume it is the logFC from each comparison that's plotted on the x and y axes, but it's not actually stated.

Added to the description section of the man page and mentioned in the vignette.

  • [x] In the vignette, the numbers sometimes differ between the text and figures, e.g., "71 DEGs" in text" vs. "72"" in bottom-left of first figure. Perhaps you can include the in-text values via inline evaluated R code (i.e. using r) rather than hardcoding them.

In-text values are now obtained from inline R code.

Let me know if it's missing any key details.

  • [x] The numbers reported in the quadrants seem to incorrect if a user forces all genes to be significant. Does this indicate deeper problems with this calculations or just a pathological corner case?

This has been fixed and additional error checks have been added.

Recommended

  • [x] You might also add support for DGELRT objects from edgeR, the other major results format for DE analyses in Bioconductor.

Support for a list of DGELRT objects from edgeR has been added to gg4way() and an example is now provided in the vignette.

  • [x] That gg4way() can return a ggplot, numeric or tibble object depending on its arguments is a bit overcomplicated. Might it be better to have separate functions for the calculation of sharedOnly = TRUE and corOnly = TRUE results?

Three extractor functions have been added, each of which return a distinct object. They are: getCor(), getShared(), and getTotals(). gg4way() will now only return a ggplot.

  • [x] Any reason tidyList has to be a list of tibble objects rather than just data.frame objects?

The documentation now refers to a list of data.frame objects.

  • [ ] I'll note that the limma/edgeR authors explicitly recommend against applying a logFC cutoff to the results of a DE analysis1, so gg4way(logFCcutoff = 1) is perhaps a 'bad' default.

Support has been added for the output of the limma::treat() and edgeR::glmTreat() functions mentioned in the link as replacements for limma::eBayes() and edgeR::glmQLFTest(), respectively. The vignette now also mentions that gg4way() works with these and a similar approach in DESeq2. This should help with leaving the decision of the statistical approach up to the user. It also seems like the default of gg4way(logFCcutoff = 1) remains acceptable.

  • [x] Regarding the label argument of gg4way(), and this is a pathological case, but what if there was a gene symbol called "shared"? Perhaps use label = TRUE instead of label = "shared" to denote that option (since the default is shared = FALSE)?

label = TRUE now replaces label = "shared".

  • [x] Consider adding captions for figures in the vignette.

Added.

  • [ ] Perfectionist nitpicking: For the first figure in vignette, for example, that the %<-->% expression is not centered around x=0 (whereas it is for y=0) is slightly jarring. I'm not sure of a general solution to this, and so I apologise for even bringing it up, but I can't unsee it!

.tidyLabel() uses spacers to help with this, but it's only perfectly aligned when the limits of the axis are identical. The vignette details how a user can customize the axis titles through ggplot2 afterwards.

These links will be added upon acceptance.

The description file has been modified to generate a better default citation.

  • [x] Good job on including unit tests. Take a look at running covr::report() on your package for gaps in the test coverage.

An additional unit test has been added.

  • [x] Recommend running a spell check (if you use R Studio there is one built in). E.g., stop("Format not **suppourted**") in tidyDGE().

That one slipped through my R Studio spell check.

Footnotes

  1. "A commonly used approach is to apply FDR and logFC cutoffs simultaneously. However this tends to favor lowly expressed genes, and also fails to control the FDR correctly.". Instead, the authors developed limma::treat() and edgeR::glmTreat(); see their documentation for further details. DESeq2 may have similar functionality but I'm not as familiar with that package.
PeteHaitch commented 9 months ago

I'm happy to mark gg4way as accepted. Thank you for your engaging constructively in the review process, @ben-laufer, and for your contribution to Bioconductor!

A few final, minor things to address:

bioc-issue-bot commented 9 months ago

Your package has been accepted. It will be added to the Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

Reviewers for Bioconductor packages are volunteers from the Bioconductor community. If you are interested in becoming a Bioconductor package reviewer, please see Reviewers Expectations.

lshep commented 9 months ago

The default branch of your GitHub repository has been added to Bioconductor's git repository as branch devel.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/ben-laufer.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("gg4way"). The package 'landing page' will be created at

https://bioconductor.org/packages/gg4way

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.