Gene Expression Variation Analysis (GEVA)

nunesijg commented 3 years ago

Repository: https://github.com/sbcblab/geva
[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
[x] I am familiar with the Bioconductor code of conduct and agree to abide by it.

I am familiar with the essential aspects of Bioconductor software management, including:

[x] The 'devel' branch for new packages and features.
[x] The stable 'release' branch, made available every six months, for bug fixes.
[x] Bioconductor version control using Git (optionally via GitHub).

bioc-issue-bot commented 3 years ago

Hi @nunesijg

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: geva
Type: Package
Title: Gene Expression Variation Analysis (GEVA)
Description: Statistic methods to evaluate variation between multiple biological conditions.
Version: 0.99.0
Authors@R: 
    c(person(given = "Itamar José",
   family = "Guimarães Nunes",
   role = c("aut", "cre"),
   email = "nunesijg@gmail.com",
   comment = c(ORCID = "0000-0002-6246-4658")),
      person(given = "Murilo",
   family = "Zanini David",
   email = "zanini.murilo@gmail.com",
   role = "ctb"),
      person(given = "Bruno",
   family = "César Feltes",
   email = "bcfeltes@gmail.com",
   role = "ctb",
   comment = c(ORCID = "0000-0002-2825-8295")),
      person(given = "Marcio",
   family = "Dorn",
   email = "mdorn@inf.ufrgs.br",
   role = "ctb",
   comment = c(ORCID = "0000-0001-8534-3480")))
URL: https://github.com/sbcblab/geva
BiocType: Software
biocViews: Classification, DifferentialExpression, GeneExpression, Microarray,
    MultipleComparison, RNASeq, SystemsBiology, Transcriptomics
License: LGPL (>=3)
Encoding: UTF-8
Depends: R (>= 3.6.0)
Imports: grDevices, graphics, methods, stats, utils, dbscan, fastcluster, matrixStats
Suggests: devtools, knitr, rmarkdown, roxygen2, limma, topGO
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1
VignetteBuilder: knitr
LazyData: true
Collate: 
    'stringhelpers.R'
    'callhelpers.R'
    'vectorhelpers.R'
    'printhelpers.R'
    'asserts.R'
    'usecasechecks.R'
    'plotting.R'
    'dochelpers.R'
    'generics.R'
    'classhelpers.R'
    'c_SVTable.R'
    'c_GEVAGroupSet.R'
    'c_GEVACluster.R'
    'funhelpers.R'
    'linq.R'
    'c_TypedList.R'
    'c_SVAttribute.R'
    'c_GEVAInput.R'
    'c_GEVASummary.R'
    'c_GEVAGroupedSummary.R'
    'c_GEVAQuantiles.R'
    'c_GEVAQuantilesAdjusted.R'
    'c_GEVAResults.R'
    'summarization.R'
    'clusteringbase.R'
    'matrixhelpers.R'
    'statmath.R'
    'dclustering.R'
    'quantiles.R'
    'factoring.R'
    'scoremerge.R'
    'finalize.R'
    'geva-package.R'
    'hclustering.R'
    'idealtesting.R'
    'input.R'

bioc-issue-bot commented 3 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/geva to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 6641aed39c0cf6b4c06b8428f03a82e075e485bb

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/geva to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 09678ce2ed5b0469762bfc860276c73eb75c163a

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, skipped, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/geva to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

nunesijg commented 3 years ago

After analyzing the last error in Ubuntu:

Warning in system2(..., stdout = if (use_file_stdout()) f1 else FALSE, stderr = f2) :
  error in running command
! sh: 1: xelatex: not found

It seems that your system has no xelatex command, which is expected to be installed in any LaTeX (texlive) installation. Xelatex is used to compile the UserManual.rmd file for the vignettes, and it is specified by the latex_engine: xelatex header argument in the Rmd file. In this sense, am I expected to use another LaTeX engine? I also changed it to pdflatex in my local tests, but it outputs other bugs (reported by other users as well). What should I do to correctly compile the Rmd in your Ubuntu system? If possible, I suggest installing it using texlive-xetex. If not possible, please refer to me an alternative.

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 685696af3c77273d27e0244ae88c6ab1c0f5726c

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/geva to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

nunesijg commented 3 years ago

The warnings only occur in Windows due to special characters in the assign methods. Therefore, methods such as classification.table<-,GEVAGroupSet,data.frame-method are being built as a file named classification.table<-,GEVAGroupSet,data.frame-method.html, which is invalid as a Windows file path (the '<' character is not allowed).

This issue is not particularly related to my package but is due to an apparent bug in the build process. After searching in Google, I found that the same issue occurs on other builds (see here, here, and here). This unsolved StackOverflow post points the same problem. In addition, this thread quotes that the problem usually occurs in the R-devel and there's not much that we can do to solve this problem as a package developer.

In this sense, since there's no trivial solution available to document the setter methods without the warnings, could you please ignore them and proceed with the package review?

lshep commented 3 years ago

Yes I will proceed with the review.

lshep commented 3 years ago

Thank you for your submission. Please see the following review:

build report

[ ] Please remove empty man page sections indicated on build report.
[ ] Please use donttest{} instead of dontrun{} as donttest requires valid R code
[ ] We strongly suggest unit tests.

DESCRIPTION

[ ] Please remove LazyData: true. We have rarely found this useful and it can acutally slow the installation process for a package.
[ ] Is the proper license tag is LGPL-3.0 as there is not a current version greater than?

vignette

[ ] The vignette directory should only contain vignettes for the package. Please move the README.md and associated files. Generally additional documentation is placed in a inst/doc directory.
[ ] Optional but strongly recommended. We recommend the name of the vignette be the same name as the package. Having a generically titled vignette (like UserManual) could potentially pose naming conflicts and depending what other packages users have installed may incorrectly load a different packages vignette rather than the intended.
[ ] When I am running the code in the vignettes, I am getting different plotting results? Why? It looks like you hid a set.seed is this important for reproducibility?
[ ] When I run your code I get warning messages such as the following suggesting that your code might not run properly on all devices

> gresults <- geva.finalize(gsummary, gquants, gcluster)
Merging scores...
Searching for dependent factors...
Searching for specific factors...
gresults <- geva.Found 322 significant genes:
similar: 79
factor-dependent: 135
factor-specific: 108
Warning messages:
1: semi-transparency is not supported on this device: reported only once per page 
2: semi-transparency is not supported on this device: reported only once per page 
3: semi-transparency is not supported on this device: reported only once per page 
4: semi-transparency is not supported on this device: reported only once per page 
5: semi-transparency is not supported on this device: reported only once per page 
6: semi-transparency is not supported on this device: reported only once per page 
7: semi-transparency is not supported on this device: reported only once per page 
8: semi-transparency is not supported on this device: reported only once per page 
9: semi-transparency is not supported on this device: reported only once per page 
10: semi-transparency is not supported on this device: reported only once per page

[ ] It would be nice to see an example using a standard Bioconductor object output from DESeq2 (preferred) or limma and how it would be incorporated into your process.

R code

[ ] Ideally you would've reused existing classes or extended a relevant class rather than inventing a new classes. Any data that is a n by n format should be able to use or extend the use of SummarizedExperiment objects. Given the amount of work and in-depth implementation of your classes we can let this slide this time but if you develop in the future we may not always allow this. However: Instead of users having to convert limma or DESeq2 objects (extends SummarizedExperiment), your functions should be able to take relevant objects and do the conversion in your code. In other words, there should be a function that could take SummarizedExperiment and a function that could take a limma object to create the GEVAInput object.
[ ] Please remove commented out code. Comments describing code are encouraged but commented out code should be removed. (at least in callhelpers.R but should be applied to all files)

Please address the above issues. When ready please do a version bump to trigger a new build and comment back here that the package is ready for re-review.

Cheers

nunesijg commented 3 years ago

Thank you for the thorough review! My comments regarding the topics that you brought are listed below:

build report

Please remove empty man page sections indicated on build report. No problem. They were generated automatically, so I'll do it manually.
Please use donttest{} instead of dontrun{} as donttest requires valid R code According to the Rd vignette documentation (link), examples enclosed by donttest{} are run by calling the examples function. In this sense, the examples that I enclosed by dontrun{} show hypothetical use cases with input text files and cannot be tested. Moreover, other examples in the same documentation page output the same object and can be tested using examples. To make the examples inside dontrun{} work with donttest{}, I would have to append many example text files, thereby consuming unnecessary disk space with the package. I opted to make the package clean by not adding complementary TXT files, as they are not the only way to achieve the same input.
We strongly suggest unit tests. I've done a lot of tests with at least 30 datasets (each one had multiple conditions) using all the parameters included in the package. The results will be included in our manuscript, which is expected to be submitted at the end of the month. I apologize for not being able to show them nor publish them immediately.

DESCRIPTION

Please remove LazyData: true. We have rarely found this useful and it can acutally slow the installation process for a package. Okay, I'll remove it.
Is the proper license tag is LGPL-3.0 as there is not a current version greater than? I put >= just in case, since it was in the DESCRIPTION file of another package (or perhaps it was some tutorial). The correct is, in fact, LGPL version 3, so I'll change it as you suggested.

vignette

The vignette directory should only contain vignettes for the package. Please move the README.md and associated files. Generally additional documentation is placed in a inst/doc directory. Okay. I'll move the README.md to the main GitHub directory, since the format is mostly used in GitHub. I'll ask just in case: are the PDF files supposed to be in vignettes, or inst/doc directory? If it's the latter: is it possible to program roxygen2::roxygenize() (the function that I used to create the vignettes) to automatically output the vignette files in the inst/doc directory?
Optional but strongly recommended. We recommend the name of the vignette be the same name as the package. Having a generically titled vignette (like UserManual) could potentially pose naming conflicts and depending what other packages users have installed may incorrectly load a different packages vignette rather than the intended. No problem. I just followed the pattern from the currently published packages from Bioconductor (e.g., limma has the usersguide.pdf file). I'll change the vignette files to geva.Rmd and geva.pdf.
When I am running the code in the vignettes, I am getting different plotting results? Why? It looks like you hid a set.seed is this important for reproducibility? Yes, I hid a set.seed(1) specifically in the vignettes as an attempt to reproduce the same documentation, which uses random generation functions such as runif to generate the plots and results (see geva.ideal.examples function). Should I remove it? If I do so, every documentation will output a different plot and a different result, and the final example tables may be particularly strange and probably useless (i.e., the columns will not have the meaningful classifications that I reproduced with set.seed(1)).
When I run your code I get warning messages such as the following suggesting that your code might not run properly on all devices Thank you for pointing the issue, it's new for me. From which OS are you testing? I'm using Windows now and can't manage to reproduce these warnings (I also have access to Ubuntu, but not to a Mac). I'm almost sure that it's related to the plots, as some of them use transparency. However, it'd be strange since semi-transparency is fundamental to any plotting that uses multiple layers, unless if it's effectively a bug in the R's base/graphics package.
It would be nice to see an example using a standard Bioconductor object output from DESeq2 (preferred) or limma and how it would be incorporated into your process. The example with limma output uses MArrayLM objects, which is the most common output from differential expression (DE) using the limma workflow. I don't have access to RNA-seq results at the moment, and think it'd be a problem to append gigabytes of RNA-seq data just to provide an example. Nevertheless, the package works with any table (matrix or data.frame) containing DE analysis results (i.e., multiple tables with logFC and p-value columns) and does not depend on a specific platform or experimental type, be it microarrays, RNA-seq or Methylomes. However, if you have any suggestion of specific classes for these examples, please let me know.

R Code

Ideally you would've reused existing classes or extended a relevant class rather than inventing a new classes. Any data that is a n by n format should be able to use or extend the use of SummarizedExperiment objects. I understand your point, and I would do it if it was the case of reusing other classes. However, this package presents a new method with a different workflow, which is particularly applied after the DE analysis. I tried to adapt to other available classes from Bioconductor, but unfortunately I couldn't do it for the current context. For example, the SummarizedExperiment class stores a single matrix intended for samples; whereas the GEVAInput class stores two matrices with equivalent dimensions (one for logFC values and one for weights), so here the columns represent DE results (i.e., not samples). They have different constraints and are used for different purposes. There's no suitable class to replace GEVAInput -- which is expected, since it's a new methodology. More details regarding the method itself (not the code) are present in our manuscript, which will be submitted soon.
Instead of users having to convert limma or DESeq2 objects (extends SummarizedExperiment), your functions should be able to take relevant objects and do the conversion in your code. In this case, no conversion is done because it uses the output from DE analysis, which is conventionally a data.frame. More precisely, the input for geva.merge.input must be multiple data.frame objects (matrices are also accepted), so I would say that it's more a concatenation than a conversion. The limma and DESeq2 packages do not output SummarizedExperiment in the DE workflow neither import this class, but even it if did, the geva.merge.input function would still require several SummarizedExperiment objects to make a single GEVAInput since these classes store different data and are used for different purposes. The current implementation accepts the proper data.frame outputs from limma (with topTable) and DESeq2 (with DESeqResults) without any conversion.
Please remove commented out code. Comments describing code are encouraged but commented out code should be removed. Understood, thank you for pointing it. If possible, however, please spare the commented code in the Rmd file header. I use that to toggle between generating the PDF vignette and the README.md for the main GitHub page.

Thank you very much for pointing all those issues. I'll push the corrections after finishing the changes. Best regards

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: c2a38e86158baf8fc045d030a0f61e2b54947300

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/geva to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

nunesijg commented 3 years ago

Hi Lori, Regarding the ERROR tag from the Mojave build: while I don't have access to a Mac OS to reproduce this issue, the error message suggests that it's a package dependence error. It seems that the Mac version uses the RBGL package, although I never used this package nor included it in the code, so it may be some kind of bug in R (perhaps from graphics). Moreover, the error wasn't present in the previous build even though I didn't make any changes in the graphics part of the code. Therefore, if possible, please consider this exceptional case.

For the other review considerations, please see my previous answer above. Best regards

lshep commented 3 years ago

Thank you. I will continue with the review and have another look shortly.

lshep commented 3 years ago

Thank you for the detailed explanations. Very few minor comments and responses but I feel the package is in a good place for acceptance:

Unit Tests

[ ] Showing extensive testing in the manuscript will be great. Perhaps including a small subset would be beneficial? We encourage writing at least some basic functionality test directly into the package code as R will run them every time the package is checked (and in Bioconductor this means nightly) so if there is ever a change in a dependency or underlying code that your package relies on, they could quickly and efficiently indicate any problems or concerns that should be addressed. Obviously adding all the extensive testing would not be a good idea for time and efficiency either; but some basic testing would still be encouraged.

vignette

[x] The vignettes should never be generated and stored manually; these types of static vignettes are generally not allowed/recommended in Bioconductor. The vignettes are built interacively on every build/check of the package so a stored compiled version in the package is not necessary.
[ ] If there are random elements that are utilized in the code this should be mentioned and exposed to the user. I would recommend showing the set.seed(1) in the vignette with a quick mention of why its needed or how results may vary.
[x] It seem like the transparency issue is known for plotting and Ubuntu. We can ignore for now but be aware your plots may or may not display correctly for these users.

Thank you. Cheers,

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: e0ed0aa243e1ddde676b1b9284fbb67916a84e49

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/geva to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

nunesijg commented 3 years ago

Hi Lori,

I made the changes that you suggested. In summary:

Added unit tests, in this case, using testthat package. I created some test scripts in the tests/testthat directory including all analysis parameters and the most risky bits of the code;
Unhid the set.seed(1) in the vignettes and exposed it as example code preceded by a comment about the reproducibility.

You were right about the unit tests: they helped me to spot some bugs associated to method calls that haven't appeared before in any of my previous tests. All these bugs were fixed in this last commit as well.

Once again, thank you for your careful review and valueable suggestions. Best regards

lshep commented 3 years ago

Thanks for the updates! Looks good.

bioc-issue-bot commented 3 years ago

Your package has been accepted. It will be added to the Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

vjcitn commented 3 years ago

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/nunesijg.keys is not empty), then no further steps are required. Otherwise, do the following:

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("geva"). The package 'landing page' will be created at

https://bioconductor.org/packages/geva

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.

Bioconductor / Contributions

Gene Expression Variation Analysis (GEVA) #1872