KonradZych commented 6 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Repository: https://github.com/KonradZych/SIAMCAT

Confirm the following by editing each check box to '[x]'

[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.

I am familiar with the essential aspects of Bioconductor software management, including:

[x] The 'devel' branch for new packages and features.
[x] The stable 'release' branch, made available every six months, for bug fixes.
[x] Bioconductor version control using Git (optionally via GitHub).

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 6 years ago

Hi @KonradZych

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: SIAMCAT
Type: Package
Title: Statistical Inference of Associations between Microbial Communities And host phenoTypes
Version: 0.99.0
Author: Georg Zeller [aut,cre], Jakob Wirbel[aut], Konrad Zych [aut], Nicolai Karcher[ctb] and Kersten Breuer[ctb]
Authors@R: c(person("Georg", "Zeller", role = c("aut", "cre"), email = "zeller@embl.de", comment = c(ORCID = "0000-0003-1429-7485")),
   person("Konrad", "Zych", role = c("aut", "cre"), email = "konrad.zych@embl.de", comment = c(ORCID = "0000-0001-7426-0516")),
   person("Jakob", "Wirbel", role = c("aut", "cre"), email = "jakob.wirbel@embl.de", comment = c(ORCID = "0000-0002-4073-3562"))
    )
Maintainer: Konrad Zych <konrad.zych@embl.de>, Georg Zeller <eller@embl.de>, Jakob Wirbel <jakob.wirbel@embl.de>
Description: Pipeline for Statistical Inference of Associations between Microbial Communities And host phenoTypes (SIAMCAT). A primary goal of analyzing microbiome data is to determine changes in community composition that are associated with environmental factors. In particular, linking human microbiome composition to host phenotypes such as diseases has become an area of intense research. For this, robust statistical modeling and biomarker extraction toolkits are crucially needed. SIAMCAT provides a full pipeline supporting data preprocessing, statistical association testing, statistical modeling (LASSO logistic regression) including tools for evaluation and interpretation of these models (such as cross validation, parameter selection, ROC analysis and diagnostic model plots).
Depends:
    R (>= 3.4.0),
    mlr,
    phyloseq
Imports:
    beanplot,
    glmnet,
    graphics,
    grDevices,
    grid,
    gridBase,
    gridExtra,
    LiblineaR,
    ParamHelpers,
    pROC,
    PRROC,
    RColorBrewer,
    stats,
    utils
License: GPL-3
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.0.1
biocViews: Metagenomics, Classification, Microbiome, Sequencing, Preprocessing, Clustering, FeatureExtraction
Suggests:
    BiocStyle,
    optparse,
    testthat,
    knitr,
    rmarkdown
VignetteBuilder: knitr

bioc-issue-bot commented 6 years ago

A reviewer has been assigned to your package Learn What to Expect during the review process.

IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 6 years ago

Received a valid push; starting a build. Commits are:

4f1c9d5 removing errors caused by faulty examples

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 6 years ago

Received a valid push; starting a build. Commits are:

74815a6 compressing data file

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 6 years ago

Received a valid push; starting a build. Commits are:

8b9754e updating R version dependency

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

nturaga commented 6 years ago

SIAMCAT package review

R CMD build

ok

R CMD check

ok

DESCRIPTION

Just use Authors@R, there is no need of Author and Maintainer.
You can have only 1 maintainer. Check the build report.

* checking DESCRIPTION meta-information ... NOTE
Authors@R field gives more than one person with maintainer role:
  Georg Zeller <zeller@embl.de> [aut, cre] (<https://orcid.org/0000-0003-1429-7485>)
  Konrad Zych <konrad.zych@embl.de> [aut, cre] (<https://orcid.org/0000-0001-7426-0516>)
  Jakob Wirbel <jakob.wirbel@embl.de> [aut, cre] (<https://orcid.org/0000-0002-4073-3562>)

NAMESPACE

ok

man

Good.

vignette

Good.

R

There are a lot of NOTE's in the build report that are easy to fix.

* checking data for non-ASCII characters ... NOTE
  Note: found 4 marked UTF-8 strings

* Checking for recommended biocViews...
    * NOTE: Consider adding these automatically suggested biocViews:
      GeneticVariability, MultipleComparison

Use vapply instead of sapply and use seq_along and seq_len instead of 1:length. The build report points out these mistakes. These coding practices are important.

* Checking coding practice...
    * NOTE: Avoid sapply(); use vapply() found in files:
    check_associations.r (line 340, column 19)
    check_associations.r (line 341, column 19)
    check_associations.r (line 490, column 3)
    check_associations.r (line 526, column 15)
    check_associations.r (line 531, column 13)
    model_interpretation_plot.R (line 205, column 22)
    plm_utils.r (line 96, column 19)
    siamcat_class_methods.R (line 46, column 32)
    siamcat_class_methods.R (line 51, column 26)
    * NOTE: Avoid 1:...; use seq_len() or seq_along() found in files:
    check_associations.r (line 87, column 22)
    check_associations.r (line 191, column 13)
    check_associations.r (line 226, column 13)
    check_associations.r (line 262, column 33)
    check_associations.r (line 278, column 17)
    check_associations.r (line 278, column 44)
    check_associations.r (line 279, column 18)
    check_associations.r (line 279, column 49)
    check_associations.r (line 282, column 21)
    check_associations.r (line 282, column 47)
    check_associations.r (line 283, column 21)
    check_associations.r (line 283, column 47)
    check_associations.r (line 284, column 24)
    check_associations.r (line 284, column 45)
    check_associations.r (line 285, column 25)
    check_associations.r (line 285, column 50)
    check_associations.r (line 286, column 24)
    check_associations.r (line 286, column 45)
    check_associations.r (line 287, column 25)
    check_associations.r (line 287, column 50)
    check_associations.r (line 290, column 24)
    check_associations.r (line 290, column 45)
    check_associations.r (line 291, column 25)
    check_associations.r (line 291, column 50)
    check_associations.r (line 294, column 13)
    check_associations.r (line 329, column 32)
    check_associations.r (line 342, column 13)
    check_associations.r (line 398, column 13)
    check_associations.r (line 540, column 41)
    check_associations.r (line 544, column 13)
    check_confounders.r (line 65, column 13)
    check_confounders.r (line 84, column 17)
    check_confounders.r (line 93, column 17)
    create_data_split.r (line 99, column 13)
    create_data_split.r (line 111, column 15)
    create_data_split.r (line 162, column 15)
    create_data_split.r (line 164, column 33)
    create_data_split.r (line 174, column 33)
    create_data_split.r (line 175, column 17)
    create_data_split.r (line 181, column 33)
    create_data_split.r (line 190, column 15)
    create_data_split.r (line 194, column 15)
    evaluate_predictions.r (line 69, column 15)
    evaluate_predictions.r (line 88, column 15)
    evaluate_predictions.r (line 145, column 15)
    evaluate_predictions.r (line 158, column 15)
    evaluate_predictions.r (line 159, column 17)
    filter.features.r (line 84, column 15)
    filter.features.r (line 88, column 38)
    make_predictions.r (line 58, column 69)
    make_predictions.r (line 61, column 15)
    make_predictions.r (line 62, column 17)
    make_predictions.r (line 113, column 72)
    make_predictions.r (line 115, column 15)
    model_evaluation_plot.r (line 38, column 15)
    model_evaluation_plot.r (line 75, column 15)
    model_interpretation_plot.R (line 146, column 13)
    model_interpretation_plot.R (line 270, column 17)
    model_interpretation_plot.R (line 273, column 15)
    model_interpretation_plot.R (line 305, column 17)
    model_interpretation_plot.R (line 323, column 15)
    model_interpretation_plot.R (line 340, column 13)
    model_interpretation_plot.R (line 346, column 13)
    model_interpretation_plot.R (line 381, column 15)
    model_interpretation_plot.R (line 388, column 15)
    model_interpretation_plot.R (line 502, column 33)
    plm_utils.r (line 19, column 55)
    plm_utils.r (line 116, column 15)
    siamcat_class_methods.R (line 38, column 16)
    train_model.r (line 102, column 16)
    train_model.r (line 106, column 24)

Format you code better, with formatR, and make the code more readable. Use 4 spaces instead of 2. The current formatting makes it hard to read your code. The 80 char limit needs to be reasonably followed as well. Important
Use message instead of cat for printing to stdout.

add_meta_pred.R

  if(verbose>1) cat("+ starting add.meta.pred\n")

Do this as needed everywhere.

Use getter's and setter's for accessing slots in your class, instead of using the @ symbol.

add_meta_pred.R, example.

      siamcat@phyloseq@otu_table <- otu_table(
        rbind(siamcat@phyloseq@otu_table, m),taxa_are_rows=TRUE)
      rownames(siamcat@phyloseq@otu_table)[nrow(
        siamcat@phyloseq@otu_table)] <- paste('META_', toupper(p), sep='')

phyloseq already has accessors, eg: otu_table(phyloseq_object) use these.Do this everywhere.

If you are copy pasting code, it can be made into a small helper function. There are many such examples throughout you codebase, these all need to be fixed.

check_associations.r

  x.q = apply(data1, 1, function (x)quantile(x, c(0.05, 0.25, 0.5, 0.75, 0.95),
                                              na.rm=TRUE, names=FALSE))
  y.q = apply(data2, 1, function (x)quantile(x, c(0.05, 0.25, 0.5, 0.75, 0.95),
                                              na.rm=TRUE, names=FALSE))

eg: modify your function to take in a quantiles vector too, and you can reuse it accross your entire package.

apply_quantile <- function(data, quantiles_vector) {
    apply(data, 1, 
          function (x) quantile(x, quantiles_vector,
              na.rm = TRUE, names = FALSE))
}

Notice how I indented my code, and formatted it to fit within 80 chars.

Take advantage of vectorized functions, like rowMedians and colMedians from the matrixStats package (https://www.rdocumentation.org/packages/matrixStats/versions/0.53.0/topics/rowMedians).

x.medians = apply(data1, 1, function (x) median(x))

y.medians = apply(data2, 1, function (x) median(x))

Also, take a look at rowQuantiles and colQuantiles from matrixStats. Avoid apply functions and loops and opt to vectorize.

Do this across you codebase.

Remove commented out code.
Lots of nested for loops (evaluate_predictions.R, make_predictions.R)

As an example,

    for (c in 1:ncol(predictions)) {
      for (r in 1:length(t)) {
        tp[r,c] = sum(test.label==label@positive.lab  & predictions[,c] > thr[r])
        fp[r,c] = sum(test.label==label@negative.lab  & predictions[,c] > thr[r])
        tn[r,c] = sum(test.label==label@negative.lab  & predictions[,c] < thr[r])
        fn[r,c] = sum(test.label==label@positive.lab  & predictions[,c] < thr[r])
      }
    }

the code chunk , predictions[, c] > thr[r] is comparing a column of numeric value to one value. This is a vectorized function. Similarly, test.label==label@positive.lab can be done outside the iteration as a logical comparison. This nested for loop seems necessary, and can be refactored into vectorized code.

Please correct the changes in this review, before we move to a second review.

KonradZych commented 6 years ago

Thank you for the review (and it seems it did cost you quite a bit of time!). We are on it and will report back soon with the changes.

bioc-issue-bot commented 6 years ago

Received a valid push; starting a build. Commits are:

dc04367 Bioconductor review 1

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 6 years ago

Received a valid push; starting a build. Commits are:

b1d089c bugfix in R/

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 6 years ago

Received a valid push; starting a build. Commits are:

6680f13 corrected R dependency

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

nturaga commented 6 years ago

Hi @KonradZych

Please let me know before your next review, adding a response about what you corrected in the code.

KonradZych commented 6 years ago

Hi @nturaga,

please see below:

DESCRIPTION

Just use Authors@R, there is no need of Author and Maintainer. -> DONE
You can have only 1 maintainer. Check the build report. -> DONE

R

Removed NOTEs corresponding to UTF-8 strings and added recommended biocViews
Changed vapply to sapply and 1:length to seq_along and seq_len
All lines are < 80 chars and indents are 4 spaces
We now use message instead of cat for printing to stdout.
We got rid of the @ and are using correct accessors and assignment methods
Repeated code was moved to small helper functions
We are using vectorized functions from matrixStats instead of apply whenever possible
Commented out code was removed
For loop nesting was eliminated

KonradZych commented 6 years ago

Tagging in @jakob-wirbel

nturaga commented 6 years ago

Good job @KonradZych. The package looks in good shape now.

Just a last comment, your vignette is an executable for some reason. This should not be the case, you can just chmod -x <vignette> on it once and push to your github.

bioc-issue-bot commented 6 years ago

Your package has been accepted. It will be added to the Bioconductor Git repository and nightly builds. Additional information will be posed to this issue in the next several days.

Thank you for contributing to Bioconductor!

mtmorgan commented 6 years ago

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/KonradZych.keys is not empty), then no further steps are required. Otherwise, do the following:

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using biocLite(\"SIAMCAT\"). The package 'landing page' will be created at

https://bioconductor.org/packages/SIAMCAT

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.

Bioconductor / Contributions

SIAMCAT #698

SIAMCAT package review

R CMD build

R CMD check

DESCRIPTION

NAMESPACE

man

vignette

R

DESCRIPTION

R