Bioconductor / Contributions

Contribute Packages to Bioconductor
134 stars 33 forks source link

SIAMCAT #698

Closed KonradZych closed 6 years ago

KonradZych commented 6 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 6 years ago

Hi @KonradZych

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: SIAMCAT
Type: Package
Title: Statistical Inference of Associations between Microbial Communities And host phenoTypes
Version: 0.99.0
Author: Georg Zeller [aut,cre], Jakob Wirbel[aut], Konrad Zych [aut], Nicolai Karcher[ctb] and Kersten Breuer[ctb]
Authors@R: c(person("Georg", "Zeller", role = c("aut", "cre"), email = "zeller@embl.de", comment = c(ORCID = "0000-0003-1429-7485")),
   person("Konrad", "Zych", role = c("aut", "cre"), email = "konrad.zych@embl.de", comment = c(ORCID = "0000-0001-7426-0516")),
   person("Jakob", "Wirbel", role = c("aut", "cre"), email = "jakob.wirbel@embl.de", comment = c(ORCID = "0000-0002-4073-3562"))
    )
Maintainer: Konrad Zych <konrad.zych@embl.de>, Georg Zeller <eller@embl.de>, Jakob Wirbel <jakob.wirbel@embl.de>
Description: Pipeline for Statistical Inference of Associations between Microbial Communities And host phenoTypes (SIAMCAT). A primary goal of analyzing microbiome data is to determine changes in community composition that are associated with environmental factors. In particular, linking human microbiome composition to host phenotypes such as diseases has become an area of intense research. For this, robust statistical modeling and biomarker extraction toolkits are crucially needed. SIAMCAT provides a full pipeline supporting data preprocessing, statistical association testing, statistical modeling (LASSO logistic regression) including tools for evaluation and interpretation of these models (such as cross validation, parameter selection, ROC analysis and diagnostic model plots).
Depends:
    R (>= 3.4.0),
    mlr,
    phyloseq
Imports:
    beanplot,
    glmnet,
    graphics,
    grDevices,
    grid,
    gridBase,
    gridExtra,
    LiblineaR,
    ParamHelpers,
    pROC,
    PRROC,
    RColorBrewer,
    stats,
    utils
License: GPL-3
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.0.1
biocViews: Metagenomics, Classification, Microbiome, Sequencing, Preprocessing, Clustering, FeatureExtraction
Suggests:
    BiocStyle,
    optparse,
    testthat,
    knitr,
    rmarkdown
VignetteBuilder: knitr
bioc-issue-bot commented 6 years ago

A reviewer has been assigned to your package Learn What to Expect during the review process.

IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 6 years ago

Received a valid push; starting a build. Commits are:

4f1c9d5 removing errors caused by faulty examples

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 6 years ago

Received a valid push; starting a build. Commits are:

74815a6 compressing data file

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 6 years ago

Received a valid push; starting a build. Commits are:

8b9754e updating R version dependency

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

nturaga commented 6 years ago

SIAMCAT package review

R CMD build

ok

R CMD check

ok

DESCRIPTION

* checking DESCRIPTION meta-information ... NOTE
Authors@R field gives more than one person with maintainer role:
  Georg Zeller <zeller@embl.de> [aut, cre] (<https://orcid.org/0000-0003-1429-7485>)
  Konrad Zych <konrad.zych@embl.de> [aut, cre] (<https://orcid.org/0000-0001-7426-0516>)
  Jakob Wirbel <jakob.wirbel@embl.de> [aut, cre] (<https://orcid.org/0000-0002-4073-3562>)

NAMESPACE

man

vignette

R

* checking data for non-ASCII characters ... NOTE
  Note: found 4 marked UTF-8 strings

* Checking for recommended biocViews...
    * NOTE: Consider adding these automatically suggested biocViews:
      GeneticVariability, MultipleComparison
* Checking coding practice...
    * NOTE: Avoid sapply(); use vapply() found in files:
    check_associations.r (line 340, column 19)
    check_associations.r (line 341, column 19)
    check_associations.r (line 490, column 3)
    check_associations.r (line 526, column 15)
    check_associations.r (line 531, column 13)
    model_interpretation_plot.R (line 205, column 22)
    plm_utils.r (line 96, column 19)
    siamcat_class_methods.R (line 46, column 32)
    siamcat_class_methods.R (line 51, column 26)
    * NOTE: Avoid 1:...; use seq_len() or seq_along() found in files:
    check_associations.r (line 87, column 22)
    check_associations.r (line 191, column 13)
    check_associations.r (line 226, column 13)
    check_associations.r (line 262, column 33)
    check_associations.r (line 278, column 17)
    check_associations.r (line 278, column 44)
    check_associations.r (line 279, column 18)
    check_associations.r (line 279, column 49)
    check_associations.r (line 282, column 21)
    check_associations.r (line 282, column 47)
    check_associations.r (line 283, column 21)
    check_associations.r (line 283, column 47)
    check_associations.r (line 284, column 24)
    check_associations.r (line 284, column 45)
    check_associations.r (line 285, column 25)
    check_associations.r (line 285, column 50)
    check_associations.r (line 286, column 24)
    check_associations.r (line 286, column 45)
    check_associations.r (line 287, column 25)
    check_associations.r (line 287, column 50)
    check_associations.r (line 290, column 24)
    check_associations.r (line 290, column 45)
    check_associations.r (line 291, column 25)
    check_associations.r (line 291, column 50)
    check_associations.r (line 294, column 13)
    check_associations.r (line 329, column 32)
    check_associations.r (line 342, column 13)
    check_associations.r (line 398, column 13)
    check_associations.r (line 540, column 41)
    check_associations.r (line 544, column 13)
    check_confounders.r (line 65, column 13)
    check_confounders.r (line 84, column 17)
    check_confounders.r (line 93, column 17)
    create_data_split.r (line 99, column 13)
    create_data_split.r (line 111, column 15)
    create_data_split.r (line 162, column 15)
    create_data_split.r (line 164, column 33)
    create_data_split.r (line 174, column 33)
    create_data_split.r (line 175, column 17)
    create_data_split.r (line 181, column 33)
    create_data_split.r (line 190, column 15)
    create_data_split.r (line 194, column 15)
    evaluate_predictions.r (line 69, column 15)
    evaluate_predictions.r (line 88, column 15)
    evaluate_predictions.r (line 145, column 15)
    evaluate_predictions.r (line 158, column 15)
    evaluate_predictions.r (line 159, column 17)
    filter.features.r (line 84, column 15)
    filter.features.r (line 88, column 38)
    make_predictions.r (line 58, column 69)
    make_predictions.r (line 61, column 15)
    make_predictions.r (line 62, column 17)
    make_predictions.r (line 113, column 72)
    make_predictions.r (line 115, column 15)
    model_evaluation_plot.r (line 38, column 15)
    model_evaluation_plot.r (line 75, column 15)
    model_interpretation_plot.R (line 146, column 13)
    model_interpretation_plot.R (line 270, column 17)
    model_interpretation_plot.R (line 273, column 15)
    model_interpretation_plot.R (line 305, column 17)
    model_interpretation_plot.R (line 323, column 15)
    model_interpretation_plot.R (line 340, column 13)
    model_interpretation_plot.R (line 346, column 13)
    model_interpretation_plot.R (line 381, column 15)
    model_interpretation_plot.R (line 388, column 15)
    model_interpretation_plot.R (line 502, column 33)
    plm_utils.r (line 19, column 55)
    plm_utils.r (line 116, column 15)
    siamcat_class_methods.R (line 38, column 16)
    train_model.r (line 102, column 16)
    train_model.r (line 106, column 24)

add_meta_pred.R

  if(verbose>1) cat("+ starting add.meta.pred\n")

Do this as needed everywhere.

add_meta_pred.R, example.

      siamcat@phyloseq@otu_table <- otu_table(
        rbind(siamcat@phyloseq@otu_table, m),taxa_are_rows=TRUE)
      rownames(siamcat@phyloseq@otu_table)[nrow(
        siamcat@phyloseq@otu_table)] <- paste('META_', toupper(p), sep='')

phyloseq already has accessors, eg: otu_table(phyloseq_object) use these.Do this everywhere.

check_associations.r

  x.q = apply(data1, 1, function (x)quantile(x, c(0.05, 0.25, 0.5, 0.75, 0.95),
                                              na.rm=TRUE, names=FALSE))
  y.q = apply(data2, 1, function (x)quantile(x, c(0.05, 0.25, 0.5, 0.75, 0.95),
                                              na.rm=TRUE, names=FALSE))

eg: modify your function to take in a quantiles vector too, and you can reuse it accross your entire package.

apply_quantile <- function(data, quantiles_vector) {
    apply(data, 1, 
          function (x) quantile(x, quantiles_vector,
              na.rm = TRUE, names = FALSE))
}

Notice how I indented my code, and formatted it to fit within 80 chars.

x.medians = apply(data1, 1, function (x) median(x))

y.medians = apply(data2, 1, function (x) median(x))

Also, take a look at rowQuantiles and colQuantiles from matrixStats. Avoid apply functions and loops and opt to vectorize.

Do this across you codebase.

As an example,

    for (c in 1:ncol(predictions)) {
      for (r in 1:length(t)) {
        tp[r,c] = sum(test.label==label@positive.lab  & predictions[,c] > thr[r])
        fp[r,c] = sum(test.label==label@negative.lab  & predictions[,c] > thr[r])
        tn[r,c] = sum(test.label==label@negative.lab  & predictions[,c] < thr[r])
        fn[r,c] = sum(test.label==label@positive.lab  & predictions[,c] < thr[r])
      }
    }

the code chunk , predictions[, c] > thr[r] is comparing a column of numeric value to one value. This is a vectorized function. Similarly, test.label==label@positive.lab can be done outside the iteration as a logical comparison. This nested for loop seems necessary, and can be refactored into vectorized code.

Please correct the changes in this review, before we move to a second review.

KonradZych commented 6 years ago

Thank you for the review (and it seems it did cost you quite a bit of time!). We are on it and will report back soon with the changes.

bioc-issue-bot commented 6 years ago

Received a valid push; starting a build. Commits are:

dc04367 Bioconductor review 1

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 6 years ago

Received a valid push; starting a build. Commits are:

b1d089c bugfix in R/

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 6 years ago

Received a valid push; starting a build. Commits are:

6680f13 corrected R dependency

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

nturaga commented 6 years ago

Hi @KonradZych

Please let me know before your next review, adding a response about what you corrected in the code.

KonradZych commented 6 years ago

Hi @nturaga,

please see below:

DESCRIPTION

R

KonradZych commented 6 years ago

Tagging in @jakob-wirbel

nturaga commented 6 years ago

Good job @KonradZych. The package looks in good shape now.

Just a last comment, your vignette is an executable for some reason. This should not be the case, you can just chmod -x <vignette> on it once and push to your github.

bioc-issue-bot commented 6 years ago

Your package has been accepted. It will be added to the Bioconductor Git repository and nightly builds. Additional information will be posed to this issue in the next several days.

Thank you for contributing to Bioconductor!

mtmorgan commented 6 years ago

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/KonradZych.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using biocLite(\"SIAMCAT\"). The package 'landing page' will be created at

https://bioconductor.org/packages/SIAMCAT

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.