Closed KonradZych closed 6 years ago
Hi @KonradZych
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Package: SIAMCAT
Type: Package
Title: Statistical Inference of Associations between Microbial Communities And host phenoTypes
Version: 0.99.0
Author: Georg Zeller [aut,cre], Jakob Wirbel[aut], Konrad Zych [aut], Nicolai Karcher[ctb] and Kersten Breuer[ctb]
Authors@R: c(person("Georg", "Zeller", role = c("aut", "cre"), email = "zeller@embl.de", comment = c(ORCID = "0000-0003-1429-7485")),
person("Konrad", "Zych", role = c("aut", "cre"), email = "konrad.zych@embl.de", comment = c(ORCID = "0000-0001-7426-0516")),
person("Jakob", "Wirbel", role = c("aut", "cre"), email = "jakob.wirbel@embl.de", comment = c(ORCID = "0000-0002-4073-3562"))
)
Maintainer: Konrad Zych <konrad.zych@embl.de>, Georg Zeller <eller@embl.de>, Jakob Wirbel <jakob.wirbel@embl.de>
Description: Pipeline for Statistical Inference of Associations between Microbial Communities And host phenoTypes (SIAMCAT). A primary goal of analyzing microbiome data is to determine changes in community composition that are associated with environmental factors. In particular, linking human microbiome composition to host phenotypes such as diseases has become an area of intense research. For this, robust statistical modeling and biomarker extraction toolkits are crucially needed. SIAMCAT provides a full pipeline supporting data preprocessing, statistical association testing, statistical modeling (LASSO logistic regression) including tools for evaluation and interpretation of these models (such as cross validation, parameter selection, ROC analysis and diagnostic model plots).
Depends:
R (>= 3.4.0),
mlr,
phyloseq
Imports:
beanplot,
glmnet,
graphics,
grDevices,
grid,
gridBase,
gridExtra,
LiblineaR,
ParamHelpers,
pROC,
PRROC,
RColorBrewer,
stats,
utils
License: GPL-3
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.0.1
biocViews: Metagenomics, Classification, Microbiome, Sequencing, Preprocessing, Clustering, FeatureExtraction
Suggests:
BiocStyle,
optparse,
testthat,
knitr,
rmarkdown
VignetteBuilder: knitr
A reviewer has been assigned to your package Learn What to Expect during the review process.
IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
4f1c9d5 removing errors caused by faulty examples
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
74815a6 compressing data file
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
8b9754e updating R version dependency
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
ok
ok
Just use Authors@R, there is no need of Author and Maintainer.
You can have only 1 maintainer. Check the build report.
* checking DESCRIPTION meta-information ... NOTE
Authors@R field gives more than one person with maintainer role:
Georg Zeller <zeller@embl.de> [aut, cre] (<https://orcid.org/0000-0003-1429-7485>)
Konrad Zych <konrad.zych@embl.de> [aut, cre] (<https://orcid.org/0000-0001-7426-0516>)
Jakob Wirbel <jakob.wirbel@embl.de> [aut, cre] (<https://orcid.org/0000-0002-4073-3562>)
* checking data for non-ASCII characters ... NOTE
Note: found 4 marked UTF-8 strings
* Checking for recommended biocViews...
* NOTE: Consider adding these automatically suggested biocViews:
GeneticVariability, MultipleComparison
vapply
instead of sapply
and use seq_along
and seq_len
instead of 1:length
. The build report points out these mistakes. These coding practices are important.* Checking coding practice...
* NOTE: Avoid sapply(); use vapply() found in files:
check_associations.r (line 340, column 19)
check_associations.r (line 341, column 19)
check_associations.r (line 490, column 3)
check_associations.r (line 526, column 15)
check_associations.r (line 531, column 13)
model_interpretation_plot.R (line 205, column 22)
plm_utils.r (line 96, column 19)
siamcat_class_methods.R (line 46, column 32)
siamcat_class_methods.R (line 51, column 26)
* NOTE: Avoid 1:...; use seq_len() or seq_along() found in files:
check_associations.r (line 87, column 22)
check_associations.r (line 191, column 13)
check_associations.r (line 226, column 13)
check_associations.r (line 262, column 33)
check_associations.r (line 278, column 17)
check_associations.r (line 278, column 44)
check_associations.r (line 279, column 18)
check_associations.r (line 279, column 49)
check_associations.r (line 282, column 21)
check_associations.r (line 282, column 47)
check_associations.r (line 283, column 21)
check_associations.r (line 283, column 47)
check_associations.r (line 284, column 24)
check_associations.r (line 284, column 45)
check_associations.r (line 285, column 25)
check_associations.r (line 285, column 50)
check_associations.r (line 286, column 24)
check_associations.r (line 286, column 45)
check_associations.r (line 287, column 25)
check_associations.r (line 287, column 50)
check_associations.r (line 290, column 24)
check_associations.r (line 290, column 45)
check_associations.r (line 291, column 25)
check_associations.r (line 291, column 50)
check_associations.r (line 294, column 13)
check_associations.r (line 329, column 32)
check_associations.r (line 342, column 13)
check_associations.r (line 398, column 13)
check_associations.r (line 540, column 41)
check_associations.r (line 544, column 13)
check_confounders.r (line 65, column 13)
check_confounders.r (line 84, column 17)
check_confounders.r (line 93, column 17)
create_data_split.r (line 99, column 13)
create_data_split.r (line 111, column 15)
create_data_split.r (line 162, column 15)
create_data_split.r (line 164, column 33)
create_data_split.r (line 174, column 33)
create_data_split.r (line 175, column 17)
create_data_split.r (line 181, column 33)
create_data_split.r (line 190, column 15)
create_data_split.r (line 194, column 15)
evaluate_predictions.r (line 69, column 15)
evaluate_predictions.r (line 88, column 15)
evaluate_predictions.r (line 145, column 15)
evaluate_predictions.r (line 158, column 15)
evaluate_predictions.r (line 159, column 17)
filter.features.r (line 84, column 15)
filter.features.r (line 88, column 38)
make_predictions.r (line 58, column 69)
make_predictions.r (line 61, column 15)
make_predictions.r (line 62, column 17)
make_predictions.r (line 113, column 72)
make_predictions.r (line 115, column 15)
model_evaluation_plot.r (line 38, column 15)
model_evaluation_plot.r (line 75, column 15)
model_interpretation_plot.R (line 146, column 13)
model_interpretation_plot.R (line 270, column 17)
model_interpretation_plot.R (line 273, column 15)
model_interpretation_plot.R (line 305, column 17)
model_interpretation_plot.R (line 323, column 15)
model_interpretation_plot.R (line 340, column 13)
model_interpretation_plot.R (line 346, column 13)
model_interpretation_plot.R (line 381, column 15)
model_interpretation_plot.R (line 388, column 15)
model_interpretation_plot.R (line 502, column 33)
plm_utils.r (line 19, column 55)
plm_utils.r (line 116, column 15)
siamcat_class_methods.R (line 38, column 16)
train_model.r (line 102, column 16)
train_model.r (line 106, column 24)
Format you code better, with formatR, and make the code more readable. Use 4 spaces instead of 2. The current formatting makes it hard to read your code. The 80 char limit needs to be reasonably followed as well. Important
Use message
instead of cat
for printing to stdout.
add_meta_pred.R
if(verbose>1) cat("+ starting add.meta.pred\n")
Do this as needed everywhere.
@
symbol.add_meta_pred.R, example.
siamcat@phyloseq@otu_table <- otu_table(
rbind(siamcat@phyloseq@otu_table, m),taxa_are_rows=TRUE)
rownames(siamcat@phyloseq@otu_table)[nrow(
siamcat@phyloseq@otu_table)] <- paste('META_', toupper(p), sep='')
phyloseq already has accessors, eg: otu_table(phyloseq_object) use these.Do this everywhere.
check_associations.r
x.q = apply(data1, 1, function (x)quantile(x, c(0.05, 0.25, 0.5, 0.75, 0.95),
na.rm=TRUE, names=FALSE))
y.q = apply(data2, 1, function (x)quantile(x, c(0.05, 0.25, 0.5, 0.75, 0.95),
na.rm=TRUE, names=FALSE))
eg: modify your function to take in a quantiles vector too, and you can reuse it accross your entire package.
apply_quantile <- function(data, quantiles_vector) {
apply(data, 1,
function (x) quantile(x, quantiles_vector,
na.rm = TRUE, names = FALSE))
}
Notice how I indented my code, and formatted it to fit within 80 chars.
rowMedians
and
colMedians
from the matrixStats
package
(https://www.rdocumentation.org/packages/matrixStats/versions/0.53.0/topics/rowMedians).x.medians = apply(data1, 1, function (x) median(x))
y.medians = apply(data2, 1, function (x) median(x))
Also, take a look at rowQuantiles
and colQuantiles
from
matrixStats. Avoid apply
functions and loops and opt to vectorize.
Do this across you codebase.
Remove commented out code.
Lots of nested for
loops (evaluate_predictions.R, make_predictions.R)
As an example,
for (c in 1:ncol(predictions)) {
for (r in 1:length(t)) {
tp[r,c] = sum(test.label==label@positive.lab & predictions[,c] > thr[r])
fp[r,c] = sum(test.label==label@negative.lab & predictions[,c] > thr[r])
tn[r,c] = sum(test.label==label@negative.lab & predictions[,c] < thr[r])
fn[r,c] = sum(test.label==label@positive.lab & predictions[,c] < thr[r])
}
}
the code chunk , predictions[, c] > thr[r]
is comparing a column of numeric value to one value. This is a vectorized function. Similarly, test.label==label@positive.lab
can be done outside the iteration as a logical comparison. This nested for loop seems necessary, and can be refactored into vectorized code.
Please correct the changes in this review, before we move to a second review.
Thank you for the review (and it seems it did cost you quite a bit of time!). We are on it and will report back soon with the changes.
Received a valid push; starting a build. Commits are:
dc04367 Bioconductor review 1
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
b1d089c bugfix in R/
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
6680f13 corrected R dependency
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
Hi @KonradZych
Please let me know before your next review, adding a response about what you corrected in the code.
Hi @nturaga,
please see below:
Removed NOTEs corresponding to UTF-8 strings and added recommended biocViews
Changed vapply
to sapply and 1:length to seq_along
and seq_len
All lines are < 80 chars and indents are 4 spaces
We now use message
instead of cat
for printing to stdout.
We got rid of the @
and are using correct accessors and assignment methods
Repeated code was moved to small helper functions
We are using vectorized functions from matrixStats
instead of apply whenever possible
Commented out code was removed
For loop nesting was eliminated
Tagging in @jakob-wirbel
Good job @KonradZych. The package looks in good shape now.
Just a last comment, your vignette is an executable for some reason. This should not be the case, you can just chmod -x <vignette>
on it once and push to your github.
Your package has been accepted. It will be added to the Bioconductor Git repository and nightly builds. Additional information will be posed to this issue in the next several days.
Thank you for contributing to Bioconductor!
The master branch of your GitHub repository has been added to Bioconductor's git repository.
To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/KonradZych.keys is not empty), then no further steps are required. Otherwise, do the following:
See further instructions at
https://bioconductor.org/developers/how-to/git/
for working with this repository. See especially
https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/
to keep your GitHub and Bioconductor repositories in sync.
Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at
https://bioconductor.org/checkResults/
(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using biocLite(\"SIAMCAT\")
. The package 'landing page' will be created at
https://bioconductor.org/packages/SIAMCAT
If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.
Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor
Confirm the following by editing each check box to '[x]'
[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
I am familiar with the essential aspects of Bioconductor software management, including:
For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.