Closed aubreyodom closed 1 year ago
Hi @aubreyodom
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Type: Package
Package: MetaScope
Title: Tools and functions for preprocessing 16S and metagenomic
    sequencing microbiome data
Version: 0.99.0
Authors@R: c(
    person("Aubrey", "Odom-Mabey", , "aodom@bu.edu", role = c("aut", "cre"),
    comment = c(ORCID = "0000-0001-7113-7598")),
    person("Rahul", "Varki", , "rvarki@bu.edu", role = "aut"),
    person("W. Evan", "Johnson", , "wej@bu.edu", role = "aut",
    comment = c(ORCID = "0000-0002-6247-6595")),
    person("Howard", "Fan", , "hjfan@bu.edu", role = "ctb")
    )
Description: This package contains tools and methods for preprocessing
    microbiome data. Functionality includes library generation,
    demultiplexing, alignment, and microbe identification. It is partly
    an R translation of the PathoScope 2.0 pipeline.
License: Artistic-2.0
URL: https://github.com/compbiomed/metascope
    https://compbiomed.github.io/metascope-docs/
BugReports: https://github.com/compbiomed/MetaScope/issues
Depends:
    R (>= 4.2.0)
Imports:
    Biostrings,
    data.table,
    dplyr,
    ggplot2,
    magrittr,
    Matrix,
    MultiAssayExperiment,
    qlcMatrix,
    Rbowtie2,
    readr,
    rlang,
    Rsamtools,
    S4Vectors,
    stringr,
    SummarizedExperiment,
    taxize,
    tidyr,
    tools
Suggests:
    BiocStyle,
    biomformat,
    knitr,
    lintr,
    rmarkdown,
    Rsubread,
    spelling,
    sys,
    testthat,
    usethis
Enhances:
    BiocParallel
VignetteBuilder:
    knitr
BiocType: Software
biocViews: MicrobiomeData, ReproducibleResearch, SequencingData
Encoding: UTF-8
Language: en-US
LazyData: FALSE
RoxygenNote: 7.2.1
I just noticed this while checking package:
DONE! Downloaded 3 genomes to /tmp/RtmpFlEjva/file3d43012a33ae3//tmp/RtmpFlEjva/file3d43012a33ae3/Staphylococcus_aureus_subsp._aureus_ST398.fasta.gz
Finding Staphylococcus aureus subsp. aureus Mu3
Staphylococcus aureus subsp. aureus Mu3 is a strain under the Bacteria Superkingdom
Loading the refseq table for bacteria
Creating table of relevant taxa
Downloading 2 Staphylococcus aureus subsp. aureus Mu3 genome(s) from NCBI
trying URL 'https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/010/445/GCF_000010445.1_ASM1044v1/GCF_000010445.1_ASM1044v1_genomic.fna.gz'
Content type 'application/x-gzip' length 837650 bytes (818 KB)
==================================================
downloaded 818 KB
trying URL 'https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/229/265/GCF_001229265.1_5295_1_1/GCF_001229265.1_5295_1_1_genomic.fna.gz'
Content type 'application/x-gzip' length 847024 bytes (827 KB)
==================================================
downloaded 827 KB
DONE! Downloaded 2 genomes to /tmp/RtmpFlEjva/file3d43012a33ae3//tmp/RtmpFlEjva/file3d43012a33ae3/Staphylococcus_aureus_subsp._aureus_Mu3.fasta.gz
Finding Staphylococcus epidermidis RP62A
Staphylococcus epidermidis RP62A is a strain under the Bacteria Superkingdom
Loading the refseq table for bacteria
Are these resources being cached for the user? It doesn't seem so; BiocFileCache is not employed. I'd ask @lshep and @lwaldron to have a look at this package to avoid duplication with curatedMetagenomicData and to ensure caching and AnnotationHub are being used effectively. Thanks for your submission!
Hi @vjcitn , thanks for taking a look. I did not implement caching. I'll read more about BiocFileCache and see if I can implement it into my package. If you have any specific tips on usage as it relates to the temporary folders that I use for the examples in this package, I would appreciate hearing them.
I should also note that no temporary files are created when the user is actually running the functions; files are downloaded directly to the user's directory of choice, as specified by the out_dir parameter in the case of the download_refseq() function. I implemented temporary folders for all examples and the vignette because the package constantly writes intermediate files within the pipeline and for various functions. I'm not sure if that changes anything, but I wanted to make that clearer.
There is no overlap with curatedMetagenomicData as we are pulling from the NCBI nucleotide and/or RefSeq libraries to obtain reference genomes for alignment against metagenomic data.
A reviewer has been assigned to your package. Learn what to expect during the review process.
IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.
Bioconductor utilized your GitHub ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to activate your Bioconductor Git Credentials Account.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/MetaScope
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push on git.bioconductor.org; starting a build for commit id: 6119e4189ad45168abf3433e3fa0a3ee265e1dba
Congratulations! The package built without errors or warnings on all platforms.
Even though the package passes the Linux builder, checking the package redownloads material on each run because BiocFileCache is not used. If the genomes obtained are of general interest, consider an AnnotationHubData contribution that helps define provenance and increases convenience of access to these genomes. @lshep can give pointers; I was surprised that AnnotationHub only has records for Staph argenteus at this time.
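For readers following the caching suggestion, the idea can be sketched roughly as below. This is a minimal sketch and not MetaScope's actual implementation: the helper name cached_download() and the cache location are hypothetical, and it assumes the BiocFileCache package is installed.

```r
# Hypothetical helper (not part of MetaScope): fetch a resource through a
# BiocFileCache so that repeated runs reuse the local copy.
cached_download <- function(url,
                            cache_dir = tools::R_user_dir("MetaScopeDemo",
                                                          which = "cache")) {
    # Open (or create) the cache without prompting the user
    bfc <- BiocFileCache::BiocFileCache(cache_dir, ask = FALSE)
    # Look for an existing record whose name is exactly this URL
    hits <- BiocFileCache::bfcquery(bfc, url, field = "rname", exact = TRUE)
    if (nrow(hits) == 0L) {
        # Not cached yet: add it (web resources are downloaded once)
        rpath <- BiocFileCache::bfcadd(bfc, rname = url, fpath = url)
    } else {
        # Already cached: return the stored local path
        rpath <- BiocFileCache::bfcrpath(bfc, rids = hits$rid[1])
    }
    unname(rpath)
}
```

Called twice with the same URL, the second call should return the cached path instead of downloading again.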
@aubreyodom Do you need any advice on implementing the caching mechanism described in Vince's comment above? May we expect updates to the package soon?
Received a valid push on git.bioconductor.org; starting a build for commit id: dac62cc9ee0e3cd85562e373457aa4e0cd58ab13
On one or more platforms, the build results were: "ERROR, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
@lshep @vjcitn I successfully implemented the caching. However, I'm not sure why there's an error when building the vignette on Mac while everything passes on Linux. The only message is "cannot read from connection." Here's the output when running on merida1:
===============================
R CMD BUILD
===============================
* checking for file MetaScope/DESCRIPTION ... OK
* preparing MetaScope:
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... ERROR
--- re-building MetaScope_vignette.Rmd using rmarkdown
Quitting from lines 102-110 (MetaScope_vignette.Rmd)
Error: processing vignette 'MetaScope_vignette.Rmd' failed with diagnostics:
cannot read from connection
--- failed re-building MetaScope_vignette.Rmd
SUMMARY: processing the following file failed:
MetaScope_vignette.Rmd
Error: Vignette re-building failed.
Execution halted
Any advice would be very helpful.
Received a valid push on git.bioconductor.org; starting a build for commit id: f8a394a027669159742fd093a769f5314f6cfe2d
Congratulations! The package built without errors or warnings on all platforms.
Received a valid push on git.bioconductor.org; starting a build for commit id: da6d4326d75f254b3e6eeebb21c2d09a9b9dec02
Congratulations! The package built without errors or warnings on all platforms.
Received a valid push on git.bioconductor.org; starting a build for commit id: c25009389ce43ba7ebd94b14f42ebb083945a67e
Congratulations! The package built without errors or warnings on all platforms.
Received a valid push on git.bioconductor.org; starting a build for commit id: 7be6c697bb88cf45c6797cedb37728326081e3aa
Congratulations! The package built without errors or warnings on all platforms.
Hi Aubrey, @aubreyodom Thank you for your submission. Please see the review below.
MetaScope #2844
DESCRIPTION
- Looks good.
- How does this package enhance BiocParallel?
NAMESPACE
- Looks good. Perhaps extractReads can be extract_reads? To agree with the rest of the exported functions in the package.
- Consider using the native pipe operator (|>) and avoid the magrittr dependency.
vignettes
- (optional) Consider wrapping text to the 80 column width limit.
- Consider using tempfile() as the default temporary directory argument in the download_refseq, demultiplex functions and working within that directory. This will avoid creating the temporary directory within the vignette.
R
- It looks like you should load align_details within the align_target function. Please refer to this section for loading data used by a function within a package: http://contributions.bioconductor.org/data.html?q=data#exported-data-and-the-data-directory and remove the globalVariables function call.
- Are samtools calls portable to other systems (in merge_bam_files)?
- (optional) Consider separating the %<>% operations to make the code more readable and remove the magrittr dependency.
- Avoid saving serialized instances of classes, e.g. saveRDS in create_MAE which can become invalid whenever the class gets updated. Instead only return the MultiAssayExperiment object from the function.
- Consider an option for adjusting the verbosity of functions like demultiplex and others.
- Looks good.
tests
- We highly recommend adding unit tests to the package (covr results 0%).
other
- How is index.md different from the README.md?
Best regards, Marcel
Received a valid push on git.bioconductor.org; starting a build for commit id: b31e60530f71ed1501c5b1d98c70484d0ddb37e5
Congratulations! The package built without errors or warnings on all platforms.
Hi Marcel @LiNk-NY , thank you for your review of MetaScope. Please see below for my line-by-line response. Please let me know if further changes are needed.
MetaScope #2844
DESCRIPTION
- Looks good.
- How does this package enhance BiocParallel?
BiocParallel enhances MetaScope's functions by allowing parallelization. Following the Bioconductor package guidelines for the DESCRIPTION file, "Enhances: is for packages such as Rmpi or parallel that enhance the performance of your package, but are not strictly needed for its functionality." (See section 2.8 on this link).
BiocParallel enhances the functions demultiplex, align_target, align_target_bowtie, mk_bowtie_index, filter_host, filter_host_bowtie, and several helper functions.
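One common way to treat a parallel backend as optional is to check for it at run time and fall back to serial execution. The sketch below illustrates that pattern only; the helper name maybe_parallel_lapply() is hypothetical and this is not taken from MetaScope's source.

```r
# Hypothetical helper: use BiocParallel when it is installed, otherwise
# fall back to a serial lapply(). Not MetaScope's actual code.
maybe_parallel_lapply <- function(X, FUN, ...) {
    if (requireNamespace("BiocParallel", quietly = TRUE)) {
        BiocParallel::bplapply(X, FUN, ...)
    } else {
        lapply(X, FUN, ...)
    }
}

# Either branch returns a list with one element per input
squares <- maybe_parallel_lapply(1:4, function(i) i^2)
```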
NAMESPACE
- Looks good. Perhaps extractReads can be extract_reads? To agree with the rest of the exported functions in the package.
Thanks for the suggestion; this has been fixed.
- Consider using the native pipe operator (|>) and avoid the magrittr dependency.
The magrittr package is also used for other purposes in this package (e.g., magrittr::set_names("MGX")) and therefore, changing the pipes would not remove the dependency. As such, I would like to include them as is. There are also a few features that I like about the %>% operator that are not included with the |> operator. I do use |> in the vignette to avoid the magrittr dependency there.
vignettes
- (optional) Consider wrapping text to the 80 column width limit.
All vignette code chunks should now adhere to the 80 column width limit.
- Consider using tempfile() as the default temporary directory argument in the download_refseq, demultiplex functions and working within that directory. This will avoid creating the temporary directory within the vignette.
Thanks for the suggestion. I have changed the default save directory locations for both functions to tempfile(). This does avoid the creation of the temporary directory for demultiplex() in the vignette.
However, since the folder of the saved refseq files needs to be accessed later on, I retained the creation of all other temporary directories in the vignette. This also allows for a clearer delineation of what each directory contains by assigning names beforehand.
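As an illustration of the two approaches discussed here (a tempfile() default versus a directory named up front), consider the sketch below. The function write_demo_output() and its arguments are hypothetical stand-ins, not the signatures of download_refseq() or demultiplex().

```r
# Hypothetical function illustrating a tempfile() default for an output
# directory; the name and behaviour are assumptions, not MetaScope's API.
write_demo_output <- function(lines, out_dir = tempfile("demo_out_")) {
    # tempfile() only returns a unique path, so create the directory here
    if (!dir.exists(out_dir)) dir.create(out_dir, recursive = TRUE)
    out_file <- file.path(out_dir, "output.txt")
    writeLines(lines, out_file)
    out_file
}

# Default: the function picks its own temporary directory
tmp_path <- write_demo_output("scratch")

# The caller can still name a directory up front when later steps need it
ref_dir <- tempfile("refseq_")
path <- write_demo_output(c(">seq1", "ACGT"), out_dir = ref_dir)
```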
R
- It looks like you should load align_details within the align_target function. Please refer to this section for loading data used by a function within a package: http://contributions.bioconductor.org/data.html?q=data#exported-data-and-the-data-directory and remove the globalVariables function call.
Thank you for pointing me in the right direction; I was unsure of how to do this. I have fixed this issue in all files where I called globalVariables().
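The general pattern of loading a packaged data object inside a function, rather than declaring it with globalVariables(), can be sketched as follows. The example uses the mtcars dataset from the base datasets package as a stand-in for align_details; the function name mean_mpg() is made up for illustration.

```r
# Sketch using a dataset that ships with R (datasets::mtcars) as a
# stand-in for a package's own data object.
mean_mpg <- function() {
    # Load the object into this function's environment only
    data_env <- environment()
    utils::data("mtcars", package = "datasets", envir = data_env)
    # Retrieve it by name so no global variable declaration is needed
    cars <- get("mtcars", envir = data_env)
    mean(cars$mpg)
}

result <- mean_mpg()
```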
- Are samtools calls portable to other systems (in merge_bam_files)?
Assuming that the samtools software is visible and loaded (using MetaScope:::check_samtools_exists()), these calls should indeed be portable. This has been tested on Linux, Mac, and Windows.
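A check of this kind can be written with base R alone. The sketch below is a guess at the general idea, not the body of MetaScope:::check_samtools_exists(), whose implementation is not shown in this thread; tool_on_path() is a hypothetical name.

```r
# Hypothetical check, loosely analogous to an internal helper such as
# check_samtools_exists(); MetaScope's real implementation may differ.
tool_on_path <- function(tool) {
    # Sys.which() returns "" when the executable is not found on the PATH
    nzchar(Sys.which(tool)[[1]])
}

if (tool_on_path("samtools")) {
    message("samtools found at: ", Sys.which("samtools"))
} else {
    message("samtools not found on the PATH")
}
```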
- (optional) Consider separating the %<>% operations to make the code more readable and remove the magrittr dependency.
I have separated the operations accordingly. All instances were in convert_animalcules.R.
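Separating a compound-assignment pipe amounts to writing the assignment out explicitly. A small illustration with made-up values (not taken from convert_animalcules.R):

```r
# magrittr's `x %<>% f()` pipes x into f() and assigns the result back to x.
# The same steps written out in base R:
counts <- c(3, 1, 2)
counts <- sort(counts)  # instead of: counts %<>% sort()
counts <- rev(counts)   # instead of: counts %<>% rev()
```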
- Avoid saving serialized instances of classes, e.g. saveRDS in create_MAE which can become invalid whenever the class gets updated. Instead only return the MultiAssayExperiment object from the function.
I have removed the saveRDS step for create_MAE and updated the documentation to reflect this.
- Consider an option for adjusting the verbosity of functions like demultiplex and others.
Thank you for this suggestion. I have implemented a default quiet mode in most functions.
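A quiet-by-default switch is typically a logical argument that gates calls to message(). The sketch below shows the pattern with a hypothetical function; the argument name quiet and its default are assumptions, since the thread does not show the actual signatures.

```r
# Hypothetical function with a `quiet` switch; the argument name and default
# are assumptions for illustration.
process_reads <- function(reads, quiet = TRUE) {
    if (!quiet) message("Processing ", length(reads), " read(s)")
    toupper(reads)
}

process_reads(c("acgt", "ttga"))                 # silent by default
process_reads(c("acgt", "ttga"), quiet = FALSE)  # emits a progress message
```

Because message() writes to the message stream, callers can also silence a verbose call with suppressMessages().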
- Looks good.
Thanks.
tests
- We highly recommend adding unit tests to the package (covr results 0%).
I agree that this would be ideal, and I do have experience implementing package unit tests. However, for this package, many of the outputs are writing files and the functions have lengthy processing times. Therefore, we decided to not include tests so as to not burden the check time on Bioconductor servers.
other
- How is index.md different from the README.md?
This file is included for compilation of the website built using the pkgdown package. It allows for some slight differences between the GitHub README and the website introduction.
Received a valid push on git.bioconductor.org; starting a build for commit id: 2bd5ab7765d4290409625ecfd2028700fe443591
Congratulations! The package built without errors or warnings on all platforms.
Hi Aubrey, @aubreyodom
Thanks for making those changes.
The package looks good to me. Consider resolving some of the NOTES from BiocCheck.
Best regards,
Marcel
Your package has been accepted. It will be added to the Bioconductor nightly builds.
Thank you for contributing to Bioconductor!
Reviewers for Bioconductor packages are volunteers from the Bioconductor community. If you are interested in becoming a Bioconductor package reviewer, please see Reviewers Expectations.
The master branch of your GitHub repository has been added to Bioconductor's git repository.
To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/aubreyodom.keys is not empty), then no further steps are required. Otherwise, do the following:
See further instructions at
https://bioconductor.org/developers/how-to/git/
for working with this repository. See especially
https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/
to keep your GitHub and Bioconductor repositories in sync.
Your package will be included in the next nightly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at
https://bioconductor.org/checkResults/
(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, your package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("MetaScope"). The package 'landing page' will be created at
https://bioconductor.org/packages/MetaScope
If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.
Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor
Confirm the following by editing each check box to '[x]'
[X] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[X] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[X] I understand Bioconductor Package Naming Policy and acknowledge Bioconductor may retain use of package name.
[X] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[X] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[X] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
[X] I am familiar with the Bioconductor code of conduct and agree to abide by it.
I am familiar with the essential aspects of Bioconductor software management, including:
For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.