MetaScope #2844

Closed aubreyodom closed 1 year ago

aubreyodom commented 1 year ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

bioc-issue-bot commented 1 year ago

Hi @aubreyodom

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Type: Package
Package: MetaScope
Title: Tools and functions for preprocessing 16S and metagenomic
    sequencing microbiome data
Version: 0.99.0
Authors@R: c(
    person("Aubrey", "Odom-Mabey", , "aodom@bu.edu", role = c("aut", "cre"),
 comment = c(ORCID = "0000-0001-7113-7598")),
    person("Rahul", "Varki", , "rvarki@bu.edu", role = "aut"),
    person("W. Evan", "Johnson", , "wej@bu.edu", role = "aut",
 comment = c(ORCID = "0000-0002-6247-6595")),
    person("Howard", "Fan", , "hjfan@bu.edu", role = "ctb")
  )
Description: This package contains tools and methods for preprocessing
    microbiome data. Functionality includes library generation,
    demultiplexing, alignment, and microbe identification.  It is partly
    an R translation of the PathoScope 2.0 pipeline.
License: Artistic-2.0
URL: https://github.com/compbiomed/metascope
    https://compbiomed.github.io/metascope-docs/
BugReports: https://github.com/compbiomed/MetaScope/issues
Depends:
    R (>= 4.2.0)
Imports:
    Biostrings,
    data.table,
    dplyr,
    ggplot2,
    magrittr,
    Matrix,
    MultiAssayExperiment,
    qlcMatrix,
    Rbowtie2,
    readr,
    rlang,
    Rsamtools,
    S4Vectors,
    stringr,
    SummarizedExperiment,
    taxize,
    tidyr,
    tools
Suggests:
    BiocStyle,
    biomformat,
    knitr,
    lintr,
    rmarkdown,
    Rsubread,
    spelling,
    sys,
    testthat,
    usethis
Enhances:
    BiocParallel
VignetteBuilder: 
    knitr
BiocType: Software
biocViews: MicrobiomeData, ReproducibleResearch, SequencingData
Encoding: UTF-8
Language: en-US
LazyData: FALSE
RoxygenNote: 7.2.1

vjcitn commented 1 year ago

I just noticed this while checking package:

DONE! Downloaded 3 genomes to /tmp/RtmpFlEjva/file3d43012a33ae3//tmp/RtmpFlEjva/file3d43012a33ae3/Staphylococcus_aureus_subsp._aureus_ST398.fasta.gz
Finding Staphylococcus aureus subsp. aureus Mu3
Staphylococcus aureus subsp. aureus Mu3 is a strain under the Bacteria Superkingdom
Loading the refseq table for bacteria
Creating table of relevant taxa
Downloading 2 Staphylococcus aureus subsp. aureus Mu3 genome(s) from NCBI
trying URL 'https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/010/445/GCF_000010445.1_ASM1044v1/GCF_000010445.1_ASM1044v1_genomic.fna.gz'
Content type 'application/x-gzip' length 837650 bytes (818 KB)
==================================================
downloaded 818 KB

trying URL 'https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/229/265/GCF_001229265.1_5295_1_1/GCF_001229265.1_5295_1_1_genomic.fna.gz'
Content type 'application/x-gzip' length 847024 bytes (827 KB)
==================================================
downloaded 827 KB

DONE! Downloaded 2 genomes to /tmp/RtmpFlEjva/file3d43012a33ae3//tmp/RtmpFlEjva/file3d43012a33ae3/Staphylococcus_aureus_subsp._aureus_Mu3.fasta.gz
Finding Staphylococcus epidermidis RP62A
Staphylococcus epidermidis RP62A is a strain under the Bacteria Superkingdom
Loading the refseq table for bacteria

Are these resources being cached for the user? It doesn't seem so, since BiocFileCache is not employed. I'd ask @lshep and @lwaldron to have a look at this package to avoid duplication with curatedMetagenomicData and to ensure caching and AnnotationHub are being used effectively. Thanks for your submission!

aubreyodom commented 1 year ago

Hi @vjcitn, thanks for taking a look. I did not implement caching. I'll read more about BiocFileCache and see if I can implement it in my package. If you have any specific tips on usage as it relates to the temporary folders that I use for the examples in this package, I would appreciate hearing them.

aubreyodom commented 1 year ago

I should also note that no temporary files are created when the user is actually running the functions; files are downloaded directly to the user's directory of choice, as specified by the out_dir parameter in the case of the download_refseq() function. I implemented temporary folders for all examples and the vignette because the package, by its nature, constantly writes intermediate files within the pipeline and for various functions. I'm not sure if that changes anything, but I wanted to make that clearer.

There is no overlap with curatedMetagenomicData as we are pulling from the NCBI nucleotide and/or RefSeq libraries to obtain reference genomes for alignment against metagenomic data.

bioc-issue-bot commented 1 year ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your GitHub SSH keys for git.bioconductor.org access. To manage keys and future access, you may want to activate your Bioconductor Git Credentials Account.

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/MetaScope to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 6119e4189ad45168abf3433e3fa0a3ee265e1dba

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

vjcitn commented 1 year ago

Even though the package passes the Linux builder, checking the package redownloads material on each run because BiocFileCache is not used. If the genomes obtained are of general interest, consider an AnnotationHubData contribution that helps define provenance and increases convenience of access to these genomes. @lshep can give pointers; I was surprised that AnnotationHub only has records for Staph argenteus at this time.
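
The caching behaviour requested above can be sketched roughly as follows. This is a minimal illustration, not the code that MetaScope ended up using: the helper name `cached_genome()` and the default cache location are assumptions made for the example, while `BiocFileCache()` and `bfcrpath()` are functions from the BiocFileCache package.

```r
library(BiocFileCache)

# Hypothetical helper (not part of MetaScope): return a local path for a
# resource, fetching it only if it is not already in the cache.
# The default cache directory is an assumption made for this sketch.
cached_genome <- function(url,
                          cache_dir = tools::R_user_dir("MetaScope", "cache")) {
  # Open (or create) the cache without prompting the user
  bfc <- BiocFileCache(cache_dir, ask = FALSE)
  # bfcrpath() adds the resource on first use and returns the cached
  # path on later calls, so repeated runs do not fetch it again
  bfcrpath(bfc, url)
}
```

Calling `cached_genome()` twice with the same URL and cache directory should return the same local path, with the download happening only on the first call.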

lshep commented 1 year ago

@aubreyodom Did you need any advice on implementing the caching mechanism described in Vince's comment above? May we expect updates to the package soon?

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: dac62cc9ee0e3cd85562e373457aa4e0cd58ab13

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

aubreyodom commented 1 year ago

@lshep @vjcitn I successfully implemented the caching. However, I'm not sure why there's an error when building the vignette on Mac while everything passes on Linux. The only message is "cannot read from connection." Here's the output when running on merida1:

===============================

 R CMD BUILD

===============================

* checking for file MetaScope/DESCRIPTION ... OK
* preparing MetaScope:
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... ERROR
--- re-building MetaScope_vignette.Rmd using rmarkdown
Quitting from lines 102-110 (MetaScope_vignette.Rmd) 
Error: processing vignette 'MetaScope_vignette.Rmd' failed with diagnostics:
cannot read from connection
--- failed re-building MetaScope_vignette.Rmd

SUMMARY: processing the following file failed:
  MetaScope_vignette.Rmd

Error: Vignette re-building failed.
Execution halted

Any advice would be very helpful.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: f8a394a027669159742fd093a769f5314f6cfe2d

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: da6d4326d75f254b3e6eeebb21c2d09a9b9dec02

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: c25009389ce43ba7ebd94b14f42ebb083945a67e

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 7be6c697bb88cf45c6797cedb37728326081e3aa

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

LiNk-NY commented 1 year ago

Hi Aubrey, @aubreyodom Thank you for your submission. Please see the review below.

Best regards, Marcel


MetaScope #2844

DESCRIPTION

NAMESPACE

vignettes

R

tests

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: b31e60530f71ed1501c5b1d98c70484d0ddb37e5

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

aubreyodom commented 1 year ago

Hi Marcel @LiNk-NY, thank you for your review of MetaScope. Please see below for my line-by-line response, and let me know if further changes are needed.

MetaScope #2844

DESCRIPTION

  • Looks good.
  • How does this package enhance BiocParallel?

BiocParallel enhances MetaScope's functions by allowing parallelization. Following the Bioconductor package guidelines for the DESCRIPTION file, "Enhances: is for packages such as Rmpi or parallel that enhance the performance of your package, but are not strictly needed for its functionality." (See section 2.8 on this link).

BiocParallel enhances the functions demultiplex, align_target, align_target_bowtie, mk_bowtie_index, filter_host, filter_host_bowtie, and several helper functions.
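
As a rough illustration of how an optional "Enhances" dependency can be used, the sketch below falls back to serial execution when BiocParallel is not installed. The helper name `maybe_parallel_lapply()` is hypothetical and is not taken from MetaScope; `BiocParallel::bplapply()` is the real parallel counterpart of `lapply()`.

```r
# Hypothetical helper: apply FUN over X, in parallel when BiocParallel
# is installed and serially otherwise.
maybe_parallel_lapply <- function(X, FUN, ...) {
  if (requireNamespace("BiocParallel", quietly = TRUE)) {
    # Uses the registered default backend
    BiocParallel::bplapply(X, FUN, ...)
  } else {
    lapply(X, FUN, ...)
  }
}

squares <- maybe_parallel_lapply(1:3, function(i) i^2)
```

Either branch returns a list with one element per input, so callers do not need to know which backend was used.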

NAMESPACE

  • Looks good. Perhaps extractReads can be extract_reads? To agree with the rest of the exported functions in the package.

Thanks for the suggestion; this has been fixed.

  • Consider using the native pipe operator (|>) and avoid the magrittr dependency.

The magrittr package is also used for other purposes in this package (e.g., magrittr::set_names("MGX")) and therefore, changing the pipes would not remove the dependency. As such, I would like to include them as is. There are also a few features that I like about the %>% operator that are not included with the |> operator. I do use |> in the vignette to avoid the magrittr dependency there.
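
For reference, a small sketch of what the reviewer's suggestion could look like using base R only (R >= 4.1). It assumes that `stats::setNames()` is an acceptable stand-in for `magrittr::set_names()` in a given call, which may not hold for every use in the package.

```r
# Native pipe with base R functions only; no magrittr needed here.
# stats::setNames() is used as an assumed replacement for
# magrittr::set_names().
x <- c(1, 4, 9) |>
  sqrt() |>
  stats::setNames(c("a", "b", "c"))
```

The native pipe always passes the left-hand side as the first argument, so it does not support the `.` placeholder semantics of `%>%`, which is one of the differences alluded to above.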

vignettes

  • (optional) Consider wrapping text to the 80 column width limit.

All vignette code chunks should now adhere to the 80 column width limit.

  • Consider using tempfile() as the default temporary directory argument in the download_refseq, demultiplex functions and working within that directory. This will avoid creating the temporary directory within the vignette.

Thanks for the suggestion. I have changed the default save directory locations for both functions to tempfile(). This does avoid the creation of the temporary directory for demultiplex() in the vignette.

However, since the folder of the saved refseq files needs to be accessed later on, I retained the creation of all other temporary directories in the vignette. This also allows for a clearer delineation of what each directory contains by assigning names beforehand.
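
The pattern discussed in this section can be sketched as below. The function `write_reads()` and its file name are hypothetical and only illustrate a `tempfile()` default for an output directory; they are not MetaScope's actual signatures.

```r
# Hypothetical function illustrating a tempfile() default for the
# output directory. The directory is created on demand and the path of
# the written file is returned so later steps can find it.
write_reads <- function(reads, out_dir = tempfile()) {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  out_file <- file.path(out_dir, "reads.txt")
  writeLines(reads, out_file)
  out_file
}

path <- write_reads(c("ACGT", "TTGA"))
```

Returning the path is one way to keep the files reachable by later steps even when the caller never named the directory.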

R

Thank you for pointing me in the right direction; I was unsure of how to do this. I have fixed this issue in all files where I called globalVariables().

  • Are samtools calls portable to other systems (in merge_bam_files)?

Assuming that the samtools software is visible and loaded (checked using MetaScope:::check_samtools_exists()), these calls should indeed be portable. This has been tested on Linux, Mac, and Windows.
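
One portable way to test for an external tool from R is `Sys.which()`, sketched below. This is only an illustration of the idea; it is not the implementation of `MetaScope:::check_samtools_exists()`, and the helper name `samtools_available()` is hypothetical.

```r
# Hypothetical helper: TRUE if an executable called "samtools" is found
# on the PATH, FALSE otherwise. Sys.which() returns "" when not found.
samtools_available <- function() {
  unname(nzchar(Sys.which("samtools")))
}

if (!samtools_available()) {
  message("samtools was not found on the PATH")
}
```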

  • (optional) Consider separating the %<>% operations to make the code more readable and remove the magrittr dependency.

I have separated the operations accordingly. All instances were in convert_animalcules.R.

  • Avoid saving serialized instances of classes, e.g. saveRDS in create_MAE which can become invalid whenever the class gets updated. Instead only return the MultiAssayExperiment object from the function.

I have removed the saveRDS step for create_MAE and updated the documentation to reflect this.

  • Consider an option for adjusting the verbosity of functions like demultiplex and others.

Thank you for this suggestion. I have implemented a default quiet mode in most functions.
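
A verbosity switch of this kind is commonly written as a `quiet` argument that guards calls to `message()`. The sketch below uses a hypothetical function, `process_sample()`, and is not copied from the package.

```r
# Hypothetical function with a quiet-by-default verbosity switch.
process_sample <- function(x, quiet = TRUE) {
  if (!quiet) message("Processing ", length(x), " read(s)")
  toupper(x)
}

process_sample("acgt")                 # silent
process_sample("acgt", quiet = FALSE)  # emits a progress message
```

Using `message()` rather than `print()` or `cat()` also lets callers silence the output with `suppressMessages()`.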

  • Looks good.

Thanks.

tests

  • We highly recommend adding unit tests to the package (covr results 0%).

I agree that this would be ideal, and I do have experience implementing package unit tests. However, in this package many functions write their outputs to files and have lengthy processing times. Therefore, we decided not to include tests so as not to burden the check time on Bioconductor servers.
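
For context, a lightweight unit test for a file-writing function might look like the sketch below, which uses a tiny input and a temporary file to keep run time short. The function `write_counts()` is a hypothetical stand-in, not a MetaScope function, and the example assumes the testthat package is installed.

```r
library(testthat)

# Hypothetical stand-in for a function that writes its result to disk
write_counts <- function(counts, out_file) {
  tab <- data.frame(taxon = names(counts), n = unname(counts))
  utils::write.csv(tab, out_file, row.names = FALSE)
  invisible(out_file)
}

test_that("write_counts() writes one row per taxon", {
  out_file <- tempfile(fileext = ".csv")
  write_counts(c(a = 3L, b = 5L), out_file)
  res <- utils::read.csv(out_file)
  expect_equal(nrow(res), 2L)
  expect_equal(res$n, c(3L, 5L))
  unlink(out_file)
})
```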

other

  • How is index.md different from the README.md?

This file is included for compilation of the website built using the pkgdown package. It allows for some slight differences between the GitHub README and the website introduction.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 2bd5ab7765d4290409625ecfd2028700fe443591

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

LiNk-NY commented 1 year ago

Hi Aubrey, @aubreyodom Thanks for making those changes. The package looks good to me. Consider resolving some of the NOTES from BiocCheck. Best regards, Marcel

bioc-issue-bot commented 1 year ago

Your package has been accepted. It will be added to the Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

Reviewers for Bioconductor packages are volunteers from the Bioconductor community. If you are interested in becoming a Bioconductor package reviewer, please see Reviewers Expectations.

lshep commented 1 year ago

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your GitHub user name. If your GitHub account already has ssh public keys (https://github.com/aubreyodom.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your GitHub account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/
https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nightly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2 pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, your package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("MetaScope"). The package 'landing page' will be created at

https://bioconductor.org/packages/MetaScope

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.