dib-lab / 2020-workflows-paper

Strategies for leveraging workflow systems to streamline large-scale biological analyses
https://dib-lab.github.io/2020-workflows-paper

Change citation format from latex to manubot #4

Closed taylorreiter closed 4 years ago

taylorreiter commented 4 years ago

I used Crossref to generate DOIs for our previous list of citations:

(DBR) in sequencing adapters. The Biological Bulletin. 2014;227(2):146-160. https://doi.org/10.1086/BBLv227n2p146
 
Afgan E, Baker D, Batut B, Van Den Beek M, Bouvier D, Cech M, et al. The
 
Allaire J, Cheng J, Xie Y, McPherson J, Chang W, Allen J, et al. rmarkdown: Dynamic Documents for R. R package version. 2018;1(11).
 
Amstutz P, Crusoe MR, Tijanić N, Chapman B, Chilton J, Heuer M, et al. Common workflow language, v1.0. 2016.
 
Analytics C. Conda: A Cross-Platform, Python-Agnostic Binary Package Manager.
 
Andrews S, et al. FastQC: a quality control tool for high throughput sequence data. 2010.
 
Aruliah DA, Brown CT, Hong NPC, Davis M, Guy RT, Haddock SH, et al. Best practices for scientific computing. arXiv preprint arXiv:1210.0530. 2012.
 
Bacher R, Kendziorski C. Design and computational analysis of single-cell RNA-sequencing experiments. Genome biology. 2016;17(1):63. https://doi.org/10.1186/s13059-016-0927-y
 
Bailleul D, Stoeckel S, Arnaud-Haond S. RClone: a package to identify MultiLocus Clonal Lineages and handle clonal data sets in R. Methods in ecology and evolution. 2016;7(8):966-970. https://doi.org/10.1111/2041-210X.12550
 
Blischak JD, Davenport ER, Wilson G. A quick introduction to version control with Git and GitHub. PLoS computational biology. 2016;12(1). https://doi.org/10.1371/journal.pcbi.1004668
 
Boothby TC, Tenlen JR, Smith FW, Wang JR, Patanella KA, Nishimura EO, et al. Evidence for extensive horizontal gene transfer from the draft genome of a tardigrade. Proceedings of the National Academy of Sciences. 2015;112(52):15976-15981. https://doi.org/10.1073/pnas.1510461112
 
Brinckman A, Chard K, Gaffney N, Hategan M, Jones MB, Kowalik K, et al. Computing environments for reproducibility: Capturing the "Whole Tale". Future Generation Computer Systems. 2019;94:854-867. https://doi.org/10.1016/j.future.2017.12.029
 
Brown CT, Moritz D, O'Brien M, Reidl F, Reiter T, Sullivan B. Exploring neighborhoods in large metagenome assembly graphs reveals hidden sequence diversity. bioRxiv. 2019; p. 462788. https://doi.org/10.1101/462788
 
Ching T, Huang S, Garmire LX. Power analysis and sample size estimation for RNA-Seq differential expression. RNA. 2014;20(11):1684-1696. https://doi.org/10.1261/rna.046011.114
 
Clarke EL, Taylor LJ, Zhao C, Connell A, Lee JJ, Fett B, et al. Sunbeam: an extensible pipeline for analyzing metagenomic sequencing experiments. Microbiome. 2019;7(1):46. https://doi.org/10.1186/s40168-019-0658-x
 
Cochrane G, Karsch-Mizrachi I, Takagi T, Sequence Database Collaboration IN. The international nucleotide sequence database collaboration. Nucleic acids research. 2016;44(D1):D48-D50. https://doi.org/10.1093/nar/gkv1323
 
Cokelaer T, Desvillechabrol D, Legendre R, Cardon M. 'Sequana': a Set of Snakemake NGS pipelines. Journal of Open Source Software. 2017;2(16):352. https://doi.org/10.21105/joss.00352
 
Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, et al. A survey of best practices for RNA-seq data analysis. Genome biology. 2016;17(1):13. https://doi.org/10.1186/s13059-016-0881-8
 
da Fonseca RR, Albrechtsen A, Themudo GE, Ramos-Madrigal J, Sibbesen JA, Maretty L, et al. Next-generation biology: sequencing and data analysis approaches for non-model organisms. Marine Genomics. 2016;30:3-13. https://doi.org/10.1016/j.margen.2016.04.012
 
Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nature biotechnology. 2017;35(4):316-319. https://doi.org/10.1038/nbt.3820
 
Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29(1):15-21. https://doi.org/10.1093/bioinformatics/bts635
 
Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic acids research. 2002;30(1):207-210. https://doi.org/10.1093/nar/30.1.207
 
Eisenhofer R, Minich JJ, Marotz C, Cooper A, Knight R, Weyrich LS. Contamination in low microbial biomass microbiome studies: issues and recommendations. Trends in microbiology. 2019;27(2):105-117. https://doi.org/10.1016/j.tim.2018.11.003
 
Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016;32(19):3047-3048. https://doi.org/10.1093/bioinformatics/btw354
 
Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, et al. The nf-core framework for community-curated bioinformatics pipelines. Nature Biotechnology. 2020;38(3):276-278. https://doi.org/10.1038/s41587-020-0439-x
 
Foster ED, Deardorff A. Open science framework (OSF). Journal of the Medical Library Association: JMLA. 2017;105(2):203. https://doi.org/10.5195/JMLA.2017.88
 
Fu Y, Wu PH, Beane T, Zamore PD, Weng Z. Elimination of PCR duplicates in RNA-seq and small RNA-seq using unique molecular identifiers. BMC Genomics. 2018;19(1):531. https://doi.org/10.1186/s12864-018-4933-1
 
Fuentes-Pardo AP, Ruzzante DE. Whole-genome sequencing approaches for conservation biology: Advantages, limitations and practical recommendations. Molecular ecology. 2017;26(20):5369-5406. https://doi.org/10.1111/mec.14264
 
Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update. Nucleic acids research. 2018;46(W1):W537-W544. https://doi.org/10.1093/nar/gky379
 
Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, et al. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nature methods. 2018;15(7):475-476. https://doi.org/10.1038/s41592-018-0046-7
 
Haque A, Engel J, Teichmann SA, Lönnberg T. A practical guide to single-cell RNA-sequencing for biomedical research and clinical applications. Genome medicine. 2017;9(1):75. https://doi.org/10.1186/s13073-017-0467-4
 
Harris TW, Arnaboldi V, Cain S, Chan J, Chen WJ, Cho J, et al. WormBase: a modern model organism information resource. Nucleic acids research. 2020;48(D1):D762-D767.
 
Himmelstein DS, Rubinetti V, Slochower DR, Hu D, Malladi VS, Greene CS, et al. Open collaborative writing with Manubot. PLoS computational biology. 2019;15(6):e1007128. https://doi.org/10.1371/journal.pcbi.1007128
 
in bioinformatics. 2013;14(2):178-192. https://doi.org/10.1093/bib/bbs017
 
Johnson LK, Alexander H, Brown CT. Re-assembly, quality evaluation, and annotation of 678 microbial eukaryotic reference transcriptomes. GigaScience. 2019;8(4):giy158. https://doi.org/10.1093/gigascience/giy158
 
Jupyter P, Bussonnier M, Forde J, Freeman J, Granger B, Head T, et al. Binder 2.0 - Reproducible, interactive, sharable environments for science at scale. In: Proceedings of the 17th Python in Science Conference. SciPy; 2018. https://doi.org/10.25080/Majora-4af1f417-011
 
Köster J, Rahmann S. Snakemake-a scalable bioinformatics workflow engine. Bioinformatics. 2012;28(19):2520-2522. https://doi.org/10.1093/bioinformatics/bts480
 
Kieser S, Brown J, Zdobnov EM, Trajkovski M, McCue LA. ATLAS: a Snakemake workflow for assembly, annotation, and genomic binning of metagenome sequence data. bioRxiv. 2019; p. 737528. https://doi.org/10.1101/737528
 
Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature biotechnology. 2019;37(8):907-915. https://doi.org/10.1038/s41587-019-0201-4
 
Kluyver T, Ragan-Kelley B, Pérez F, Granger BE, Bussonnier M, Frederic J, et al. Jupyter Notebooks-a publishing format for reproducible computational workflows. In: ELPUB; 2016. p. 87-90.
 
Knight R, Jansson J, Field D, Fierer N, Desai N, Fuhrman JA, et al. Unlocking the potential of metagenomics through replicated experimental design. Nature biotechnology. 2012;30(6):513. https://doi.org/10.1038/nbt.2235
 
Knight R, Vrbanac A, Taylor BC, Aksenov A, Callewaert C, Debelius J, et al. Best practices for analysing microbiomes. Nature Reviews Microbiology. 2018;16(7):410-422. https://doi.org/10.1038/s41579-018-0029-9
 
Koutsovoulos G, Kumar S, Laetsch DR, Stevens L, Daub J, Conlon C, et al. No evidence for extensive horizontal gene transfer in the genome of the tardigrade Hypsibius dujardini. Proceedings of the National Academy of Sciences. 2016;113(18):5053-5058. https://doi.org/10.1073/pnas.1600338113
 
Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PloS one. 2017;12(5). https://doi.org/10.1371/journal.pone.0177459
 
Landau W. The drake R package: A pipeline toolkit for reproducibility and high-performance computing. Journal of Open Source Software. 2018;3(21):550. https://doi.org/10.21105/joss.00550
 
Liao YC, Lin SH, Lin HH. Completing bacterial genome assemblies: strategy and performance comparisons. Scientific reports. 2015;5:8747. https://doi.org/10.1038/srep08747
 
Luecken MD, Theis FJ. Current best practices in single-cell RNA-seq analysis: a tutorial. Molecular systems biology. 2019;15(6). https://doi.org/10.15252/msb.20188746
 
MacManes MD. On the optimal trimming of high-throughput mRNA sequence data. Frontiers in genetics. 2014;5:13. https://doi.org/10.3389/fgene.2014.00013
 
McLaren MR, Willis AD, Callahan BJ. Consistent and correctable bias in metagenomic sequencing experiments. eLife. 2019;8. https://doi.org/10.7554/eLife.46923
 
Merkel D. Docker: lightweight linux containers for consistent development and deployment. Linux journal. 2014;2014(239):2.
 
Meyer CA, Liu XS. Identifying and mitigating bias in next-generation sequencing methods for chromatin biology. Nature Reviews Genetics. 2014;15(11):709-721. https://doi.org/10.1038/nrg3788
 
Mitchell AL, Almeida A, Beracochea M, Boland M, Burgin J, Cochrane G, et al. MGnify: the microbiome analysis resource in 2020. Nucleic acids research. 2020;48(D1):D570-D578. https://doi.org/10.1093/nar/gkz1035
 
Mukherjee S, Huntemann M, Ivanova N, Kyrpides NC, Pati A. Large-scale contamination of microbial isolate genomes by Illumina PhiX control. Standards in genomic sciences. 2015;10(1):18. https://doi.org/10.1186/1944-3277-10-18
 
Murray DC, Coghlan ML, Bunce M. From benchtop to desktop: important considerations when designing amplicon sequencing workflows. PLoS One. 2015;10(4). https://doi.org/10.1371/journal.pone.0124671
 
Parekh S, Ziegenhain C, Vieth B, Enard W, Hellmann I. The impact of amplification on differential expression analyses by RNA-seq. Scientific reports. 2016;6:25533. https://doi.org/10.1038/srep25533
 
Parnell LD, Lindenbaum P, Shameer K, Dall'Olio GM, Swan DC, Jensen LJ, et al. BioStar: an online question & answer resource for the bioinformatics community. PLoS computational biology. 2011;7(10). https://doi.org/10.1371/journal.pcbi.1002216
 
Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. Salmon provides fast and bias-aware quantification of transcript expression. Nature methods. 2017;14(4):417. https://doi.org/10.1038/nmeth.4197
 
Pesant S, Not F, Picheral M, Kandels-Lewis S, Le Bescot N, Gorsky G, et al. Open science resources for the discovery and analysis of Tara Oceans data. Scientific data. 2015;2(1):1-16. https://doi.org/10.1038/sdata.2015.23
 
Quince C, Walker AW, Simpson JT, Loman NJ, Segata N. Shotgun metagenomics, from sampling to analysis. Nature biotechnology. 2017;35(9):833. https://doi.org/10.1038/nbt.3935
 
Ram K. Git can facilitate greater reproducibility and increased transparency in science. Source code for biology and medicine. 2013;8(1):7. https://doi.org/10.1186/1751-0473-8-7
 
Rowe WP, Carrieri AP, Alcon-Giner C, Caim S, Shaw A, Sim K, et al. Streaming histogram sketching for rapid microbiome analytics. Microbiome. 2019;7(1):40. https://doi.org/10.1186/s40168-019-0653-2
 
Satyanarayan A, Moritz D, Wongsuphasawat K, Heer J. Vega-lite: A grammar of interactive graphics. IEEE transactions on visualization and computer graphics. 2016;23(1):341-350. https://doi.org/10.1109/TVCG.2016.2599030
 
Schurch NJ, Schofield P, Gierliński M, Cole C, Sherstnev A, Singh V, et al. How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use? RNA. 2016. https://doi.org/10.1261/rna.058339.116
 
Schweyen H, Rozenberg A, Leese F. Detection and removal of PCR duplicates in population genomic ddRAD studies by addition of a degenerate base region
 
Shade A, Teal TK. Computing Workflows for Biologists: A Roadmap. PLOS Biology. 2015;13(11):1-10. https://doi.org/10.1371/journal.pbio.1002303
 
Sievert C, Parmer C, Hocking T, Chamberlain S, Ram K, Corvellec M, et al. plotly: Create Interactive Web Graphics via 'plotly.js'. R package version. 2017;4(1):110.
 
Sinha R, Abu-Ali G, Vogtmann E, Fodor AA, Ren B, Amir A, et al. Assessment of variation in microbial community amplicon sequencing by the Microbiome Quality Control (MBQC) project consortium. Nature biotechnology. 2017;35(11):1077. https://doi.org/10.1038/nbt.3981
 
Smith EN, Jepsen K, Khosroheidari M, Rassenti LZ, D'Antonio M, Ghia EM, et al. Biased estimates of clonal evolution and subclonal heterogeneity can arise from PCR duplicates in deep sequencing experiments. Genome biology. 2014;15(7):420. https://doi.org/10.1186/s13059-014-0420-4
 
SRA in the cloud. Available from: https://www.ncbi.nlm.nih.gov/sra/docs/sra-cloud/.
 
Srivastava A, Sarkar H, Gupta N, Patro R. RapMap: a rapid, sensitive and accurate tool for mapping RNA-seq reads to transcriptomes. Bioinformatics. 2016;32(12):i192-i200. https://doi.org/10.1093/bioinformatics/btw277
 
Strozzi F, Janssen R, Wurmus R, Crusoe MR, Githinji G, Di Tommaso P, et al. Scalable workflows and reproducible data analysis for genomics. In: Evolutionary Genomics. Springer; 2019. p. 723-745. https://doi.org/10.1007/978-1-4939-9074-0_24
 
Teal TK, Cranston KA, Lapp H, White E, Wilson G, Ram K, et al. Data carpentry: workshops to increase data literacy for researchers. 2015. https://doi.org/10.2218/ijdc.v10i1.351
 
Tenaillon O, Barrick JE, Ribeck N, Deatherage DE, Blanchard JL, Dasgupta A, et al. Tempo and mode of genome evolution in a 50,000-generation experiment. Nature. 2016;536(7615):165-170. https://doi.org/10.1038/nature18959
 
Thorvaldsdóttir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Briefings
 
Van Der Valk T, Vezzi F, Ormestad M, Dalen L, Guschanski K. Index hopping on the Illumina HiseqX platform and its consequences for ancient DNA studies. Molecular ecology resources. 2019. https://doi.org/10.1111/1755-0998.13009
 
Volchenboum SL, Cox SM, Heath A, Resnick A, Cohn SL, Grossman R. Data commons to support pediatric cancer research. American Society of Clinical Oncology Educational Book. 2017;37:746-752. https://doi.org/10.14694/EDBK_175029
 
Wilson G, Aruliah DA, Brown CT, Hong NPC, Davis M, Guy RT, et al. Best practices for scientific computing. PLoS biology. 2014;12(1). https://doi.org/10.1371/journal.pbio.1001745
 
Wilson G, Bryan J, Cranston K, Kitzes J, Nederbragt L, Teal TK. Good enough practices in scientific computing. PLoS computational biology. 2017;13(6). https://doi.org/10.1371/journal.pcbi.1005510
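
For the switch to Manubot itself, each entry above that ends in a DOI URL could be turned into a Manubot-style citation key (`@doi:...`). A minimal sketch, assuming one reference per line as in the list above; the function name `manubot_key` and the constant `DOI_PREFIX` are hypothetical, and entries without a DOI would still need to be handled by hand:

```python
DOI_PREFIX = "https://doi.org/"  # resolver prefix used in the list above


def manubot_key(reference):
    """Return a Manubot-style key for the last DOI URL in a reference line.

    Returns None when the line carries no DOI URL.
    """
    # rpartition picks the last DOI URL, which matters for entries
    # where the DOI appears twice on the same line.
    _, sep, tail = reference.rpartition(DOI_PREFIX)
    tokens = tail.split()
    if not sep or not tokens:
        return None
    doi = tokens[0].rstrip(".")  # drop a trailing sentence period, if any
    return f"@doi:{doi}"


if __name__ == "__main__":
    line = (
        "Ram K. Git can facilitate greater reproducibility and increased "
        "transparency in science. Source code for biology and medicine. "
        "2013;8(1):7.https://doi.org/10.1186/1751-0473-8-7"
    )
    print(f"[{manubot_key(line)}]")
```

In the manuscript text the key would go inside square brackets, e.g. `[@doi:10.1186/1751-0473-8-7]`.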