alexpiper / HemipteraMetabarcodingMS

Reproducible analyses for Hemiptera metabarcoding MS

Reproducible Analyses from the Hemiptera Metabarcoding Manuscript

This repository hosts the R-based workflow used to perform the analyses presented in the manuscript: Batovska, J., Piper, A. M., Valenzuela, I., Cunningham, J. P., & Blacket, M. J. (2021). Developing a Non-destructive Metabarcoding Protocol for Detection of Pest Insects in Bulk Trap Catches. Scientific Reports, 11, 7946. https://doi.org/10.1038/s41598-021-85855-6

The reproducible workflow for the analyses is provided as R Markdown documents in the root directory, or rendered here. The input sequencing data are too large to include in the repository and are instead available from the NCBI Sequence Read Archive or Zenodo. However, the data directory of this repository contains RDS files holding the intermediate data objects, such as the OTU and taxonomy tables, that are needed to perform the analyses.
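As a minimal sketch of how the stored intermediate objects could be loaded, the helper below reads every RDS file in a directory into a named list. The function name `load_rds_dir` is hypothetical and not part of the repository; the only thing taken from the text above is that the files live in the `data` directory. No individual file names are assumed.

```r
# Hypothetical helper (not part of the repository): read every .rds file
# in a directory into a list named after the files.
load_rds_dir <- function(data_dir = "data") {
  paths <- list.files(data_dir, pattern = "\\.rds$", full.names = TRUE,
                      ignore.case = TRUE)
  objs <- lapply(paths, readRDS)
  # Name each element after its file, without the extension
  names(objs) <- tools::file_path_sans_ext(basename(paths))
  objs
}

# Usage, assuming the working directory is the root of the cloned repository
if (dir.exists("data")) {
  objs <- load_rds_dir("data")
  str(objs, max.level = 1)  # overview of what was loaded
}
```

Loading everything into one list is only a convenience for exploring the objects; the R Markdown documents may read individual files directly.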

The taxonomic assignment step relies on reference FASTA files formatted for the RDP classifier implemented in the DADA2 package. These FASTA files were created with the database_builder.rmd script found in the root directory, which is also rendered here. The reference FASTA files are too large to include in the repository and are instead hosted on Zenodo.
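For orientation, the sketch below shows roughly what a call to DADA2's RDP-style classifier looks like. The wrapper name `assign_taxonomy_sketch`, its arguments, and the file path in the usage comment are hypothetical; `minBoot = 50` is the `dada2::assignTaxonomy()` default and is not necessarily the threshold used in the manuscript.

```r
# Hypothetical wrapper around dada2::assignTaxonomy(); not the manuscript's code.
# assignTaxonomy() expects reference FASTA headers that list the ranks
# separated by semicolons, e.g. ">Kingdom;Phylum;Class;Order;Family;Genus;".
assign_taxonomy_sketch <- function(seqs, ref_fasta, min_boot = 50) {
  if (!file.exists(ref_fasta)) {
    stop("Reference FASTA not found: ", ref_fasta)
  }
  if (!requireNamespace("dada2", quietly = TRUE)) {
    stop("Package 'dada2' is required; it is distributed through Bioconductor.")
  }
  # Returns a character matrix: one row per sequence, one column per rank
  dada2::assignTaxonomy(seqs, refFasta = ref_fasta, minBoot = min_boot)
}

# Usage (the path is a placeholder for a reference file downloaded from Zenodo):
# tax <- assign_taxonomy_sketch(seqs, "reference/my_reference.fa.gz")
```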

You can run these analyses on your own machine by (1) cloning the repository, (2) obtaining the raw sequencing data in FASTQ format (this can be done automatically by running the scripts/download_SRA_fastqs.sh bash script), (3) obtaining the reference databases from Zenodo, (4) installing the required R libraries, and (5) pressing Run in the R Markdown file. Even without the sequencing data, the analysis and plotting portions of each R Markdown document can be run using the stored RDS files in the data directory.
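Steps (2), (4), and (5) above can also be driven from an R session. The following is a minimal sketch under several assumptions: the working directory is the root of the cloned repository, the package list is only a partial guess (`dada2` is named above and `rmarkdown` is implied by the document format; the full list is given by the `library()` calls in each document), and both helper names are hypothetical.

```r
# Hypothetical helper: return the subset of `pkgs` that is not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: list the R Markdown documents in the repository root.
list_workflow_docs <- function(repo_dir = ".") {
  list.files(repo_dir, pattern = "\\.rmd$", ignore.case = TRUE,
             full.names = TRUE)
}

# Partial, assumed package list; extend it from the documents themselves.
needed <- c("rmarkdown", "dada2")
todo <- missing_packages(needed)
if (length(todo) > 0) {
  message("Install before rendering: ", paste(todo, collapse = ", "))
}

# Usage, from the repository root:
# system2("bash", "scripts/download_SRA_fastqs.sh")  # step 2: fetch the FASTQ files
# docs <- list_workflow_docs(".")                    # includes database_builder.rmd
# rmarkdown::render(docs[1])                         # step 5, non-interactively
```

Rendering with `rmarkdown::render()` is an alternative to pressing Run in RStudio; it executes the whole document in a fresh environment, which makes a missing input file or package fail early.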