To produce version 1.0 of the reference library database R file, we will create a shared repository on the Maine eDNA GitHub to hold an RMarkdown file (and associated files); this repository will be the working home for development and comments. The workflow focuses on 12S sequences collected from GenBank, which will be filtered to de-duplicate records and to remove all sequences that fall outside a specific base-pair length range (not yet determined). The species search criteria will be based on collected taxonomic lists for the state of Maine, known invasive species, and one exemplar per taxonomic family. Other functions to be added are a neighbor-joining phylogeny function and a function that returns a list of gap sequences. The outputs of this workflow will be a taxonomy data frame, a linked FASTA file containing the sequences, and an image file of the neighbor-joining phylogeny.
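The filtering step described above (de-duplication followed by a length window) can be sketched roughly as follows. This is a minimal illustration in Python, not the planned R implementation. It assumes that "duplicate" means the same accession appearing more than once (for example, because the Maine, invasive-species, and family-exemplar searches overlap) and that the accession is the first whitespace-separated token of the FASTA header. The length bounds `MIN_LEN` and `MAX_LEN` are placeholders, since the actual range has not been decided, and all function names are hypothetical.

```python
from typing import List, Tuple

# Placeholder bounds only: the workflow has not yet fixed the length range.
MIN_LEN = 100
MAX_LEN = 1000

Record = Tuple[str, str]  # (header without ">", sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs, joining wrapped lines."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_records(records: List[Record],
                   min_len: int = MIN_LEN,
                   max_len: int = MAX_LEN) -> List[Record]:
    """Drop sequences outside [min_len, max_len], then drop repeated accessions.

    Assumption: the accession is the first token of the header.
    The first in-range occurrence of each accession is kept.
    """
    seen = set()
    kept: List[Record] = []
    for header, seq in records:
        if not (min_len <= len(seq) <= max_len):
            continue
        tokens = header.split()
        accession = tokens[0] if tokens else ""
        if accession in seen:
            continue
        seen.add(accession)
        kept.append((header, seq))
    return kept


def write_fasta(records: List[Record]) -> str:
    """Serialize records back to FASTA text, one sequence per line."""
    return "".join(f">{h}\n{s}\n" for h, s in records)


if __name__ == "__main__":
    # Made-up headers and sequences, for illustration only.
    demo = (
        ">XX000001.1 example record\nACGTACGTAC\n"
        ">XX000001.1 example record\nACGTACGTAC\n"
        ">XX000002.1 too short\nACG\n"
    )
    print(write_fasta(filter_records(parse_fasta(demo), min_len=5, max_len=20)))
```

Keeping the bounds as parameters means the length range can be set once it is decided without touching the filtering logic. If duplicates should instead be defined by identical sequence content, only the key added to `seen` would need to change.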
Functionality will be added as it is developed to:
use GBIF polygons to enable more customized selections