SchlossLab / Schloss_rrnAnalysis_mSphere_2021

Code Club project analyzing utility of ASVs
MIT License

Extract subregions from the 16S rRNA gene sequence data #12

Closed: pschloss closed this issue 4 years ago

pschloss commented 4 years ago

We would like to have separate directories in data/ that correspond to different variable regions of the 16S rRNA gene. Write a script code/extract_region.sh that takes the output filename as its argument (e.g. data/v19/rrnDB.align), extracts that region with mothur's pcr.seqs command, filters the alignment with mothur's filter.seqs command to remove vertical gaps (alignment columns that contain only gaps), and writes the filtered alignment to the correct directory. Add "garbage collection" to the script by deleting any generated files that we don't need.
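One way to organize the script is as a few small bash functions. This is a minimal sketch under stated assumptions, not the project's actual implementation. The function names `region_dir`, `mothur_command`, and `collect_garbage` are hypothetical, and so is the input path `data/raw/rrnDB.align` used below. The start/end coordinates are passed in as arguments because they are not listed here. The intermediate filenames (`rrnDB.pcr.align`, `*.filter`, `rrnDB.pcr.filter.fasta`) follow mothur's usual habit of inserting the command name into the output filename, but they may differ between mothur versions, so check them against your own run.

```bash
#!/usr/bin/env bash
# Sketch of helpers for code/extract_region.sh (hypothetical names).
# ASSUMPTIONS: region coordinates are supplied by the caller, and the
# intermediate filenames follow mothur's usual naming; verify locally.

# Print the region directory for a target such as data/v19/rrnDB.align.
region_dir() {
  dirname "$1"
}

# Print a mothur batch-mode command that writes output to OUTDIR,
# trims the alignment INPUT to positions START..END with pcr.seqs,
# then removes all-gap columns with filter.seqs(vertical=T).
mothur_command() {
  local input=$1 outdir=$2 start=$3 end=$4
  printf '#set.dir(output=%s);pcr.seqs(fasta=%s, start=%s, end=%s);filter.seqs(fasta=current, vertical=T)\n' \
    "$outdir" "$input" "$start" "$end"
}

# Keep the filtered alignment under its final name and delete the
# intermediates we don't need. rrnDB.bad.accnos is deliberately kept.
collect_garbage() {
  local dir=$1
  mv "$dir/rrnDB.pcr.filter.fasta" "$dir/rrnDB.align"
  # Unmatched globs stay literal; rm -f then silently ignores them.
  rm -f "$dir/rrnDB.pcr.align" "$dir"/*.filter "$dir"/*scrap* \
        "$dir"/mothur.*.logfile
}
```

In the script body these could be chained as `outdir=$(region_dir "$1")`, `mkdir -p "$outdir"`, `mothur "$(mothur_command data/raw/rrnDB.align "$outdir" "$start" "$end")"`, and finally `collect_garbage "$outdir"`. Keeping the mothur call in a function that only prints the command makes it easy to inspect what will run before running it.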

Here are the coordinates for the regions of interest...

pcr.seqs will generate a file ending in *.bad.accnos that lists the names of the sequences that couldn't be digitally amplified. Put that file in the new directory as well and name it rrnDB.bad.accnos. If pcr.seqs doesn't generate the file (because every sequence amplified), put an empty file with that name in the directory.
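The move-or-create logic for the accnos file fits in one small function. This is a sketch, not the project's code: `stash_bad_accnos` is a hypothetical name, and the caller is assumed to pass the path where pcr.seqs would have written its `*.bad.accnos` file, since that name can vary with the input filename and mothur version.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): make sure DIR/rrnDB.bad.accnos exists.
# PRODUCED is the path where pcr.seqs would have written its
# *.bad.accnos file; the caller supplies it (assumption).
stash_bad_accnos() {
  local produced=$1 dir=$2
  local dest="$dir/rrnDB.bad.accnos"
  if [[ -f "$produced" ]]; then
    # Move mothur's list of sequences that failed to amplify,
    # skipping the move if it is already at the destination.
    [[ "$produced" -ef "$dest" ]] || mv "$produced" "$dest"
  else
    # Every sequence amplified, so leave an empty placeholder file.
    touch "$dest"
  fi
}
```

Always producing the file, even when empty, means a downstream step can depend on rrnDB.bad.accnos unconditionally instead of first checking whether it exists.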

Create a rule in our Makefile to generate these files.