We would like to have separate directories in data/ that correspond to different variable regions from the 16S rRNA gene. Write a script code/extract_region.sh that takes in the output filename (e.g. data/v19/rrnDB.align), extracts that region using mothur's pcr.seqs, filters the alignment to remove vertical gaps using mothur's filter.seqs function, and outputs the filtered alignment to the correct directory. Add "garbage collection" to the script by removing any files that are generated that we don't need.
Here are the coordinates for the regions of interest:
V1-V9 (v19): 1044 to 43116
V4 (v4): 13862 to 23444
V3-V4 (v34): 6428 to 23444
V4-V5 (v45): 13862 to 27659
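
One way to turn the table above into script logic is sketched below. It is a starting point under stated assumptions, not a finished code/extract_region.sh: the function names (region_coords, region_from_target, build_mothur_cmd) are hypothetical, the second argument stands for wherever the full-length alignment lives, and the sketch only assembles the mothur command string rather than running it. The pcr.seqs parameters used (fasta, start, end, outputdir) and filter.seqs(vertical=TRUE) should be checked against the mothur version in use.

```shell
# Print "start end" for a region label; fail for labels not in the table.
region_coords() {
  case "$1" in
    v19) echo "1044 43116" ;;
    v4)  echo "13862 23444" ;;
    v34) echo "6428 23444" ;;
    v45) echo "13862 27659" ;;
    *)   echo "unknown region: $1" >&2; return 1 ;;
  esac
}

# Derive the region label from the target path: data/v19/rrnDB.align -> v19
region_from_target() {
  basename "$(dirname "$1")"
}

# Assemble the mothur command string for a target file.
# $1 = output filename, $2 = full-length input alignment (hypothetical path).
build_mothur_cmd() {
  local target="$1" input="$2"
  local outdir region coords start end
  outdir=$(dirname "$target")
  region=$(region_from_target "$target")
  coords=$(region_coords "$region") || return 1
  start=${coords% *}   # text before the space
  end=${coords#* }     # text after the space
  echo "#pcr.seqs(fasta=$input, start=$start, end=$end, outputdir=$outdir/); filter.seqs(vertical=TRUE)"
}
```

In a real script the resulting string would be passed to mothur, for example mothur "$(build_mothur_cmd "$1" "$input")", followed by renaming the filtered alignment to the requested output filename and deleting the intermediate files.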
pcr.seqs will generate a file ending in *.bad.accnos that includes the names of the sequences that couldn't be digitally amplified. Put that file into the new directory as well and name it rrnDB.bad.accnos. If the file isn't generated by pcr.seqs (because they're all good), then put an empty file with that name into the directory.
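
The "move it if it exists, otherwise create an empty one" requirement could be handled with a small helper like the one below. This is a sketch: collect_bad_accnos is a hypothetical name, and the caller is assumed to know the path where pcr.seqs would have written its *.bad.accnos file, which depends on the input filename and output directory.

```shell
# Place the bad.accnos report at a fixed destination.
# $1 = path where pcr.seqs would write its *.bad.accnos file (may not exist)
# $2 = destination, e.g. data/v19/rrnDB.bad.accnos
collect_bad_accnos() {
  local src="$1" dest="$2"
  if [ -f "$src" ]; then
    # Report exists: rename it unless it already has the final name.
    [ "$src" = "$dest" ] || mv "$src" "$dest"
  else
    # No report means every sequence amplified: leave an empty file.
    : > "$dest"
  fi
}
```

Writing the empty file with ": >" instead of touch also truncates a stale report left over from an earlier run, so the file reflects only the most recent pcr.seqs call.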
Create a rule to generate the files in our Makefile.
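
A possible shape for that rule, using a GNU make pattern rule so that one rule covers all four regions. The prerequisite data/raw/rrnDB.align is a hypothetical name for the full-length alignment and should be replaced with the real file in the project. The recipe line must begin with a tab.

```make
# Both targets are produced by a single run of the script for a given region.
# data/raw/rrnDB.align is an assumed input path, not taken from the exercise.
data/%/rrnDB.align data/%/rrnDB.bad.accnos : code/extract_region.sh data/raw/rrnDB.align
	code/extract_region.sh data/$*/rrnDB.align
```

With a rule like this, running make data/v4/rrnDB.align would call the script for the V4 region. The recipe uses $* (the matched region label) rather than $@ so that the script always receives the alignment filename, even when make was asked for the bad.accnos file.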