Prerequisites
What are the quality score formats for the 3 HiSeq runs?
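One way to answer this is to look at the range of quality characters in each run's FASTQ files. The snippet below is a minimal sketch of the usual heuristic, not something from the in-house scripts: the function names (`read_quality_lines`, `guess_phred_offset`) and the 10,000-record sample size are assumptions made for illustration. Characters below ASCII 59 only occur in Phred+33 data, and characters above ASCII 74 only occur in the +64 encodings (older Illumina/Solexa); anything in between is reported as ambiguous.

```python
import gzip
from itertools import islice


def read_quality_lines(path, max_records=10000):
    """Yield the quality line of the first `max_records` FASTQ records.

    Assumes 4-line FASTQ records (no wrapped sequence lines).
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # The quality string is every 4th line, starting at the 4th line.
        for line in islice(handle, 3, 4 * max_records, 4):
            yield line.rstrip("\n")


def guess_phred_offset(quality_lines):
    """Guess the quality encoding from the observed ASCII range."""
    lo, hi = 255, 0
    for qual in quality_lines:
        if not qual:
            continue
        codes = [ord(c) for c in qual]
        lo = min(lo, min(codes))
        hi = max(hi, max(codes))
    if hi == 0:
        raise ValueError("no quality values seen")
    if lo < 59:
        return "phred33"   # Sanger / Illumina 1.8+
    if hi > 74:
        return "phred64"   # Illumina 1.3-1.7 (or Solexa+64)
    return "ambiguous"     # range fits both; sample more reads
```

Usage would be something like `guess_phred_offset(read_quality_lines("run1_R1.fastq.gz"))`, where the file name is a placeholder. An "ambiguous" result usually just means the sampled reads were all high quality, so sampling more records should settle it.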
Steps
Demultiplex using in-house scripts
Use pick_otus_closed_ref.py, modified to call usearch_local instead of usearch_global, with --min_query_cov 0.95 to compensate for using local alignments.
Re-assign taxonomies using Kraken and a modified NCBI database (added more Bacteroides genomes).
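To spell out why the query coverage cutoff matters in the OTU-picking step: a local alignment can report high identity over only a short stretch of the read, so requiring the aligned region to span at least 95% of the query brings the result closer to what a global alignment would accept. Here is a rough sketch of that filter. The `Hit` record and its field names are hypothetical and do not correspond to any particular usearch output format.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One local alignment hit (hypothetical fields, for illustration only)."""
    query: str
    target: str
    q_start: int  # 1-based, inclusive start of the aligned region on the query
    q_end: int    # 1-based, inclusive end of the aligned region on the query
    q_len: int    # full length of the query read


def query_coverage(hit):
    """Fraction of the query covered by the aligned region."""
    return (hit.q_end - hit.q_start + 1) / hit.q_len


def filter_hits(hits, min_query_cov=0.95):
    """Keep only hits whose alignment spans enough of the query."""
    return [h for h in hits if query_coverage(h) >= min_query_cov]


hits = [
    Hit("read1", "otu_a", 1, 100, 100),   # full-length alignment, kept
    Hit("read2", "otu_a", 20, 60, 100),   # short local match, dropped
    Hit("read3", "otu_b", 3, 100, 100),   # 98% of the read aligned, kept
]
kept = filter_hits(hits)
```

With the default cutoff, `kept` contains the hits for read1 and read3; read2 aligns over only 41% of its length and is discarded even if its identity over that stretch is high.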
This issue could go in https://github.com/audy/richardson-2014-data but I'm choosing to keep it here.
This recent paper suggests using de novo clustering, but that's impossible given the size of the dataset (2 Illumina HiSeq runs).
Trying to match these methods as closely as possible:
Analysis, Optimization and Verification of Illumina-Generated 16S rRNA Gene Amplicon Surveys. PLoS One, 2014.