Reads that are very close to a reference sequence would be placed more accurately using their DNA sequence rather than their protein sequence. We should identify these reads/sequences from the rapsearch/blast output and flag them for DNA analysis instead of protein analysis.
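As a rough illustration of the flagging step, here is a minimal sketch that assumes the search output is in the 12-column tab-separated layout (BLAST `-outfmt 6`, or the `.m8` file from rapsearch). The function name `flag_reads_for_dna` and both cutoffs are hypothetical; what counts as "very close" still needs to be decided.

```python
import csv

# Hypothetical cutoffs (assumptions, not values from this project):
# what identity and alignment length count as "very close" is still open.
MIN_PCT_IDENTITY = 95.0
MIN_ALIGN_LENGTH = 30


def flag_reads_for_dna(m8_lines, min_identity=MIN_PCT_IDENTITY,
                       min_length=MIN_ALIGN_LENGTH):
    """Return {read_id: reference_id} for reads whose best hit is near-identical.

    m8_lines is any iterable of tab-separated lines with the columns
    query, subject, %identity, alignment length, ..., bit score (column 12).
    """
    best = {}  # read_id -> (bitscore, pident, length, ref_id)
    for row in csv.reader(m8_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment/header lines
        read_id, ref_id = row[0], row[1]
        pident, length, bitscore = float(row[2]), int(row[3]), float(row[11])
        # Keep only the highest-scoring hit for each read.
        if read_id not in best or bitscore > best[read_id][0]:
            best[read_id] = (bitscore, pident, length, ref_id)
    return {
        read: ref
        for read, (_, pident, length, ref) in best.items()
        if pident >= min_identity and length >= min_length
    }
```

Reads returned by this function would go to DNA analysis; everything else would stay in the protein analysis. Judging each read by its best hit only is a design choice of the sketch, not a requirement.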
Related to this, the updating script will need to divide the database into subsets of taxa that are similar enough for nucleotide analysis. Leaving them lumped together has been tried and results in poor-quality inference: the phylogenetic models get confused by the extensive diversity at any particular site. An alternative would be codon analysis, but no known read placement tool supports this.
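One possible way to split the database is sketched below as a greedy, representative-based clustering. It assumes the nucleotide sequences are already aligned to a common length and are passed in as a name-to-sequence dict. The function names and the 0.8 identity cutoff are hypothetical, and the updating script may well need a different grouping criterion.

```python
def pairwise_identity(a, b):
    """Fraction of matching positions between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += (x == y)
    return matches / compared if compared else 0.0


def partition_by_similarity(aligned, min_identity=0.8):
    """Greedy clustering of taxa into subsets of similar sequences.

    Each taxon joins the first subset whose representative (its first
    member) it matches at >= min_identity; otherwise it starts a new
    subset. min_identity is a hypothetical cutoff, not a project value.
    """
    subsets = []  # list of (representative_name, [member_names])
    for name, seq in aligned.items():
        for rep, members in subsets:
            if pairwise_identity(seq, aligned[rep]) >= min_identity:
                members.append(name)
                break
        else:
            subsets.append((name, [name]))
    return [members for _, members in subsets]
```

Because the clustering is greedy, the result depends on input order, and members are only guaranteed to be close to their representative, not to each other. A stricter criterion may be needed if within-subset diversity still degrades the inference.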