gjospin / PhyloSift

Phylogenetic and taxonomic analysis for genomes and metagenomes

"recursive" processing of reads in well-sampled parts of the tree #49

Closed koadman closed 12 years ago

koadman commented 12 years ago

Reads that are very close to a reference sequence would be placed more accurately using their DNA sequence than their protein translation. We should identify these reads/sequences on the basis of rapsearch/blast output and flag them for DNA analysis instead of protein analysis.
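A rough sketch of the flagging step, to make the idea concrete. It assumes the search output is in BLAST-style 12-column tabular form (query, subject, percent identity, alignment length, ..., bit score); the function name and both cutoffs are hypothetical placeholders, not values from PhyloSift.

```python
import csv

# Hypothetical cutoffs; the issue does not specify any values.
MIN_IDENTITY = 90.0   # percent identity of the best hit
MIN_ALN_LENGTH = 30   # alignment length of the best hit


def flag_reads_for_dna(tabular_hits, min_identity=MIN_IDENTITY,
                       min_length=MIN_ALN_LENGTH):
    """Return {read_id: subject_id} for reads whose best hit is close
    enough to a reference to be routed to nucleotide analysis.

    tabular_hits is any iterable of tab-separated lines (for example an
    open file) with the 12 standard BLAST tabular columns.
    """
    best = {}  # read_id -> (bitscore, subject_id, identity, length)
    for row in csv.reader(tabular_hits, delimiter="\t"):
        # Skip blank lines, comment lines, and rows with missing columns.
        if len(row) < 12 or row[0].startswith("#"):
            continue
        read_id, subject_id = row[0], row[1]
        identity, length = float(row[2]), int(row[3])
        bitscore = float(row[11])
        # Keep only the highest-scoring hit for each read.
        if read_id not in best or bitscore > best[read_id][0]:
            best[read_id] = (bitscore, subject_id, identity, length)
    return {
        read_id: hit[1]
        for read_id, hit in best.items()
        if hit[2] >= min_identity and hit[3] >= min_length
    }
```

Reads absent from the returned mapping would stay on the protein path. Using only the best hit per read is a simplification; a real rule might also look at how much better the best hit is than the runner-up.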

Related to this, we will need the updating script to divide up the database into subsets of taxa that are similar enough for nucleotide analysis. Leaving them lumped together has been tried and results in poor-quality inference: the phylogenetic models get confused by the extensive diversity at any particular site. An alternative would be codon analysis, but no known read placement tool supports this.
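One possible way to sketch the subsetting, assuming pairwise distances between reference taxa are already available. This is plain single-linkage grouping with a union-find structure, not necessarily what the updating script does; `partition_taxa` and `max_distance` are hypothetical names.

```python
def partition_taxa(distances, max_distance):
    """Group taxa so that any two taxa connected by a chain of pairwise
    distances <= max_distance end up in the same subset.

    distances maps (taxon_a, taxon_b) tuples to a distance value.
    Returns a list of sets of taxon names.
    """
    parent = {}

    def find(x):
        # Register unseen taxa as their own root, then walk up to the
        # root, shortening the path along the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), dist in distances.items():
        root_a, root_b = find(a), find(b)
        # Merge the two groups only when the pair is similar enough.
        if dist <= max_distance and root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for taxon in list(parent):
        groups.setdefault(find(taxon), set()).add(taxon)
    return list(groups.values())
```

Single linkage can chain together taxa that are individually far apart, so the diversity within a subset is not strictly bounded by `max_distance`; a complete-linkage or tree-based cut would be a stricter alternative.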

koadman commented 12 years ago

This appears to be working, although some parameters still need to be tuned for it to work better. That can be a separate issue.