gjospin / PhyloSift

Phylogenetic and taxonomic analysis for genomes and metagenomes

Compare SSU-Align 18s to ARB alignments #54

koadman closed this issue 12 years ago

koadman commented 12 years ago

Use a well-characterized dataset to understand whether the automatically generated SSU-Align alignments are sufficient for phylogeny.

koadman commented 12 years ago

What are the metrics for comparison? If (or when) we observe tree differences between the two, how do we decide which is better?

hollybik commented 12 years ago

Hands down, SSU-align is much better in biological terms - it incorporates eukaryote-specific secondary structure and uses HMMs. (ARB aligns eukaryotes based on nearest-neighbour similarity and E. coli secondary structure.)

I don't expect any major differences in the phylogenetic topology - I am going to use my published nematode trees based on ARB to test this. BUT, if there are differences then we need to flag this in a publication - ARB is clearly violating a number of biological truths for eukaryotes, yet the eukaryote community continues to rely on its alignments and ribosomal tools. If there is a more biologically accurate tool out there (SSU-align) we need to spread the word.

Exact metrics TBD - depends on what the trees look like when I play around...

hollybik commented 12 years ago

Finally had a chance to finish this off and look through the SSU-align tree I constructed using my thesis dataset (I was comparing the SSU-align alignment with my ARB alignment, previously published here: http://www.biomedcentral.com/1471-2148/10/353/abstract). The SSU-align tree looks pretty awesome: the taxon placements are consistent with my published trees, and the clade structure appears to be similar at lower and higher levels. One or two long branches, but nothing too worrying. I didn't want to waste too much time annotating and deeply comparing trees (since this was more of a side question and a quality check), but the bootstrapped RAxML tree is generated and partially annotated if we want to harness it for something useful in the future.
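
On the metrics question raised earlier in the thread: the comparison above was done by eye, and no metric was settled on here. One common option for quantifying topology differences between two trees on the same taxa is the Robinson-Foulds distance, which counts the bipartitions found in one tree but not the other. Below is a minimal sketch, assuming plain Newick input without quoted labels; the function names (`parse_newick`, `splits`, `rf_distance`) are hypothetical and not part of PhyloSift, SSU-align, or RAxML.

```python
import re

PUNCTUATION = {"(", ")", ",", ";"}


def parse_newick(text):
    """Parse a simple Newick string into nested lists of leaf names.

    Assumes unquoted labels; branch lengths and support values are discarded.
    """
    tokens = re.findall(r"[(),;]|[^(),;]+", re.sub(r"\s+", "", text))
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            while tokens[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            pos += 1  # consume ")"
            # Skip an optional internal label, support value, or branch length.
            if pos < len(tokens) and tokens[pos] not in PUNCTUATION:
                pos += 1
            return children
        label = tokens[pos]
        pos += 1
        return label.split(":")[0]  # drop any ":branch_length" suffix

    return parse_node()


def splits(tree):
    """Return (leaf set, set of non-trivial bipartitions) for a tree."""
    found = set()

    def leaves_below(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves_below(child) for child in node))
        found.add(below)
        return below

    all_leaves = leaves_below(tree)
    anchor = min(all_leaves)  # fixed reference leaf so each split has one form
    normalized = set()
    for side in found:
        if anchor in side:
            side = all_leaves - side  # keep the side that excludes the anchor
        if 2 <= len(side) <= len(all_leaves) - 2:  # ignore trivial splits
            normalized.add(side)
    return all_leaves, normalized


def rf_distance(newick_a, newick_b):
    """Unweighted Robinson-Foulds distance between two unrooted trees."""
    leaves_a, splits_a = splits(parse_newick(newick_a))
    leaves_b, splits_b = splits(parse_newick(newick_b))
    if leaves_a != leaves_b:
        raise ValueError("trees must contain the same set of taxa")
    return len(splits_a ^ splits_b)


if __name__ == "__main__":
    # Toy trees with made-up taxon names, not the nematode dataset.
    print(rf_distance("((A,B),(C,D),E);", "(E,(D,C),(B,A));"))  # 0
    print(rf_distance("((A,B),(C,D),E);", "((A,C),(B,D),E);"))  # 4
```

A distance of 0 means the two topologies are identical; the measure ignores branch lengths and bootstrap support, so it would not flag the long branches mentioned above.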