davidemms / OrthoFinder

Phylogenetic orthology inference for comparative genomics
https://davidemms.github.io/
GNU General Public License v3.0

Recon_Gene_Trees output #205

Closed (o-william-white closed this issue 5 years ago)

o-william-white commented 5 years ago

Hello,

Not an issue, just wanted to clarify some details concerning the Recon_Gene_Trees output.

Comparing the trees in the Gene_Trees output with the Recon_Gene_Trees output, am I correct in assuming that the main differences are that the Recon_Gene_Trees are rooted (based on the species tree) and the nodes are annotated?

In terms of the analysis, have the Gene_Trees been used for determining Orthogroups and the Recon_Gene_Trees for identifying Orthologues?

I also noticed that Gene_Trees with only three samples were not included in the Recon_Gene_Trees file. Is there no need to root or annotate trees with only three taxa for orthogroup/orthologue inference?

Hope this makes sense

Best wishes and thanks for providing OrthoFinder

Ollie

davidemms commented 5 years ago

Hi Ollie

The orthogroups are calculated first, from the BLAST/DIAMOND results. These determine which genes are in each gene tree (there is one gene tree for each orthogroup).

The orthologues are inferred from the Recon_Gene_Trees. These trees come from rooting the gene trees and then making small-scale rearrangements under an adapted version of DLCpar's Duplication-Loss-Coalescent model. The rearrangements ensure that orthologue inference isn't impeded by minor gene tree inference errors, and they give significantly higher orthologue inference accuracy.

And yes, you're right, there's no need to examine the three-taxon trees for orthologue inference, but I can see that it could be useful to have the rooted versions of these. Is that the case for you?

All the best
David
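
For anyone wanting to see which orthogroups have a gene tree but no reconciled tree (for example the three-taxon ones discussed here), a minimal sketch follows. It is not part of OrthoFinder. It assumes both directories hold one file per orthogroup named like `OG0000000_tree.txt`; the suffix and the function names `orthogroup_ids` and `missing_reconciled` are illustrative assumptions, so adjust them to match your own results folder.

```python
from pathlib import Path


def orthogroup_ids(tree_dir, suffix="_tree.txt"):
    """Collect orthogroup IDs from file names such as OG0000000_tree.txt.

    The file naming pattern is an assumption; change `suffix` if yours differs.
    """
    return {
        p.name[: -len(suffix)]
        for p in Path(tree_dir).iterdir()
        if p.is_file() and p.name.endswith(suffix)
    }


def missing_reconciled(gene_tree_dir, recon_tree_dir):
    """Return the sorted IDs that have a gene tree but no reconciled tree."""
    return sorted(orthogroup_ids(gene_tree_dir) - orthogroup_ids(recon_tree_dir))
```

Calling `missing_reconciled("Gene_Trees", "Recon_Gene_Trees")` from inside the results directory would then list the orthogroups whose trees were not carried through to the reconciled set.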

o-william-white commented 5 years ago

Hi David,

Thanks for the explanation, much appreciated. Yes, it would be quite helpful to have rooted three-taxon trees if possible.

Best wishes Ollie
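
Until rooted three-taxon trees are available in the output, one possible workaround is to root them by hand from the species tree. The sketch below is not OrthoFinder's rooting method. It assumes exactly three genes from three different species, a rooted and fully resolved species tree supplied as nested tuples, and it drops branch lengths. The names `outgroup_of` and `root_three_gene_tree` are hypothetical.

```python
def leaf_set(node):
    """Return the set of species names below a node (a str leaf or a tuple)."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaf_set(child) for child in node))


def outgroup_of(species_tree, trio):
    """Find which of three species diverges first in the rooted species tree."""
    trio = set(trio)
    if len(trio) != 3 or not trio <= leaf_set(species_tree):
        raise ValueError("need three distinct species present in the species tree")
    node = species_tree
    while True:
        # Descend while a single child still contains all three species.
        for child in node:
            if trio <= leaf_set(child):
                node = child
                break
        else:
            # `node` is now the most recent common ancestor of the trio.
            hits = [trio & leaf_set(child) for child in node]
            hits = [h for h in hits if h]
            if sorted(len(h) for h in hits) != [1, 2]:
                raise ValueError("species tree does not resolve these three species")
            return next(h for h in hits if len(h) == 1).pop()


def root_three_gene_tree(gene_to_species, species_tree):
    """Return a rooted Newick string (topology only) for a three-gene tree."""
    if len(gene_to_species) != 3:
        raise ValueError("expected exactly three genes")
    out_species = outgroup_of(species_tree, gene_to_species.values())
    out_gene = next(g for g, s in gene_to_species.items() if s == out_species)
    ingroup = sorted(g for g in gene_to_species if g != out_gene)
    return f"({out_gene},({ingroup[0]},{ingroup[1]}));"


# Example with a made-up species tree and made-up gene names.
species_tree = ("A", ("B", ("C", "D")))
print(root_three_gene_tree({"b1": "B", "c1": "C", "d1": "D"}, species_tree))
```

Trees with two genes from the same species (a possible duplication) are deliberately rejected, since choosing a root there needs the kind of reconciliation David describes above.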