davidemms / OrthoFinder

Phylogenetic orthology inference for comparative genomics
https://davidemms.github.io/
GNU General Public License v3.0

retained rooted species tree when multiple possibilities #382

Open Lulu84 opened 4 years ago

Lulu84 commented 4 years ago

Dear David,

What determines which species tree root is kept for analysis when the following warning is shown: "Multiple potential species tree roots were identified, only one will be analyzed"? (I have a

Is the "best option" selected somehow? If so, is there a quantitative measure or anything else supporting a "rational" decision about which rooted tree to keep?

I am asking because, whatever test data I submit, the trees follow the "true" (expected) phylogeny perfectly.

Many thanks for your help! Lucienne

davidemms commented 4 years ago

Hi Lucienne

Yes, it applies an additional criterion to try to choose the best one. Thanks for asking about this. I'll give you the details here and then add them to the documentation.

It might help to refer to the STRIDE paper, especially Figures 2 & 3, to get a picture of what's going on.

A. OrthoFinder applies the STRIDE method to find the root(s) that agree with the largest number of well-supported gene duplication events. It uses the parsimony method rather than the probabilistic model from the paper, as this seemed to be the most accurate. If there is a tie, you get the warning message you've observed, and a secondary method is applied to find the 'best' root to use.
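For readers who want to see the parsimony step concretely, here is a toy sketch. It is not OrthoFinder's code: it assumes each well-supported duplication has already been reduced to the set of species descended from it (a set that should form a clade on the correctly rooted species tree), and the function names `clades_for_root` and `best_roots` are hypothetical. Every branch of the unrooted species tree is tried as the root, and a root scores one point per duplication whose species set is a clade under that rooting.

```python
from collections import defaultdict


def clades_for_root(adj, u, v):
    """Return the leaf set of every clade when the unrooted tree is rooted on edge (u, v)."""
    clades = set()

    def collect(node, parent):
        children = [n for n in adj[node] if n != parent]
        if not children:  # a leaf: its clade is just itself
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(collect(c, node) for c in children))
        clades.add(leaves)
        return leaves

    # The root sits on edge (u, v), so each endpoint heads one of the two root subtrees.
    collect(u, v)
    collect(v, u)
    return clades


def best_roots(edges, duplications):
    """Score each edge as a root by how many duplication species sets it makes a clade."""
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    scores = {}
    for u, v in edges:
        clades = clades_for_root(adj, u, v)
        scores[(u, v)] = sum(1 for s in duplications if frozenset(s) in clades)
    top = max(scores.values())
    # More than one edge at the top score is the tie that calls for a secondary criterion.
    best = [edge for edge, score in scores.items() if score == top]
    return best, scores


if __name__ == "__main__":
    # Unrooted species tree ((A,B),(C,D)); x and y are internal nodes.
    edges = [("x", "A"), ("x", "B"), ("x", "y"), ("y", "C"), ("y", "D")]
    best, _ = best_roots(edges, [{"A", "B"}])
    print(best)  # three-way tie: [('x', 'y'), ('y', 'C'), ('y', 'D')]
    best, _ = best_roots(edges, [{"A", "B"}, {"C", "D"}])
    print(best)  # [('x', 'y')]
```

With a single duplication supporting {A, B}, any root outside that clade is equally good, which is exactly the kind of tie described above; a second duplication supporting {C, D} leaves only the central branch.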

B. Choosing between multiple best roots according to STRIDE. We use two principles:

Method:

1. For each potential root, find the topological distance from the potential root to the closest of the supported clades.
2. Select the potential root that is furthest from its respective closest supported clade. If there is more than one, pick the one with the largest branch-length distance (i.e. as a way to resolve ties at the level of topological distance).

This ensures all supported clades (i.e. with evidence from gene duplications) are as far away as possible from the chosen root. This should minimise the possibility of any false positive or false negative ortholog assignments.
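The two-step selection can be sketched as follows, assuming the distances from each tied root to each supported clade have already been computed. How OrthoFinder measures the distance from a root to a clade is not spelled out here, so the input format, the function name `choose_root`, and the root/clade labels are all hypothetical. Distances are stored as `(topological, branch_length)` tuples so that Python's lexicographic tuple comparison applies the topological criterion first and branch length only as a tie-breaker.

```python
def choose_root(distances):
    """Pick one root among STRIDE ties (hypothetical helper).

    distances maps each candidate root to {clade: (topological, branch_length)},
    the distance from that root to each supported clade.
    """
    # Step 1: the closest supported clade for each root. Tuples compare topological
    # distance first; branch length only matters between equally close clades
    # (using it there is an assumption).
    closest = {root: min(d.values()) for root, d in distances.items()}
    # Step 2: keep the root whose closest clade is furthest away; the branch-length
    # part of the tuple breaks ties in topological distance. If two roots have
    # identical tuples, max() keeps the first one encountered.
    best = max(closest, key=closest.get)
    return best, closest


if __name__ == "__main__":
    # Hypothetical distances for three tied roots and two supported clades.
    distances = {
        "r1": {"c1": (1, 0.5), "c2": (3, 1.2)},
        "r2": {"c1": (2, 0.3), "c2": (2, 0.9)},
        "r3": {"c1": (2, 0.7), "c2": (4, 2.0)},
    }
    best, closest = choose_root(distances)
    print(best)  # r3: ties with r2 on topology (2) but is further by branch length
```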

Updates to implement in OrthoFinder

All the best David