davidemms / OrthoFinder

Phylogenetic orthology inference for comparative genomics
https://davidemms.github.io/
GNU General Public License v3.0

Gene assigned to an orthogroup but not to a HOG #548

Open ryandward opened 3 years ago

ryandward commented 3 years ago

Is there a specific interpretation I should make if a gene is assigned to an orthogroup but does not appear in a HOG? I've tried looking through the documentation, but I don't see much about HOGs in general.

Thank you very much, Ryan

davidemms commented 3 years ago

Hi Ryan

It means that OrthoFinder has rooted the gene tree and found that the tree shows the gene to be the single descendant of a gene duplication that occurred prior to the last common ancestor (LCA), despite trying to find a more parsimonious interpretation. You can look at the corresponding tree in Resolved_Gene_Trees/ together with the HOGs and gene duplication events identified for that tree in the N0.tsv and Gene_Duplication_Events/Duplications.tsv files to see how it has split the genes up. The algorithm is not perfect, but it increases accuracy considerably overall. However, it can be useful to look at the gene tree to see where these genes might fit in better.
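To find the affected genes before opening the trees, the orthogroup table can be compared with N0.tsv. The following is only a rough sketch, not part of OrthoFinder. It assumes a tab-separated Orthogroups.tsv whose first column is named `Orthogroup`, and an N0.tsv with the columns `HOG`, `OG` and `Gene Tree Parent Clade` followed by one column per species, with genes in a cell separated by commas. The function names are made up for this illustration, so check the column names against your own output files.

```python
import csv
import io


def genes_by_group(handle, id_column, skip_columns=()):
    """Map each group ID to the set of genes found in its species columns."""
    groups = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        genes = groups.setdefault(row[id_column], set())
        for column, cell in row.items():
            # Skip ID/metadata columns, overflow fields and empty cells.
            if column is None or column == id_column or column in skip_columns:
                continue
            if not cell:
                continue
            genes.update(g.strip() for g in cell.split(",") if g.strip())
    return groups


def genes_missing_from_hogs(orthogroups_handle, n0_handle):
    """Return {orthogroup: [genes in the orthogroup but in none of its N0 HOGs]}."""
    og_genes = genes_by_group(orthogroups_handle, "Orthogroup")
    # Group N0 rows by their parent orthogroup so all of its HOGs are pooled.
    # The metadata column names here are assumptions about the N0.tsv layout.
    hog_genes = genes_by_group(
        n0_handle, "OG", skip_columns=("HOG", "Gene Tree Parent Clade")
    )
    missing = {}
    for og, genes in og_genes.items():
        absent = genes - hog_genes.get(og, set())
        if absent:
            missing[og] = sorted(absent)
    return missing


# Tiny made-up example; replace the StringIO objects with open(...) on real files.
example_orthogroups = io.StringIO(
    "Orthogroup\tSpA\tSpB\n"
    "OG0000001\ta1, a2\tb1\n"
)
example_n0 = io.StringIO(
    "HOG\tOG\tGene Tree Parent Clade\tSpA\tSpB\n"
    "N0.HOG0000001\tOG0000001\tn0\ta1\tb1\n"
)
print(genes_missing_from_hogs(example_orthogroups, example_n0))
```

Each gene reported this way can then be located in the corresponding tree in Resolved_Gene_Trees/ to see which duplication separated it from the rest.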

Thanks for the prompt on documentation for the HOGs, I'll update this.

Best wishes David

ryandward commented 3 years ago

Hi David, thank you for your answer.

I have been thinking about this reply for some time now. Do you think this could be the result of Horizontal Gene Transfer?

davidemms commented 3 years ago

Hi Ryan

Possibly....

I think the deciding factor for OrthoFinder will be that the gene appears on its own as a sister to a clade of genes which also contains a gene from that same species. If the HOG corresponds to a node in the species tree with many species below it, then that means there was either a horizontal gene transfer or a gene duplication event. A gene duplication event is probably the more likely of the two. It would mean that the gene was duplicated in the LCA of the HOG and then, for one of the duplicate branches, lost in all species but one. That's easily achieved (with few gene loss events being required) if the species it survived in is one of the first to diverge. It becomes less likely if the species is deeply nested in the clade, as it would require a number of independent gene loss events. Nevertheless, this will also occur some of the time.
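The gene loss argument can be made concrete with a small counting sketch. This is not OrthoFinder's algorithm, only an illustration under simple assumptions: the species tree is written as nested tuples with species names as leaves, and each clade that lacks the duplicate copy lost it once, on the branch leading to that clade. The tree and the function name `min_losses` are hypothetical.

```python
def min_losses(tree, survivor):
    """Count the losses needed for a duplicate copy to survive only in `survivor`.

    `tree` is a species name (leaf) or a tuple of subtrees. Returns None if
    `survivor` is not in the tree.
    """
    if isinstance(tree, str):
        return 0 if tree == survivor else None
    for child in tree:
        below = min_losses(child, survivor)
        if below is not None:
            # Every sibling clade at this node must have lost the copy once.
            return below + (len(tree) - 1)
    return None


# Made-up ladder-shaped species tree: A diverges first, D and E last.
species_tree = ("A", ("B", ("C", ("D", "E"))))

for species in ("A", "C", "E"):
    print(species, min_losses(species_tree, species))
```

In this toy tree the first-diverging species needs a single loss, in the ancestor of all the other species, while a deeply nested species needs one independent loss for each clade that branches off along the way.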

Horizontal gene transfer events are also possible. I guess that would indicate that the gene came from a species whose gene would sit at the position the transferred gene occupies in the gene tree.

Best wishes David