UNF-PIPE / Tha-pipe

A phylogenetics pipeline

Find paralogs from tree file #10

Closed pappewaio closed 11 years ago

pappewaio commented 12 years ago

One step in our curation pipeline is to find probable paralogs. One tool that could be useful is the BioPerl tree module: http://www.bioperl.org/wiki/HOWTO:Trees

simfor commented 12 years ago

To begin with, it should do the following (pseudocode):

if (ortholog) {
    keep the one with the shortest branch
} else {
    tell the user that we have a potential paralog
}

It gets a bit more complicated when there are more than two copies of the gene/protein in one species. If, for example, two of them end up together in the tree while a third one is "off", the two are probably orthologs of the ones in the other species and the third is not. This should be reported, but how to handle it should be left to the user.
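A rough sketch of the keep-or-warn rule above, in plain Python rather than BioPerl. Everything here is made up for illustration: the `(species, gene_id, branch_length)` tuples, the function names, and the `looks_like_paralogs` callback, which stands in for whatever test ends up deciding whether a species' copies are paralogs. Copies flagged as potential paralogs are all kept and only reported, since the handling is left to the user.

```python
from collections import defaultdict


def group_copies_by_species(leaves):
    """Group (species, gene_id, branch_length) tuples by species."""
    by_species = defaultdict(list)
    for species, gene_id, branch_length in leaves:
        by_species[species].append((gene_id, branch_length))
    return by_species


def curate(leaves, looks_like_paralogs):
    """Return (kept gene ids, species that need a paralog warning).

    looks_like_paralogs is a hypothetical callback: species -> bool.
    """
    kept, warnings = [], []
    for species, copies in group_copies_by_species(leaves).items():
        if len(copies) == 1:
            # Single copy: nothing to decide.
            kept.append(copies[0][0])
        elif looks_like_paralogs(species):
            # Potential paralogs: keep every copy and tell the user.
            warnings.append(species)
            kept.extend(gene_id for gene_id, _ in copies)
        else:
            # Redundant copies: keep the one with the shortest branch.
            best = min(copies, key=lambda copy: copy[1])
            kept.append(best[0])
    return kept, warnings


leaves = [
    ("human", "h1", 0.1),
    ("human", "h2", 0.3),
    ("mouse", "m1", 0.2),
    ("rat", "r1", 0.1),
    ("rat", "r2", 0.05),
]
kept, warnings = curate(leaves, lambda species: species == "rat")
```

With this toy input, human is reduced to its shortest-branch copy, while both rat copies are kept and rat is listed in `warnings`.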

simfor commented 12 years ago

The sub findParalogs detects paralogs in the easy case, when one species has two copies in the tree. It returns an array with the names of the paralogs.

simfor commented 11 years ago

The two BioPerl functions get_lca (http://www.bioperl.org/wiki/Least_common_ancestor) and get_all_Descendents (http://www.bioperl.org/wiki/HOWTO:Trees) should solve it.

If, for instance, there are n human homologs of the gene of interest in the tree, the pseudocode is:

@potParalogs = all human nodes
LCA = get_lca(@potParalogs)
@children = get_all_Descendents(LCA)

if (all elements in @children == "human") {
    HURRA!
} else {
    PARALOG!
}
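To make the LCA idea concrete, here is a minimal standalone sketch in Python. It does not use BioPerl: the `Node` class and the helper names are invented for this example, and `lowest_common_ancestor` and `leaves_under` only play the roles that get_lca and get_all_Descendents play in the pseudocode. One assumption to double-check against BioPerl: get_all_Descendents appears to return internal nodes as well, so the comparison here is restricted to leaves.

```python
class Node:
    """Toy tree node; leaves carry a gene name and a species label."""

    def __init__(self, name=None, species=None, children=()):
        self.name = name
        self.species = species
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

    def is_leaf(self):
        return not self.children


def leaves_under(node):
    """All leaf nodes in the subtree rooted at node."""
    if node.is_leaf():
        return [node]
    found = []
    for child in node.children:
        found.extend(leaves_under(child))
    return found


def path_to_root(node):
    """The node itself followed by its ancestors, ending at the root."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def lowest_common_ancestor(nodes):
    """Deepest node that is an ancestor of (or equal to) every given node."""
    candidates = path_to_root(nodes[0])
    for other in nodes[1:]:
        on_other_path = {id(n) for n in path_to_root(other)}
        candidates = [n for n in candidates if id(n) in on_other_path]
    return candidates[0]


def has_potential_paralogs(root, species):
    """True if the clade spanned by a species' copies contains other species."""
    copies = [leaf for leaf in leaves_under(root) if leaf.species == species]
    if len(copies) < 2:
        return False
    lca = lowest_common_ancestor(copies)
    return any(leaf.species != species for leaf in leaves_under(lca))


# ((h1, h2), m1): the human copies form their own clade.
clean = Node(children=[
    Node(children=[Node("h1", "human"), Node("h2", "human")]),
    Node("m1", "mouse"),
])

# ((h1, m1), (h2, m2)): each human copy pairs with a mouse copy.
mixed = Node(children=[
    Node(children=[Node("h1", "human"), Node("m1", "mouse")]),
    Node(children=[Node("h2", "human"), Node("m2", "mouse")]),
])
```

Here `has_potential_paralogs(clean, "human")` is the HURRA! case and `has_potential_paralogs(mixed, "human")` is the PARALOG! case.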

simfor commented 11 years ago

This assumes that the two functions work as we hope :)

jowkar commented 11 years ago

The pseudocode above was implemented, and the resulting function was verified to work (with the help of the test findParalogs.t). What remains is to emit a warning in Main.pl when paralogs are detected.