apetkau / orthomcl-pipeline

Automates running of OrthoMCL software from http://orthomcl.org/common/downloads/software/v2.0/

how to understand the groups.txt file #34

Closed U201412486 closed 4 years ago

U201412486 commented 4 years ago

Hi, Merry Christmas! The user guide at https://orthomcl.org/common/downloads/software/v2.0/UserGuide.txt says that the groups.txt file contains the groups created by clustering the pairs with the MCL program. I take this to mean that groups.txt includes coorthologs, inparalogs, and orthologs. How can I separate coorthologs, inparalogs, and orthologs from the groups.txt file? Best, Sun

apetkau commented 4 years ago

Thank you. Merry Christmas to you too.

Yes, the groups.txt file contains both the orthologs and (recent) paralogs identified by OrthoMCL. The file looks something like:

group_1: SpeciesA|geneA1 SpeciesB|geneA1 SpeciesB|geneA2 

Here, SpeciesB|geneA1 and SpeciesB|geneA2 are paralogs (they are both very similar genes in SpeciesB and so likely arise from a duplication event). Likewise SpeciesA|geneA1 is likely orthologous to SpeciesB|geneA1 and SpeciesB|geneA2.

You can find an illustration of a single OrthoMCL group in Figure 3 from the OrthoMCL paper (https://genome.cshlp.org/content/13/9/2178/F3.expansion.html).

So, to separate orthologs from recent paralogs you can look for entries in the groups file where the species portion of the name is the same (representing duplicate genes in the same species).
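A minimal sketch of that idea in Python, assuming each line of groups.txt follows the `group_id: Species|gene ...` layout shown above. The function names `parse_group_line` and `classify_pairs` are hypothetical helpers written for this illustration and are not part of OrthoMCL or this pipeline. Same-species pairs are only candidate recent paralogs and cross-species pairs are only candidate orthologs; the script does not distinguish inparalogs from coorthologs.

```python
from itertools import combinations


def parse_group_line(line):
    """Split 'group_1: SpeciesA|geneA1 ...' into (group_id, [(species, gene), ...])."""
    group_id, _, members = line.partition(":")
    parsed = []
    for member in members.split():
        # Assumes the species name comes before the first '|' in each entry.
        species, _, gene = member.partition("|")
        parsed.append((species, gene))
    return group_id.strip(), parsed


def classify_pairs(members):
    """Sort every pair of group members by whether the species portion matches."""
    same_species, cross_species = [], []
    for (sp1, g1), (sp2, g2) in combinations(members, 2):
        pair = (f"{sp1}|{g1}", f"{sp2}|{g2}")
        if sp1 == sp2:
            same_species.append(pair)   # candidate recent paralogs
        else:
            cross_species.append(pair)  # candidate orthologs
    return same_species, cross_species


# The example line from this thread.
example = "group_1: SpeciesA|geneA1 SpeciesB|geneA1 SpeciesB|geneA2"
group_id, members = parse_group_line(example)
paralogs, orthologs = classify_pairs(members)
print(group_id, "same-species pairs:", paralogs)
print(group_id, "cross-species pairs:", orthologs)
```

For a real groups.txt file, the same two functions could be applied to each non-empty line in turn.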

As for making more specific classifications (inparalogs, etc) I don't know if this is possible from the OrthoMCL results. I suspect you would need to integrate the OrthoMCL results with phylogenetic information about the species you are examining. Unfortunately, this is at the limit of my knowledge so I don't think I can give you a better answer.

I hope this helps.

U201412486 commented 4 years ago

Thank you for your answer. It helps me.