eggnogdb / eggnog-mapper

Fast genome-wide functional annotation through orthology assignment
http://eggnog-mapper.embl.de
GNU Affero General Public License v3.0

Similar OGs defined at different taxonomic levels #376

Open jimmyliu1326 opened 2 years ago

jimmyliu1326 commented 2 years ago

Hello,

I am slightly confused about how to interpret queries assigned to OGs defined at different taxonomic levels, as shown below:

#query  eggNOG_OGs  Preferred_name  COG_category    GOs EC  KEGG_ko KEGG_Pathway    KEGG_Module KEGG_Reaction   KEGG_rclass BRITE   KEGG_TC BiGG_Reaction
CDS1    COG0463@1|root,COG0463@2|Bacteria,1MWE5@1224|Proteobacteria,1RPCE@1236|Gammaproteobacteria  gtrB    M   GO:0003674,GO:0003824,GO:0005575,GO:0005623,GO:0005886,GO:0016020,GO:0016021,GO:0016740,GO:0016757,GO:0031224,GO:0044425,GO:0044464,GO:0071944  -   ko:K20534   -   -   -   -   ko00000,ko01000,ko01005,ko02000 4.D.2.1.9   -
CDS2    COG0463@1|root,COG0463@2|Bacteria,1MWE5@1224|Proteobacteria,1RPCE@1236|Gammaproteobacteria,3ZN44@590|Salmonella gtrB    M   GO:0003674,GO:0003824,GO:0005575,GO:0005623,GO:0005886,GO:0016020,GO:0016021,GO:0016740,GO:0016757,GO:0031224,GO:0044425,GO:0044464,GO:0071944  -   ko:K20534   -   -   -   -   ko00000,ko01000,ko01005,ko02000 4.D.2.1.9   -
CDS3    COG0463@1|root,COG0463@2|Bacteria,1MWE5@1224|Proteobacteria,1RPCE@1236|Gammaproteobacteria,41F9F@629|Yersinia   gtrB    M   -   -   ko:K20534   -   -   -   -   ko00000,ko01000,ko01005,ko02000 4.D.2.1.9   -

Since these queries share the same OG at the Gammaproteobacteria level and have near-identical functional annotations, would they be considered orthologues?

Cantalapiedra commented 2 years ago

Hi @jimmyliu1326 ,

If you look at the Gammaproteobacteria level, the OG of those 3 hits is 1RPCE, so yes, your queries seem to hit proteins in the same OG. This does not translate directly to your queries being orthologous, though.
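As a rough illustration of that comparison, here is a minimal Python sketch that splits the eggNOG_OGs column and reports the narrowest taxonomic level at which all queries carry the same OG. The function names `parse_ogs` and `narrowest_shared_og` are hypothetical (they are not part of eggnog-mapper), and the sketch assumes the field lists levels from broadest to narrowest, as it does in the rows above.

```python
def parse_ogs(field):
    """Split an eggNOG_OGs field into {taxid: (og_id, taxon_name)}.

    Each entry is assumed to look like 'OG@taxid|name', as in the rows above.
    """
    levels = {}
    for entry in field.split(","):
        og_id, _, taxon = entry.partition("@")
        taxid, _, name = taxon.partition("|")
        levels[taxid] = (og_id, name)
    return levels


def narrowest_shared_og(fields):
    """Return (og_id, taxid, name) for the last level shared by all queries.

    Assumes levels are listed from broadest to narrowest (hypothetical helper).
    """
    parsed = [parse_ogs(f) for f in fields]
    shared = None
    for taxid, (og_id, name) in parsed[0].items():
        if all(p.get(taxid, (None, None))[0] == og_id for p in parsed[1:]):
            shared = (og_id, taxid, name)
    return shared


# eggNOG_OGs values copied from CDS1, CDS2 and CDS3 in the question.
base = ("COG0463@1|root,COG0463@2|Bacteria,"
        "1MWE5@1224|Proteobacteria,1RPCE@1236|Gammaproteobacteria")
fields = [base, base + ",3ZN44@590|Salmonella", base + ",41F9F@629|Yersinia"]

print(narrowest_shared_og(fields))
```

For these three rows the shared level is Gammaproteobacteria (1RPCE); the Salmonella and Yersinia OGs are each specific to a single query, so they are not shared.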

Best, Carlos

Chiamh commented 2 years ago

Thanks for developing eggnog mapper v2! Just to make sure I understand correctly: why is it that this "does not translate directly to your queries being orthologous"? Is it because we have no information about the taxonomic origins of CDS1-3 in this example?

Cantalapiedra commented 2 years ago

Hi @Chiamh ,

Thank you for your kind words.

Regarding the queries being orthologous, I just meant that the fact that all 3 of them map to the same group of orthologous genes doesn't mean they are orthologous to each other (i.e. derived from speciation events). For example, CDS1 and CDS2 could be paralogs (derived from a duplication event) even if they come from different species, because the duplication could have happened before the speciation events. To check whether CDS1-3 are orthologous you may need to actually study their phylogeny or, at least, compare them at the genome scale (bi-directional best hits or other methods).

At least this is what I understand, to the best of my knowledge. I hope it makes sense to you.
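To make the bi-directional best hits idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the gene names and scores are made up, the functions `best_hits` and `reciprocal_best_hits` are hypothetical helpers (not part of eggnog-mapper), and in practice the scores would come from an all-against-all sequence search between two genomes.

```python
def best_hits(scores):
    """Map each query to its highest-scoring target.

    `scores` is {(query, target): score}; the layout is an assumption
    made for this sketch.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Made-up scores between genes of genome A (a1, a2) and genome B (b1, b2).
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150, ("a2", "b1"): 180}
b_vs_a = {("b1", "a1"): 200, ("b1", "a2"): 180, ("b2", "a1"): 150}

print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Here only a1 and b1 pick each other; a2 also hits b1 best, but b1 prefers a1, so that pair is not reported. Reciprocal best hits are a heuristic and can still be confounded by duplications and gene loss, so a phylogenetic analysis remains the more reliable check.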

Best, Carlos

Chiamh commented 2 years ago

Yes, that makes sense. Thanks a lot for the clear explanation!

Cantalapiedra commented 2 years ago

Glad to be of help!