eggnogdb / eggnog-mapper

Fast genome-wide functional annotation through orthology assignment
http://eggnog-mapper.embl.de
GNU Affero General Public License v3.0

Question – multiple KO IDs for one gene #462

Open Calvin2077 opened 1 year ago

Calvin2077 commented 1 year ago

Hello everyone,

I am using eggNOG-mapper to functionally annotate some archaeal proteomes (from genomes that were annotated with RAST + DRAM).

However, when I look at the results, some of my proteins have multiple KO identifiers attached to them. Each identifier is different and corresponds to a different protein name. For example, one transporter gene has been given five KO identifiers, each with a different name and substrate.

Is there a way to choose which KO identifier to use, or should I accept them all?

If someone could help me, it would be much appreciated. Thank you.

Cantalapiedra commented 1 year ago

Hi @Calvin2077 ,

EggNOG OGs and KOs don't have a 1:1 relationship. I guess that is why you obtain multiple KO identifiers for some of your proteins.

Regarding whether to use all of them, I would say it depends on your analysis and your goals. For instance, if you are going to perform an enrichment analysis, it could be fine to keep them all, in my opinion. Choosing one of the KOs may require more analyses. Doesn't DRAM yield a single KO for your queries?

You could also try to narrow the annotation of those queries for which you want to reduce the number of KOs, maybe using a different --tax_scope or --target_taxa. A purely homology-based approach using the eggNOG data would be to take the seed ortholog of each query and look up its KOs in the eggnog.db database, which is downloaded with eggnog-mapper. You may also reannotate those queries with other tools (not sure if KofamKOALA or similar).
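As a starting point for deciding which queries need attention, here is a minimal Python sketch that lists the queries carrying more than one KO. It assumes the tab-separated `.emapper.annotations` layout, where `##` lines are comments, the header line starts with `#query`, KOs sit in a `KEGG_ko` column as comma-separated `ko:Kxxxxx` entries, and `-` means no annotation. The function names are hypothetical and not part of eggnog-mapper, so check the column names against your own output first.

```python
import csv


def parse_kegg_ko(lines):
    """Map each query to its list of KOs.

    `lines` is any iterable of text lines, e.g. an open annotations file.
    Assumed layout: '##' comment lines, a '#query ...' header line,
    and a 'KEGG_ko' column such as 'ko:K00001,ko:K00002' or '-'.
    """
    header = None
    kos_by_query = {}
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0].startswith("##"):
            continue  # skip blank lines and comment lines
        if row[0].startswith("#"):
            header = [name.lstrip("#") for name in row]  # '#query' -> 'query'
            continue
        if header is None:
            continue  # data before a header cannot be interpreted
        record = dict(zip(header, row))
        raw = record.get("KEGG_ko", "-")
        if raw in ("", "-"):
            kos = []
        else:
            # Drop the 'ko:' prefix when present.
            kos = [k[3:] if k.startswith("ko:") else k for k in raw.split(",")]
        kos_by_query[record["query"]] = kos
    return kos_by_query


def multi_ko_queries(kos_by_query):
    """Keep only the queries annotated with more than one KO."""
    return {q: kos for q, kos in kos_by_query.items() if len(kos) > 1}
```

For example, `multi_ko_queries(parse_kegg_ko(open("my_run.emapper.annotations")))` (the file name is a placeholder) would return only the queries worth re-examining with a narrower --tax_scope or another tool.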

I hope this is of help.

Best, Carlos

Calvin2077 commented 1 year ago

Hello Carlos,

Thank you very much for your help and suggestions; they are much appreciated.

Specifically, I am conducting a comparative and functional analysis of 57 archaeal species, focusing primarily on their metabolism. Do you think it would be okay to keep all the KOs for a gene?

Cantalapiedra commented 1 year ago

Hi @Calvin2077 ,

I am afraid that I don't feel really qualified to answer your question properly. Also there is information that only you have and only you can assess. That being said, among different tests, I guess that you may perform a comparison using all the KOs and try to check whether the results make sense. For instance, are there KOs which you already know should be present or absent from one/some/all of your archaea genomes? Are there genes with multiple KOs for which you already know if there is a single valid KO? Did you try to identify the orthologous genes among your archaea species? Can you validate/filter/curate KOs based on other annotation sources?
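One way to sketch the last suggestion (validating KOs against another annotation source) in Python is shown below. It assumes you already have two dictionaries mapping each query to its KOs, one from eggnog-mapper and one from another tool such as DRAM. The function name, the status labels, and the rule itself (keep the shared KOs when there are any, otherwise keep everything and flag it) are illustrative assumptions, not an established method.

```python
def reconcile_kos(emapper_kos, other_kos):
    """Compare eggnog-mapper KOs with KOs from a second annotation source.

    Both arguments map query IDs to iterables of KO identifiers.
    Returns {query: (kos, status)} where status is:
      'supported'   - at least one KO is shared; only shared KOs are kept
      'unsupported' - no KO is shared; all eggnog-mapper KOs are kept
      'none'        - eggnog-mapper assigned no KO to this query
    """
    reconciled = {}
    for query, kos in emapper_kos.items():
        own = set(kos)
        if not own:
            reconciled[query] = ([], "none")
            continue
        shared = own & set(other_kos.get(query, ()))
        if shared:
            reconciled[query] = (sorted(shared), "supported")
        else:
            reconciled[query] = (sorted(own), "unsupported")
    return reconciled
```

Queries flagged 'unsupported' are the ones that would still need manual curation or a third source; the sketch narrows the list but does not decide which KO is correct.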

Sorry to not be of much help with this.

Best, Carlos

LadaJov commented 1 year ago

Hi, Calvin. Were you able to figure this out? I will be doing the same kind of analysis, and I am still unsure whether I should use all the GO terms I get for one gene in my enrichment analysis.