Calvin2077 opened this issue 1 year ago.
Hi @Calvin2077,
eggNOG OGs and KOs don't have a 1:1 relationship; I guess that is why you obtain multiple KO identifiers for some of your proteins.
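To see how common this is in a run, the sketch below reads an eggnog-mapper `.emapper.annotations` table and lists the queries that received more than one KO. It assumes the eggnog-mapper v2 layout: `##` comment lines, a header row whose first field is `#query`, and a `KEGG_ko` column holding comma-separated `ko:`-prefixed IDs, with `-` when nothing was assigned. Check this against your own output, since column names can differ between versions. The function names and the toy rows are illustrative only.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_kos(handle: TextIO) -> Dict[str, List[str]]:
    """Map each query to its KO IDs from an emapper annotations table.

    Assumes the eggnog-mapper v2 layout: '##' comment lines, a header row
    whose first field is '#query', and a KEGG_ko column holding values such
    as 'ko:K00001,ko:K00002' ('-' when no KO was assigned).
    """
    lines = (line for line in handle if not line.startswith("##"))
    kos: Dict[str, List[str]] = {}
    # QUOTE_NONE: free-text columns may contain stray quote characters.
    for row in csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        field = (row.get("KEGG_ko") or "-").strip()
        if field == "-":
            kos[row["#query"]] = []
        else:
            # Drop the 'ko:' prefix so IDs read as plain K numbers.
            kos[row["#query"]] = [k.strip().split(":")[-1] for k in field.split(",")]
    return kos


def multi_ko_queries(kos: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep only the queries that received more than one KO."""
    return {query: ids for query, ids in kos.items() if len(ids) > 1}


if __name__ == "__main__":
    # Toy table with invented rows; for real data use
    # open("<prefix>.emapper.annotations") instead.
    demo = io.StringIO(
        "## emapper version line\n"
        "#query\tseed_ortholog\tKEGG_ko\n"
        "prot1\t2157.SEED1\tko:K00001,ko:K00002,ko:K00003\n"
        "prot2\t2157.SEED2\tko:K00004\n"
        "prot3\t2157.SEED3\t-\n"
    )
    for query, ids in multi_ko_queries(read_kos(demo)).items():
        print(query, len(ids), ",".join(ids))
```

On the toy table this prints `prot1 3 K00001,K00002,K00003`, so only the multi-KO query is reported.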
Whether to use all of them or not will depend on your analysis and your goals, I would say.
For instance, if you are going to perform an enrichment analysis, it could be fine to keep them all, in my opinion.
Choosing one of the KOs may require additional analyses. Isn't DRAM yielding a single KO for your queries?
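On the enrichment point: keeping all KOs simply means that a gene with five KOs counts toward five terms. The sketch below illustrates this with a plain one-sided hypergeometric test per KO, written with the standard library only. It is not what any particular enrichment tool does, it applies no multiple-testing correction, and the gene and KO names are toy placeholders.

```python
from math import comb
from typing import Dict, Iterable, List, Set, Tuple


def ko_to_genes(gene_kos: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Invert gene -> KOs into KO -> genes, keeping every KO a gene received."""
    index: Dict[str, Set[str]] = {}
    for gene, kos in gene_kos.items():
        for ko in kos:
            index.setdefault(ko, set()).add(gene)
    return index


def hypergeom_sf(k: int, pop: int, hits: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population pop, hits successes, draws draws)."""
    upper = min(hits, draws)
    tail = sum(comb(hits, i) * comb(pop - hits, draws - i) for i in range(k, upper + 1))
    return tail / comb(pop, draws)


def ko_enrichment(gene_kos: Dict[str, List[str]],
                  study: Iterable[str]) -> Dict[str, Tuple[int, int, float]]:
    """One-sided over-representation test per KO (no multiple-testing correction).

    The universe is every gene in gene_kos, including genes without KOs.
    Returns KO -> (study genes with KO, all genes with KO, p-value).
    """
    universe = set(gene_kos)
    study_set = set(study) & universe
    results: Dict[str, Tuple[int, int, float]] = {}
    for ko, genes in ko_to_genes(gene_kos).items():
        k = len(genes & study_set)
        if k:
            p = hypergeom_sf(k, len(universe), len(genes), len(study_set))
            results[ko] = (k, len(genes), p)
    return results


if __name__ == "__main__":
    # Toy data: g1 carries two KOs, so it counts toward both K1 and K2.
    toy = {"g1": ["K1", "K2"], "g2": ["K1"], "g3": ["K3"], "g4": []}
    for ko, (k, n, p) in sorted(ko_enrichment(toy, ["g1", "g2"]).items()):
        print(ko, k, n, round(p, 3))
```

Whether genes without any KO belong in the universe is a design choice. Including them, as done here, makes each test slightly more conservative than restricting the universe to annotated genes.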
You could also try to narrow the annotation of the queries for which you want to reduce the number of KOs, for example by using a different `--tax_scope` or `--target_taxa`. A purely homology-based approach using the eggNOG data would be to take the seed ortholog of each query and search for its KOs in the `eggnog.db` database, which is downloaded with eggnog-mapper. You could also reannotate those queries with other tools (perhaps KofamKOALA or similar).
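The seed-ortholog lookup could look roughly like the sketch below. The seed ortholog ID comes from the `seed_ortholog` column of the `.emapper.annotations` output. `eggnog.db` is an SQLite file, so Python's built-in `sqlite3` module can open it (e.g. `sqlite3.connect("path/to/eggnog.db")`). However, the table and column names used here (`prots`, `name`, `kegg_ko`) are assumptions, not a documented interface. Inspect the real schema first, for example with `.schema` in the `sqlite3` shell, and pass the correct names in. Note that the seed ortholog can itself carry several KOs, so this is not guaranteed to yield just one.

```python
import sqlite3
from typing import List


def seed_ortholog_kos(conn: sqlite3.Connection, seed_id: str,
                      table: str = "prots", id_col: str = "name",
                      ko_col: str = "kegg_ko") -> List[str]:
    """Return the sorted, de-duplicated KOs stored for one seed ortholog.

    ASSUMPTION: table/column names are placeholders, not the documented
    eggnog.db schema; inspect the real schema and pass the right names.
    Identifiers cannot be bound as SQL parameters, so they are quoted into
    the statement, while the seed ID itself is bound safely.
    """
    sql = f'SELECT "{ko_col}" FROM "{table}" WHERE "{id_col}" = ?'
    kos = set()
    for (field,) in conn.execute(sql, (seed_id,)):
        if not field or field == "-":
            continue
        for item in field.split(","):
            item = item.strip()
            if item:
                # Drop an optional 'ko:' prefix, in case the field uses one.
                kos.add(item.split(":")[-1])
    return sorted(kos)


if __name__ == "__main__":
    # In-memory stand-in for eggnog.db with an invented row.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prots (name TEXT, kegg_ko TEXT)")
    conn.execute("INSERT INTO prots VALUES ('2157.SEED1', 'ko:K00002,ko:K00001')")
    print(seed_ortholog_kos(conn, "2157.SEED1"))  # ['K00001', 'K00002']
    conn.close()
```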
I hope this is of help.
Best, Carlos
Hello Carlos,
Thank you very much for your help and suggestions; it is much appreciated.
Specifically, I am conducting a comparative and functional analysis of 57 archaeal species, focusing primarily on their metabolism. Do you think it would be okay to keep all the KOs for a gene?
Hi @Calvin2077,
I am afraid I don't feel qualified to answer your question properly. Also, there is information that only you have and only you can assess. That being said, among other tests, I guess you could run the comparison using all the KOs and check whether the results make sense. For instance, are there KOs that you already know should be present in or absent from one, some, or all of your archaeal genomes? Are there genes with multiple KOs for which you already know the single valid KO? Did you try to identify the orthologous genes among your archaeal species? Can you validate, filter, or curate the KOs based on other annotation sources?
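Several of these checks are easy to script. The sketch below assumes you already have per-gene KO sets from eggnog-mapper and from a second source (for example DRAM or KofamKOALA output, parsed separately; parsing is not shown). It also assumes hand-curated sets of KOs you expect to be present or absent in a genome. All function names, labels, and toy IDs are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def compare_sources(emapper: Dict[str, Set[str]],
                    other: Dict[str, Set[str]]) -> Dict[str, Tuple[str, Set[str]]]:
    """Label each gene by how its eggNOG KOs relate to a second KO source.

    Returns gene -> (label, KOs shared by both sources). Shared KOs are
    natural candidates when narrowing a multi-KO gene to fewer KOs.
    """
    report: Dict[str, Tuple[str, Set[str]]] = {}
    for gene in sorted(set(emapper) | set(other)):
        a, b = emapper.get(gene, set()), other.get(gene, set())
        shared = a & b
        if not a and not b:
            label = "no KO"
        elif not a or not b:
            label = "one source only"
        elif a == b:
            label = "agree"
        elif shared:
            label = "partial overlap"
        else:
            label = "conflict"
        report[gene] = (label, shared)
    return report


def check_expected(genome_kos: Set[str], expected_present: Set[str],
                   expected_absent: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (expected KOs that are missing, KOs found that should be absent)."""
    return sorted(expected_present - genome_kos), sorted(expected_absent & genome_kos)


if __name__ == "__main__":
    # Invented toy annotations for three genes.
    emapper = {"g1": {"K1", "K2", "K3"}, "g2": {"K4"}, "g3": {"K5"}}
    other = {"g1": {"K2"}, "g2": {"K4"}, "g3": {"K6"}}
    for gene, (label, shared) in compare_sources(emapper, other).items():
        print(gene, label, sorted(shared))
    all_kos = set().union(*emapper.values())
    print(check_expected(all_kos, {"K1", "K9"}, {"K5"}))
```

Genes labelled "partial overlap" are where a second source could help pick among several eggNOG KOs. Genes labelled "conflict" probably deserve manual curation.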
Sorry not to be of more help with this.
Best, Carlos
Hi Calvin, did you figure this out? I will be doing the same kind of analysis, and I am still unsure whether I should use all the GO terms I get for a single gene in my enrichment analysis.
Hello everyone,
I am using eggNOG-mapper to functionally annotate some archaeal proteomes (from genomes that were annotated with RAST + DRAM).
However, when I look at the results, some of my proteins have multiple KO identifiers attached to them; each identifier is different and corresponds to a different protein name. For example, one transporter gene has been given five KO identifiers, each with a different name and substrate.
Is there a way to choose which KO identifier to accept, or should I accept them all?
Any help would be much appreciated. Thank you.