eggnogdb / eggnog-mapper

Fast genome-wide functional annotation through orthology assignment
http://eggnog-mapper.embl.de
GNU Affero General Public License v3.0

Understanding values in annotation file #261

Closed jfourquet2 closed 3 years ago

jfourquet2 commented 3 years ago

Hi, to make sure I understand the fields in the output annotation file: do the "Transferred annotations" fields contain all the GO terms, EC numbers, etc. of every OG listed in the "eggNOG OGs" column? Thanks in advance for your answer!

Cantalapiedra commented 3 years ago

Hi @jfourquet2 ,

could you specify which version of eggnog-mapper you are using, and also give an example of the results you are asking about?

Thank you.

Best, Carlos

jfourquet2 commented 3 years ago

Hi Carlos, I'm using version 2.0.2-rf1 of eggNOG-mapper, and for example I have this result:

#query_name seed_eggNOG_ortholog    seed_ortholog_evalue    seed_ortholog_score eggNOG OGs  narr_og_name    narr_og_cat narr_og_desc    best_og_name    best_og_cat best_og_desc    Preferred_name  GOs EC  KEGG_ko KEGG_Pathway    KEGG_Module KEGG_Reaction   KEGG_rclass BRITE   KEGG_TC CAZy    BiGG_Reaction   PFAMs
SC1802-114574_CCGCGGTT-AGCGCTAG-BHLGV2DSXX_L0041.Prot_00008 877418.ATWV01000015_gene2702    1.3e-73 283.5   COG3850@1|root,COG3850@2|Bacteria   COG3850@2|Bacteria  T   phosphorelay sensor kinase activity COG3850@2|Bacteria  T   phosphorelay sensor kinase activity     -   2.7.13.3,4.6.1.1    ko:K01768,ko:K07673,ko:K07713   ko00230,ko02020,ko02025,ko04113,ko04213,map00230,map02020,map02025,map04113,map04213    M00471,M00499,M00695R00089,R00434   RC00295 ko00000,ko00001,ko00002,ko01000,ko01001,ko02022 -   -   -   4HB_MCP_1,AAA,AAA_2,AAA_5,CZB,Cache_1,Cache_3-Cache_2,DUF3365,DUF443,GAF,GAF_2,Guanylate_cyc,HAMP,HATPase_c,HD,HTH_8,Hemerythrin,HisKA,HisKA_3,MASE3,MCPsignal,NIT,PAS_4,PilJ,Sigma54_activ_2,Sigma54_activat,TarH,dCache_1,dCache_3

In the eggNOG OGs column, the first result line has COG3850@1|root,COG3850@2|Bacteria. I wanted to know whether the functional annotation columns (GOs, EC, etc.) contain all the GO terms, EC numbers, etc. of the COGs listed in this eggNOG OGs column.

Cantalapiedra commented 3 years ago

Hi @jfourquet2 ,

not sure if I understand your question, but I will try to answer. Besides eggNOG OGs there is another column, best_og_name, which is the OG used to retrieve the orthologs from which the annotation terms are finally obtained. So, more specifically, the annotations you see should not come from both COG3850@1|root and COG3850@2|Bacteria, but only from COG3850@2|Bacteria in this case.
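A minimal sketch of how this can be checked on an output row, assuming the tab-separated v2.0.2 column layout from the header line posted above. The function name `parse_row` and the shortened example row (query name replaced, only the first 11 columns filled in) are hypothetical and only for illustration.

```python
# Column names copied from the header line posted earlier in this thread.
COLUMNS = [
    "query_name", "seed_eggNOG_ortholog", "seed_ortholog_evalue",
    "seed_ortholog_score", "eggNOG OGs", "narr_og_name", "narr_og_cat",
    "narr_og_desc", "best_og_name", "best_og_cat", "best_og_desc",
    "Preferred_name", "GOs", "EC", "KEGG_ko", "KEGG_Pathway", "KEGG_Module",
    "KEGG_Reaction", "KEGG_rclass", "BRITE", "KEGG_TC", "CAZy",
    "BiGG_Reaction", "PFAMs",
]


def parse_row(line):
    """Map one tab-separated annotation line to a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    # zip stops at the shorter sequence, so missing trailing columns are skipped.
    return dict(zip(COLUMNS, fields))


# Hypothetical, shortened row: only the first 11 columns are present.
example = "\t".join([
    "query_1", "877418.ATWV01000015_gene2702", "1.3e-73", "283.5",
    "COG3850@1|root,COG3850@2|Bacteria",
    "COG3850@2|Bacteria", "T", "phosphorelay sensor kinase activity",
    "COG3850@2|Bacteria", "T", "phosphorelay sensor kinase activity",
])

row = parse_row(example)
all_ogs = row["eggNOG OGs"].split(",")   # every OG in the hierarchy
annotation_source = row["best_og_name"]  # the OG the annotations come from

print("OGs in hierarchy:", all_ogs)
print("Annotations transferred from:", annotation_source)
```

The point of the sketch is only that `eggNOG OGs` lists the whole hierarchy, while `best_og_name` names the single OG behind the annotation columns.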

I hope this makes sense.

Best, Carlos

jfourquet2 commented 3 years ago

Hi Carlos, thanks a lot for your answer, it is very clear now! I also have another question: why are there several annotations separated by commas? Best, Joanna

Cantalapiedra commented 3 years ago

Hi Joanna,

glad to help. There are several annotations separated by commas because the eggNOG DB holds several annotations for your Orthologous Group. For example, check COG3850 at http://eggnog5.embl.de/ under the "Functional profile" -> "Domains" tabs.
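As a rough illustration, a comma-separated field can be split into individual terms like this. The helper name `split_terms` is hypothetical, and treating "-" as "no annotation" is an assumption based on how empty fields appear in the example row above.

```python
def split_terms(value):
    """Split a comma-separated annotation field into a list of terms.

    Assumption: a field holding only "-" (or nothing) means no annotation.
    """
    value = value.strip()
    if value in ("", "-"):
        return []
    return value.split(",")


# Values taken from the example row posted earlier in this thread.
ec_numbers = split_terms("2.7.13.3,4.6.1.1")
kegg_kos = split_terms("ko:K01768,ko:K07673,ko:K07713")
empty_field = split_terms("-")

print(ec_numbers)
print(kegg_kos)
print(empty_field)
```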

I hope this makes sense.

Best, Carlos

jfourquet2 commented 3 years ago

Hi Carlos, thanks a lot! I found the description of the eggNOG-mapper v1 output file here: https://github.com/eggnogdb/eggnog-mapper/wiki/eggNOG-mapper-v1. According to this description, the 11th column of the output file is best_OG|evalue|score: Best matching Orthologous Groups (only in HMM mode). Does the best_og_name column of eggNOG-mapper v2.0.2-rf1 correspond to this column of the old output file? I have not used HMM profiles (because of the version I used), so I did not fully understand that... Concerning the annotation columns (EC, GOs, etc.), is all this information contained in the eggNOG 5.0.1 database? Did you use any database other than eggNOG? I ask because I would like to know what happens when, for example, the PFAM database is updated: is that update picked up directly by upgrading eggNOG-mapper, or must eggNOG itself be updated after the PFAM update, after which I must update the version of eggNOG used by eggnog-mapper? Thanks a lot in advance! Best, Joanna

Cantalapiedra commented 3 years ago

Hi Joanna,

sorry for the delay answering.

> Hi Carlos, thanks a lot! I found the description of the eggNOG-mapper v1 output file here: https://github.com/eggnogdb/eggnog-mapper/wiki/eggNOG-mapper-v1. According to this description, the 11th column of the output file is best_OG|evalue|score: Best matching Orthologous Groups (only in HMM mode). Does the best_og_name column of eggNOG-mapper v2.0.2-rf1 correspond to this column of the old output file? I have not used HMM profiles (because of the version I used), so I did not fully understand that...

It should be conceptually the same. However, when using HMMER the first hit is in fact an OG. The query is then realigned to the members of that OG, and the best hit is used as the seed ortholog. The next steps (finding the other OGs in the hierarchy, deciding which is best, etc.) are the same. I guess the "only in HMM mode" note was there because an evalue and score for a hit to an OG are only obtained using HMMER. With diamond, the evalue and score come from the alignment to the seed ortholog.

> Concerning the annotation columns (EC, GOs, etc.), is all this information contained in the eggNOG 5.0.1 database? Did you use any database other than eggNOG? I ask because I would like to know what happens when, for example, the PFAM database is updated: is that update picked up directly by upgrading eggNOG-mapper, or must eggNOG itself be updated after the PFAM update, after which I must update the version of eggNOG used by eggnog-mapper? Thanks a lot in advance!

Yes, all the annotations are from the eggNOG 5.0.1 database, unless you are using the --pfam_realign options, in which case the PFAM database is used directly. The PFAM version used for the eggNOG 5.0.1 DB is currently PFAM31, if I recall correctly. We plan to update all the annotations soon, but I cannot confirm when that will happen. As for updating, the idea so far is that each eggnog-mapper version has an associated eggNOG database. Therefore, when you update eggnog-mapper and run "emapper.py --version", you should be warned if the eggNOG DB version is not the one expected for that emapper.py version. In that case, you should run "download_eggnog_data.py" again to update the database.
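A small sketch of how the version check could be scripted, assuming emapper.py is on the PATH. The function name `emapper_version_report` is hypothetical. The text printed by "emapper.py --version" is not shown in this thread, so the sketch only returns it for you to read and does not try to parse any warning.

```python
import shutil
import subprocess


def emapper_version_report(executable="emapper.py"):
    """Return the text printed by `<executable> --version`, or None if not found."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"],
        capture_output=True,
        text=True,
        check=False,  # do not raise on a non-zero exit code; just report the text
    )
    # Combine both streams, since a warning could be printed to either one.
    return (result.stdout + result.stderr).strip()


report = emapper_version_report()
if report is None:
    print("emapper.py was not found on PATH")
else:
    print(report)
```

If the report warns about a database mismatch, rerunning download_eggnog_data.py as described above is the next step.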


I hope this makes sense.

Best, Carlos

jfourquet2 commented 3 years ago

Hi Carlos, thank you for your detailed answer! It helps a lot. Best, Joanna

Cantalapiedra commented 3 years ago

Glad to be of help. Best, Carlos