eggnogdb / eggnog-mapper

Fast genome-wide functional annotation through orthology assignment
http://eggnog-mapper.embl.de
GNU Affero General Public License v3.0

question about GOs retrieval in the result page #257

Closed jcleple33 closed 3 years ago

jcleple33 commented 3 years ago

Hi!

I don't understand why no GOs are listed in the results even though several GO terms are associated with the OGs (see http://eggnogapi5.embl.de/nog_data/json/go_terms/3GBDI, http://eggnogapi5.embl.de/nog_data/json/go_terms/37WDW). Below is an example with these two sequences.

Please, could you help me to understand this point?

Many thanks in advance!

################################

Qrob_P0000010.2 230
MSGPENWLQFYQQQNLSTSQMSDSTIVTTTVTSNSPIGSGNSSANTGHLSPEGRVSKPIR
RRSRASRRTPTTLLNTDTTNFRAMVQQFTGGPSAPFASGAHHGAPPNFGFALGTNRQAHH
HHPAMMLPPTGHGFHLQPQQQLLYPHHHHHQQGQQQYMFSLGNNSSHSTHHGQDHALLQR
LGSSRPSSNMGSVVSDGFVIEGVGVPSSSQVPPASTVRPSNENRSNSFLF

Qrob_P0000020.2 395
MQKMNLHSICISLSFLFAIFAVLPSTHIINSNPSFLEYLINFSLTFFSSSPITSTVITYN
IEGHHKHHKKNKTINPCEDFSPDFPTDSDTTSYICVDRNGCCNFTTVQAAVNAIPDFSQN
RTIIWINTGIYYEKVTVPSPKTNVTFQGQGFTSTAIAWNDTANSSHGTSFSGSVQVFSTN
FIAKNISFMNLAPIPNPGDIGAQAVAIKISGDKAAFWGCGFFGAQDTLHDDKGRHYFRDC
YIQGSIDFIFGNGRSLYENCLLISMAKPVDPGSKGINGALTAHGRTSQDENTGFAFTNCT
VGGNGRVLLGRAWRPFARVVFANTSMSDIIAPEGWNDFNDPTRDKTIFFGEYNCSGPGAN
MTMRLPYVLRLNDTQASPFLNVSFVDGDQWLQPYN

################################

module load system/Python-3.7.4
module load bioinfo/eggnog-mapper-2.0.2-rf1
emapper.py -i Qrob_PM1N_CDS_aa_20161004.fa --cpu 4 --output AllQrobNOG -m diamond --target_orthologs all

################################

Results (one field per line; the Preferred_name and GOs fields are empty for both queries):

query_name: Qrob_P0000010.2
seed_eggNOG_ortholog: 3641.EOX93202
seed_ortholog_evalue: 5.8e-41
seed_ortholog_score: 174.5
eggNOG OGs: 2D220@1|root,2S4XE@2759|Eukaryota,37WDW@33090|Viridiplantae
narr_og_name: 37WDW@33090|Viridiplantae
narr_og_cat: S
narr_og_desc: VQ motif-containing protein
best_og_name: 37WDW@33090|Viridiplantae
best_og_cat: S
best_og_desc: VQ motif-containing protein
Preferred_name:
GOs:
EC: -
KEGG_ko: -
KEGG_Pathway: -
KEGG_Module: -
KEGG_Reaction: -
KEGG_rclass: -
BRITE: -
KEGG_TC: -
CAZy: -
BiGG_Reaction: -
PFAMs: VQ

query_name: Qrob_P0000020.2
seed_eggNOG_ortholog: 3988.XP_002521065.1
seed_ortholog_evalue: 3.5e-163
seed_ortholog_score: 581.3
eggNOG OGs: COG4677@1|root,2QUTX@2759|Eukaryota,37NG6@33090|Viridiplantae,3GBDI@35493|Streptophyta,4JJ3S@91835|fabids
narr_og_name: 4JJ3S@91835|fabids
narr_og_cat: G
narr_og_desc: pectinesterase
best_og_name: 3GBDI@35493|Streptophyta
best_og_cat: G
best_og_desc: pectinesterase
Preferred_name:
GOs:
EC: 3.1.1.11
KEGG_ko: ko:K01051
KEGG_Pathway: ko00040,ko01100,map00040,map01100
KEGG_Module: M00081
KEGG_Reaction: R02362
KEGG_rclass: RC00460,RC00461
BRITE: ko00000,ko00001,ko00002,ko01000
KEGG_TC: -
CAZy: -
BiGG_Reaction: -
PFAMs: Pectinesterase

################################

Cantalapiedra commented 3 years ago

Hi @jcleple33 ,

By default, eggNOG-mapper discards GO terms annotated with the evidence codes ND or IEA (http://geneontology.org/docs/guide-go-evidence-codes/), which may be why those GOs are missing from your results.

Nonetheless, I have added a new value for the --go_evidence parameter (--go_evidence all), which should hopefully retrieve all the available GO terms, in case you are interested in all of them. If so, just pull the repo again.
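For reference, here is a minimal sketch of the original command from this thread with the new option appended. The helper name `build_emapper_cmd` is hypothetical and only assembles the argument list. Every flag except `--go_evidence all` is copied from the command posted above, and the sketch assumes `emapper.py` is on the PATH after pulling the updated repo.

```python
import shlex
import subprocess  # only needed if you uncomment the run step below


def build_emapper_cmd(fasta, output, cpu=4, go_evidence="all"):
    """Assemble the emapper.py call from this thread plus --go_evidence.

    Hypothetical helper: it only builds the argument list, it does not
    validate that the installed emapper.py supports the given value.
    """
    return [
        "emapper.py",
        "-i", fasta,
        "--cpu", str(cpu),
        "--output", output,
        "-m", "diamond",
        "--target_orthologs", "all",
        "--go_evidence", go_evidence,
    ]


cmd = build_emapper_cmd("Qrob_PM1N_CDS_aa_20161004.fa", "AllQrobNOG")

# Print the command as a single shell-safe line for inspection.
print(" ".join(shlex.quote(arg) for arg in cmd))

# To actually run it (requires emapper.py on PATH):
# subprocess.run(cmd, check=True)
```

Building the command as a list rather than one string avoids shell quoting problems if the FASTA path contains spaces.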

Sorry for the inconvenience; I hope this helps.

Best, Carlos

jcleple33 commented 3 years ago

Many thanks Carlos,

To recover all GOs, I downloaded the GO terms for each "narrowest OG name" corresponding to my transcripts: wget "http://eggnogapi5.embl.de/nog_data/json/go_terms/37WDW", etc.

I guess this should give results similar to --go_evidence all? But maybe "--go_evidence all" uses the GOs of the "best_og_name", not the narrowest?
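The per-OG download described above could be scripted roughly as follows. This is a sketch under assumptions: the URL pattern is taken from the links in this thread, the function names are made up for illustration, and the response is kept as raw text because the JSON layout is not described here.

```python
import urllib.request

# URL prefix copied from the links posted in this thread.
API_BASE = "http://eggnogapi5.embl.de/nog_data/json/go_terms/"


def og_id(og_name):
    """Strip the taxonomic suffix: '37WDW@33090|Viridiplantae' -> '37WDW'."""
    return og_name.split("@", 1)[0]


def go_terms_url(og_name):
    """Build the go_terms URL for one OG name (with or without suffix)."""
    return API_BASE + og_id(og_name)


def download_go_terms(og_names, timeout=30):
    """Fetch the raw response text for each OG, keyed by OG identifier.

    The body is returned unparsed; inspect one response before deciding
    how to extract the GO terms from it.
    """
    results = {}
    for name in og_names:
        with urllib.request.urlopen(go_terms_url(name), timeout=timeout) as resp:
            results[og_id(name)] = resp.read().decode("utf-8")
    return results


# OG names as they appear in the narr_og_name / best_og_name columns above.
og_names = ["37WDW@33090|Viridiplantae", "3GBDI@35493|Streptophyta"]
for name in og_names:
    print(go_terms_url(name))

# Network call, left commented out:
# raw = download_go_terms(og_names)
```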

Best, Jean-Charles

Cantalapiedra commented 3 years ago

Hi Jean-Charles,

Yes, I believe the GO terms would come from orthologs found in the best OG. You can always run the annotation with --tax_scope narrowest to force the best OG to coincide with the narrowest OG.
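To see which queries this affects, one could list the rows where the narrowest and best OG differ. The sketch below uses a small in-memory sample built from the two result rows in this thread. The function name is hypothetical, and a real annotations file has more columns and may carry comment lines or a `#`-prefixed header that would need handling first.

```python
import csv
import io

# Three columns extracted from the two result rows posted in this thread.
SAMPLE = (
    "query_name\tnarr_og_name\tbest_og_name\n"
    "Qrob_P0000010.2\t37WDW@33090|Viridiplantae\t37WDW@33090|Viridiplantae\n"
    "Qrob_P0000020.2\t4JJ3S@91835|fabids\t3GBDI@35493|Streptophyta\n"
)


def queries_with_different_ogs(handle):
    """Return (query, narrowest OG, best OG) for rows where the two differ."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        (row["query_name"], row["narr_og_name"], row["best_og_name"])
        for row in reader
        if row["narr_og_name"] != row["best_og_name"]
    ]


differing = queries_with_different_ogs(io.StringIO(SAMPLE))
for query, narr, best in differing:
    print(query, narr, best)
```

In this sample only Qrob_P0000020.2 is reported (narrowest OG 4JJ3S@91835|fabids, best OG 3GBDI@35493|Streptophyta), so it is the query for which --tax_scope narrowest would be expected to change the source of the annotations.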

Glad to help anytime you need.

Best, Carlos

Cantalapiedra commented 3 years ago

Please, re-open or re-issue if needed.

Best, Carlos