Open marchoeppner opened 2 years ago
Hi @marchoeppner ,
thank you for reporting this. Could you provide the specific example with the GO terms and those you obtain from Ensembl?
Best, Carlos
Certainly.
Just randomly picking out one example where the list of GO terms seems excessively long.
This is the "decorated" mRNA for the locus shown below:
ptg000007l EVM mRNA 20166438 20178559 . + . ID=evm.model.ptg000007l.513;Parent=evm.TU.ptg000007l.513;Name=TTC39C;em_target=144197.XP_008275783.1;em_score=1132.0;em_evalue=0.0;em_tcov=100.0;em_OGs=KOG3783@1|root,KOG3783@2759|Eukaryota,38B40@33154|Opisthokonta,3BFI6@33208|Metazoa,3CV0M@33213|Bilateria,489B0@7711|Chordata,490YD@7742|Vertebrata,49ZC9@7898|Actinopterygii;em_COG_cat=S;em_desc=Tetratricopeptide repeat domain 39C;em_max_annot_lvl=33208|Metazoa;em_Preferred_name=TTC39C;em_PFAMs=DUF3808;Ontology_term=GO:0006996,GO:0007275,GO:0007368,GO:0007389,GO:0007423,GO:0007507,GO:0008150,GO:0009653,GO:0009790,GO:0009799,GO:0009855,GO:0009887,GO:0009987,GO:0016043,GO:0022607,GO:0030030,GO:0030031,GO:0032474,GO:0032501,GO:0032502,GO:0042471,GO:0042472,GO:0043583,GO:0044085,GO:0044782,GO:0048513,GO:0048562,GO:0048568,GO:0048598,GO:0048731,GO:0048839,GO:0048840,GO:0048856,GO:0060271,GO:0061371,GO:0070925,GO:0071840,GO:0072359,GO:0090596,GO:0120031,GO:0120036
This is the orthologous gene from a fish species in EnsEMBL (4 GO terms in total):
And this is the locus in question in WebApollo:
Protein sequence for this mRNA:
evm.model.ptg000007l.513 MAGPEQSQQQQQVEEKAEHIDDAEMALQGINMLLNNGFKESDELFRRYRTQSPLMSFGASFVSFLNAMMT FEEEKMQTACDDLRTTEKLCESDSAGVIETIRNKIKKSMDSQRSGVVVIDRLQRQIIVADCQVYLAVLSF VKQELSAYIKGGWILRKAWKMYNKCHSDISQLQESCQRRSSGNQESLSADNANHNAPVENAVTAEALDRL KGSVSFGYGLFHLCISMVPPHLLKIINLLGFPGDRLQGLSSLMYASESKDMKAPLATLALLWYHTVVLPF FALDGSDTHEGLLEAKAILQRKSVVYPNSSLFMFFKGRVQRLECHINSALACFHDALELASDQREIQHVC LYEIGWCSMIEMNFEDAYRAFERLKNESRWSQCYYAYLTGVCQGAAGDLDGASGVFKDVQKLFKRKNNQI EQFAVKRAERLRKISPTRELCILGVIEVLYLWKALPNCSSSKLQIMNQVLQSLDEASCRGLKHLLLGAIH KCHGNVRDALQSFQLAARDEYGRQINSYVQPYAVYELGCVLLGKPETVGKGRSLLLQAKEDFTGYDFENR LHVRIHSALASLKEVVPQ
Yes, thank you!
It seems to me that it is indeed reporting the whole GO ontology (as you previously suggested). For example, https://www.ebi.ac.uk/QuickGO/term/GO:0032474
Hopefully in future versions of the database we could improve this, to try to report only the most meaningful GO terms.
Best, Carlos
Thanks for the quick feedback. Yes, it would certainly be desirable to filter the list down to the relevant terms (the rest of the graph is implicitly included anyway). I would expect that's what most people are looking for. Closing this for now and looking forward to a "fix".
Thank you @marchoeppner
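For anyone who wants to do this filtering downstream in the meantime, here is a minimal sketch of the idea discussed above: drop every term that is already implied as an ancestor of another term in the list. It is not something eggNOG-mapper provides. The function names and the toy child-to-parents map are made up for illustration, and the identifiers are placeholders rather than real GO relationships; in practice the map would have to come from a GO release.

```python
def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def most_specific(terms, parents):
    """Keep only the terms that are not an ancestor of another term in the list."""
    terms = set(terms)
    implied = set()
    for term in terms:
        implied |= ancestors(term, parents)
    return sorted(terms - implied)


# Toy child -> parents map with placeholder identifiers (NOT real GO relationships).
TOY_PARENTS = {
    "T:leaf1": ["T:mid1"],
    "T:leaf2": ["T:mid1", "T:mid2"],
    "T:mid1": ["T:root"],
    "T:mid2": ["T:root"],
}

annotated = ["T:root", "T:mid1", "T:mid2", "T:leaf1", "T:leaf2"]
reduced = most_specific(annotated, TOY_PARENTS)
print(reduced)
```

With the toy map, only the two leaf terms survive, since every other term is an ancestor of one of them. Whether "most specific" is the same as "most meaningful" is a separate question; this only removes the redundancy.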
Reopening this as there has been no movement so far.
I am new to bioinformatics. I used eggNOG to annotate a group of protein sequences and, like you, I get several GO terms for the same query. I checked on https://www.ebi.ac.uk/QuickGO/term/GO:0000122 and indeed it sometimes does not trace the full path...
The results obtained with eggNOG are not very well explained. How do you interpret them? With these GO results, how can I get a nice graph like those seen in publications?
Is there documentation that explains what each field corresponds to? For example, does Description refer to the COG_category description?
query | seed_ortholog | evalue | score | eggNOG_OGs | max_annot_lvl | COG_category | Description | Preferred_name | GOs | EC | KEGG_ko | KEGG_Pathway | KEGG_Module | KEGG_Reaction | KEGG_rclass | BRITE | KEGG_TC | CAZy | BiGG_Reaction | PFAMs
Hi, seems like you are looking for the documentation of output formats: https://github.com/eggnogdb/eggnog-mapper/wiki/eggNOG-mapper-v2.1.5-to-v2.1.8#user-content-Output_fields
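As a starting point for working with the table, here is a small sketch that collects the GO terms per query from the column list quoted above. It rests on assumptions that should be checked against your own output file: that comment lines start with `##`, that the header line starts with `#query`, that missing values are written as `-`, and that GO terms are comma-separated. The function name `read_go_terms` and the two example rows are made up.

```python
import csv
import io


def read_go_terms(handle):
    """Map each query to its list of GO terms from a tab-separated annotations table."""
    # Assumption: "##" lines are comments and the header line starts with "#query".
    lines = (line for line in handle if not line.startswith("##"))
    reader = csv.DictReader(lines, delimiter="\t")
    result = {}
    for row in reader:
        gos = row["GOs"]
        # Assumption: "-" marks a query without GO annotation.
        result[row["#query"]] = [] if gos in ("", "-") else gos.split(",")
    return result


# Made-up rows with only three of the columns, for illustration.
example = io.StringIO(
    "## hypothetical comment line\n"
    "#query\tseed_ortholog\tGOs\n"
    "geneA\t0000.XP_1\tGO:0008150,GO:0009987\n"
    "geneB\t0000.XP_2\t-\n"
)
go_by_query = read_go_terms(example)
for query, terms in go_by_query.items():
    print(query, len(terms))
```

For a real file, pass an open file handle instead of the `io.StringIO` example.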
Hey @marchoeppner @Cantalapiedra @timase2021, I wonder how you dealt with this GO redundancy issue in the end? I am currently having the same struggle. Thanks in advance for your reply. Lan
From our side, we have not been able to implement a solution for it yet. You may need to rely on external tools that parse and process lists of GO terms. Sorry for the inconvenience.
Best, Carlos
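If the terms live in a decorated GFF rather than the annotations table, the first step of any external processing is pulling them out of the attribute column. A minimal sketch, assuming the attributes are `;`-separated `key=value` pairs and the GO terms sit under `Ontology_term` as in the example earlier in this thread; `ontology_terms` is a made-up helper name and the attribute string is a shortened copy of that example.

```python
def ontology_terms(attributes):
    """Return the GO terms listed under Ontology_term in a GFF3 attribute string."""
    for field in attributes.split(";"):
        key, _, value = field.partition("=")
        if key == "Ontology_term":
            return value.split(",") if value else []
    return []


# Shortened copy of the attribute column from the example earlier in this thread.
attrs = "ID=evm.model.ptg000007l.513;Name=TTC39C;Ontology_term=GO:0006996,GO:0007275,GO:0007368"
terms = ontology_terms(attrs)
print(len(terms), terms[0])
```

The resulting list can then be handed to whatever tool or script does the actual reduction.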
Hi,
I have been trying to incorporate Eggnog Mapper into a genome annotation pipeline, and find that when using the "Metazoan" reference database on an annotation of a fish genome, I sometimes get hundreds of GO terms attached to a given mRNA. The model looks "sane" and the orthologous gene in another fish species in EnsEMBL has 5 or 6 GO terms.
So I suspect what I am seeing is somehow "wrong", or maybe Eggnog Mapper attaches the entirety of the GO graph per "tip" to each mRNA model? I would have expected maybe a handful of terms. The documentation does not elaborate on this, as far as I can tell. Maybe someone could clarify how this is supposed to work...
Many thanks, Marc