eggnogdb / eggnog-mapper

Fast genome-wide functional annotation through orthology assignment
http://eggnog-mapper.embl.de
GNU Affero General Public License v3.0

No EC numbers #156

Closed PedroMTQ closed 5 years ago

PedroMTQ commented 5 years ago

I ran eggnog-mapper and am not getting EC numbers for any of the aligned queries (more than 1000). The command I ran:

./emapper.py -i proteins.faa -o annotation_output --scratch_dir /output_eggnog_mapper/ --temp_dir /output_eggnog_mapper/ --keep_mapping_files -m diamond

Here's a sample of the output.annotations file:

query_name seed_eggNOG_ortholog seed_ortholog_evalue seed_ortholog_score best_tax_level Preferred_name GOs EC KEGG_ko KEGG_Pathway KEGG_Module KEGG_Reaction KEGG_rclass BRITE KEGG_TC CAZy BiGG_Reaction

LBKOJGCJ_00002 1229780.BN381_290004 9.4e-40 169.1 Actinobacteria Bacteria 2GRC8@201174,COG2161@1,COG2161@2 NA|NA|NA D Antitoxin component of a toxin-antitoxin (TA) module

LBKOJGCJ_00003 1068978.AMETH_0526 2e-43 183.3 Actinobacteria Bacteria 29YB2@1,2IJK8@201174,30K5A@2 NA|NA|NA S Uncharacterised nucleotidyltransferase

LBKOJGCJ_00005 1380393.JHVP01000015_gene4217 5e-22 110.2 Frankiales Bacteria 2GJQ7@201174,4ERY6@85013,COG0656@1,COG0656@2 NA|NA|NA S PFAM aldo keto reductase

jhcepas commented 5 years ago

If the "EC" column is empty, no predictions were made for those entries. Should those queries really have EC numbers? If so, we could debug why they are not annotated as such.
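To check how many queries actually carry an EC prediction, the annotations table can be scanned column by column. The following is a minimal sketch, assuming the file is tab-separated, that its header line starts with `query_name` (optionally prefixed by `#`), and that an empty field or a `-` placeholder means "no prediction". The function name `count_ec` and the sample rows are made up for illustration.

```python
import csv
import io


def count_ec(handle):
    """Return (rows_with_ec, total_rows) for a tab-separated annotations table."""
    header = None
    ec_idx = None
    with_ec = total = 0
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue
        if header is None:
            # Skip everything until the header line (with or without a leading '#').
            first = row[0].lstrip("#")
            if first == "query_name":
                header = [first] + row[1:]
                ec_idx = header.index("EC")
            continue
        if row[0].startswith("#"):
            continue  # trailing comment lines
        total += 1
        # Short rows (missing trailing columns) count as "no EC".
        ec = row[ec_idx].strip() if len(row) > ec_idx else ""
        if ec and ec != "-":  # assumption: '-' is an empty-value placeholder
            with_ec += 1
    return with_ec, total


# Made-up sample: one row with an EC number, one empty field, one short row.
sample = (
    "#query_name\tseed_eggNOG_ortholog\tEC\n"
    "q1\t1.A\t4.2.1.17\n"
    "q2\t1.B\t\n"
    "q3\t1.C\n"
)
print(count_ec(io.StringIO(sample)))
```

For a real run, pass an open file handle for the annotations file instead of the `io.StringIO` sample.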

PedroMTQ commented 5 years ago

Yes, I compared the results against prokka. Here's a sample of both outputs for the same protein sequences.

prokka:

LBKOJGCJ_00008 CDS 804 echA8_1 4.2.1.17 COG1024 putative enoyl-CoA hydratase echA8

LBKOJGCJ_00009 CDS 1449 aam_1 3.5.1.13 COG0154 Acylamidase

LBKOJGCJ_00010 CDS 654 epsM 2.3.1.- COG0110 Putative acetyltransferase EpsM

Eggnog mapper:

LBKOJGCJ_00008 1229780.BN381_290012 9.1e-147 526.2 Actinobacteria Bacteria 2GJ1A@201174,COG1024@1,COG1024@2 NA|NA|NA I Enoyl-CoA hydratase

LBKOJGCJ_00009 1229780.BN381_290013 4.9e-268 929.9 Actinobacteria Bacteria 2GKPZ@201174,COG0154@1,COG0154@2 NA|NA|NA J Belongs to the amidase family

LBKOJGCJ_00010 1229780.BN381_290014 1.7e-58 232.6 Actinobacteria Bacteria 2IDUP@201174,COG0110@1,COG0110@2 NA|NA|NA GM sugar O-acyltransferase, sialic acid O-acetyltransferase NeuD family

And here are the sequences:

LBKOJGCJ_00008 putative enoyl-CoA hydratase echA8
MKDADGLYGDFTGFGVDRPADGVLRLTLDAPGLNAVDADAHRSLADVWRVIDRDPDTRVA
LIRGAGKGFSAGGSFELLDEIMADRAARTRVLNEARDLVWGIIDCSKPVVSAIHGPAVGA
GLVAALLADVSVAARSAKIIDGHTRLGVAAGDHAAVAWPLLCGMAKAKYHLLTNRPLSGE
EAERIGLVSLCVDDDAVQDEAMSIATDLAAGSAEAIAFTKHTLNHHYRSAGPAFDASLYA
EFYGFGGPDAREGLASHREKRSPNFGG

LBKOJGCJ_00009 Acylamidase
MGLTTWGEPLSGIEQPTAIEISDGVRKGELSAREVVDDYLGRIDAGNGALNAFVHVDAAG
ARDQADRVDARVAAGEDPGPFAGVPFGVKDLEHCAGMPTSHGSTVYAGRGPVAADSIHVA
RLRAAGGVPVGKTAAPEFGTLSFTRTLAFGVTTSPWGEGRTPGGSSGGSAAAVAAGLVPV
STASDGGGSTRIPASFAGLVGMKPSHGRIPIEGPSGSQTAVAGLLTTTVAEAARHLDVTA
GPDARDRLSLPSTDLNYCNLIETLETAGLRARWSPDLGFGVTDPEVESLCRSAAEELADA
AGLAVDEGVVDLGDPVRLWFQAGAADLWLSLEPGMWPQLADDFTPFVRRGLEMTEELAMP
RYADTLRLREDLQDHMAALFDEVDVVLCPTTAVAAFADKGPPPSVIAGQELGMGMATPYT
MPANLCWNPAVSVPAGLTGDGLPVGLQIVTQRHRDEVPLRLARILEQVRPWPRHAPGAGA
SS

LBKOJGCJ_00010 Putative acetyltransferase EpsM
MVIGASGHARCVIDAARAGSTGEVVAVADDDVVPTAREVLGVPVVGGSDSVGEWWSEGRI
DGVVVGIGDNDTRMVVVERLLAIEPSLRFSTVAHPTASIAASARLGDGAVVLAGASVGPQ
ASVGAHALLGAQANLDHDTVLSEGASLGPGALIGGAARIGCASVVGIGAVVRHGLTIGNH
SVLGAGAVLTRDLPDGVVAWGAPARIQRSREPGERYL
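A comparison like this one can be automated by listing the locus tags that have an EC number in the prokka table but none in the eggNOG-mapper output. This is a rough sketch, assuming the prokka table is tab-separated with the column order locus_tag, ftype, length_bp, gene, EC_number, COG, product (consistent with the sample rows in this comment). The helper names `parse_prokka_tsv` and `missing_ec` are hypothetical, and the eggNOG-mapper side is given as a plain dict for brevity.

```python
def parse_prokka_tsv(lines):
    """Map locus tag -> EC number string (possibly empty) from prokka TSV lines."""
    ec_by_tag = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the header line and rows too short to contain an EC column.
        if len(fields) < 5 or fields[0] == "locus_tag":
            continue
        ec_by_tag[fields[0]] = fields[4].strip()  # assumed: 5th column is EC_number
    return ec_by_tag


def missing_ec(prokka_ec, emapper_ec):
    """Locus tags with an EC number in prokka but none in eggNOG-mapper."""
    return sorted(
        tag for tag, ec in prokka_ec.items()
        if ec and not emapper_ec.get(tag)
    )


# The three prokka rows quoted in this thread, re-joined with tabs.
prokka_lines = [
    "locus_tag\tftype\tlength_bp\tgene\tEC_number\tCOG\tproduct\n",
    "LBKOJGCJ_00008\tCDS\t804\techA8_1\t4.2.1.17\tCOG1024\tputative enoyl-CoA hydratase echA8\n",
    "LBKOJGCJ_00009\tCDS\t1449\taam_1\t3.5.1.13\tCOG0154\tAcylamidase\n",
    "LBKOJGCJ_00010\tCDS\t654\tepsM\t2.3.1.-\tCOG0110\tPutative acetyltransferase EpsM\n",
]
# eggNOG-mapper reported no EC for any of them in the local run.
emapper_ec = {"LBKOJGCJ_00008": "", "LBKOJGCJ_00009": "", "LBKOJGCJ_00010": ""}

print(missing_ec(parse_prokka_tsv(prokka_lines), emapper_ec))
```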

PedroMTQ commented 5 years ago

Of course, here is the full set of CDS predicted by prodigal, which I annotated with both prokka and eggnog-mapper:

Microthrix_parvicella.txt

jhcepas commented 5 years ago

Thanks, we will look into some examples.

jhcepas commented 5 years ago

I just ran eggnog-mapper online with your three example sequences (default params), and it seems to produce KEGG and EC number annotations: http://eggnog-mapper.embl.de/job_status?jobname=MM_7ref7n89

On Tue, 10 Sep 2019 at 20:58, Pedro Queirós notifications@github.com wrote:

Microthrix_parvicella.txt https://github.com/eggnogdb/eggnog-mapper/files/3597315/Microthrix_parvicella.txt


PedroMTQ commented 5 years ago

Thanks, I just tried it as well and it seems to be working. Actually, I was trying to convert the current eggnog-mapper from Python 2.7 to 3.7 (to get everything working in the same environment). Since I mostly changed the syntax of the print function, and the run completes without errors, I assumed the annotation would still include the EC numbers. I read through the code, but could you advise me which part of it could be causing the missing ECs?
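For context on why a port that "runs all the way" can still change its output: some Python 2 to 3 differences alter behaviour without raising an error. The snippet below illustrates three general ones with made-up values. It is not taken from eggnog-mapper, and nothing here claims that any of them is the cause of the missing ECs.

```python
# 1) bytes vs. str: data read from a pipe or a binary file is bytes in Python 3.
raw = b"4.2.1.17"
bytes_equals_str = (raw == "4.2.1.17")                     # False in Python 3 (True in Python 2)
decoded_equals_str = (raw.decode("ascii") == "4.2.1.17")   # True once decoded

# 2) map/filter/zip are lazy: the iterator object is truthy even if it yields nothing.
hits = filter(None, [])
lazy_is_truthy = bool(hits)         # True, although there are no hits
list_is_truthy = bool(list(hits))   # False once materialised

# 3) "/" is true division in Python 3; "//" keeps the old integer behaviour.
true_div, floor_div = 7 / 2, 7 // 2

print(bytes_equals_str, decoded_equals_str)
print(lazy_is_truthy, list_is_truthy)
print(true_div, floor_div)
```

Comparing the intermediate files of the Python 2 and Python 3 runs on the same input (the thread's command already passes `--keep_mapping_files`) is one way to see at which step the outputs diverge.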

jhcepas commented 5 years ago

It is not the code, but the lack of EC numbers associated with the inferred orthologs of each query (which come from KEGG modules). This has to do with the phylogenetic trees scanned and the inferred duplication events. Of the three cases you sent, I manually checked the last one (without EC) by blasting it against KEGG. Indeed, the top 10 hits in KEGG have no EC number associated, which would explain the result.
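The idea that annotations are inherited from the inferred orthologs can be illustrated with a toy example. This is a deliberately simplified sketch, not eggnog-mapper's actual implementation: the function `transfer_ec`, the protein identifiers, and the lookup table are all hypothetical. It only shows why a query ends up without an EC number when none of its orthologs has one.

```python
def transfer_ec(orthologs, ec_by_protein):
    """Collect the EC numbers annotated on any of the given orthologs."""
    ecs = set()
    for protein in orthologs:
        ecs.update(ec_by_protein.get(protein, ()))
    return sorted(ecs)


# Hypothetical reference annotations: protein id -> EC numbers.
ec_by_protein = {
    "ref_A": ["4.2.1.17"],
    "ref_B": ["4.2.1.17", "5.3.3.8"],
    "ref_C": [],  # annotated protein without any EC number
}

# A query whose orthologs carry EC numbers inherits them ...
print(transfer_ec(["ref_A", "ref_B"], ec_by_protein))
# ... while a query whose orthologs have none stays empty.
print(transfer_ec(["ref_C", "ref_D"], ec_by_protein))
```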

jhcepas commented 5 years ago

It is not the code, but the set of orthologs inferred from the precomputed trees, plus their annotations from the KEGG database.

