Closed PedroMTQ closed 5 years ago
If the "EC numbers" column is empty, no EC predictions were made for those entries. Should those queries really have EC numbers? If so, we could debug why they are not annotated as such.
Yes, I ran it against prokka. Here's a sample of both outputs for the same protein sequences: prokka:
LBKOJGCJ_00008 CDS 804 echA8_1 4.2.1.17 COG1024 putative enoyl-CoA hydratase echA8
LBKOJGCJ_00009 CDS 1449 aam_1 3.5.1.13 COG0154 Acylamidase
LBKOJGCJ_00010 CDS 654 epsM 2.3.1.- COG0110 Putative acetyltransferase EpsM
Eggnog mapper:
LBKOJGCJ_00008 1229780.BN381_290012 9.1e-147 526.2 Actinobacteria Bacteria 2GJ1A@201174,COG1024@1,COG1024@2 NA|NA|NA I Enoyl-CoA hydratase
LBKOJGCJ_00009 1229780.BN381_290013 4.9e-268 929.9 Actinobacteria Bacteria 2GKPZ@201174,COG0154@1,COG0154@2 NA|NA|NA J Belongs to the amidase family
LBKOJGCJ_00010 1229780.BN381_290014 1.7e-58 232.6 Actinobacteria Bacteria 2IDUP@201174,COG0110@1,COG0110@2 NA|NA|NA GM sugar O-acyltransferase, sialic acid O-acetyltransferase NeuD family
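To line the two outputs up by locus tag, something like the following could be used. It is only a sketch: it assumes both files are tab-separated and that the prokka columns are ordered as in the sample above (locus tag first, EC number fifth). The eggnog mapper input is a hypothetical two-column reduction (query, EC), since the position of the EC column in the real annotations file depends on the version; `read_tsv_column` and `missing_ec` are names made up for illustration.

```python
import csv
import io


def read_tsv_column(handle, key_col, value_col):
    """Map key_col -> value_col for every non-comment row of a tab-separated file."""
    result = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and header/comment lines
        value = row[value_col] if len(row) > value_col else ""
        result[row[key_col]] = value.strip()
    return result


def missing_ec(prokka_ec, emapper_ec):
    """Return locus tags that have an EC number in prokka but none in eggnog mapper."""
    return sorted(
        tag for tag, ec in prokka_ec.items()
        if ec and not emapper_ec.get(tag, "")
    )


# prokka rows from the sample above (tab-separated here by assumption).
prokka_text = (
    "LBKOJGCJ_00008\tCDS\t804\techA8_1\t4.2.1.17\tCOG1024\tputative enoyl-CoA hydratase echA8\n"
    "LBKOJGCJ_00009\tCDS\t1449\taam_1\t3.5.1.13\tCOG0154\tAcylamidase\n"
    "LBKOJGCJ_00010\tCDS\t654\tepsM\t2.3.1.-\tCOG0110\tPutative acetyltransferase EpsM\n"
)
# Hypothetical reduction of the eggnog mapper output to (query, EC);
# the empty second field stands for the missing EC numbers reported here.
emapper_text = "LBKOJGCJ_00008\t\nLBKOJGCJ_00009\t\nLBKOJGCJ_00010\t\n"

prokka_ec = read_tsv_column(io.StringIO(prokka_text), key_col=0, value_col=4)
emapper_ec = read_tsv_column(io.StringIO(emapper_text), key_col=0, value_col=1)
print(missing_ec(prokka_ec, emapper_ec))
```

On real files, the `io.StringIO` objects would be replaced by open file handles and `value_col` adjusted to wherever the EC column sits in each file.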
And here are the sequences:
LBKOJGCJ_00008 putative enoyl-CoA hydratase echA8
MKDADGLYGDFTGFGVDRPADGVLRLTLDAPGLNAVDADAHRSLADVWRVIDRDPDTRVA
LIRGAGKGFSAGGSFELLDEIMADRAARTRVLNEARDLVWGIIDCSKPVVSAIHGPAVGA
GLVAALLADVSVAARSAKIIDGHTRLGVAAGDHAAVAWPLLCGMAKAKYHLLTNRPLSGE
EAERIGLVSLCVDDDAVQDEAMSIATDLAAGSAEAIAFTKHTLNHHYRSAGPAFDASLYA
EFYGFGGPDAREGLASHREKRSPNFGG
LBKOJGCJ_00009 Acylamidase
MGLTTWGEPLSGIEQPTAIEISDGVRKGELSAREVVDDYLGRIDAGNGALNAFVHVDAAG
ARDQADRVDARVAAGEDPGPFAGVPFGVKDLEHCAGMPTSHGSTVYAGRGPVAADSIHVA
RLRAAGGVPVGKTAAPEFGTLSFTRTLAFGVTTSPWGEGRTPGGSSGGSAAAVAAGLVPV
STASDGGGSTRIPASFAGLVGMKPSHGRIPIEGPSGSQTAVAGLLTTTVAEAARHLDVTA
GPDARDRLSLPSTDLNYCNLIETLETAGLRARWSPDLGFGVTDPEVESLCRSAAEELADA
AGLAVDEGVVDLGDPVRLWFQAGAADLWLSLEPGMWPQLADDFTPFVRRGLEMTEELAMP
RYADTLRLREDLQDHMAALFDEVDVVLCPTTAVAAFADKGPPPSVIAGQELGMGMATPYT
MPANLCWNPAVSVPAGLTGDGLPVGLQIVTQRHRDEVPLRLARILEQVRPWPRHAPGAGA
SS
LBKOJGCJ_00010 Putative acetyltransferase EpsM
MVIGASGHARCVIDAARAGSTGEVVAVADDDVVPTAREVLGVPVVGGSDSVGEWWSEGRI
DGVVVGIGDNDTRMVVVERLLAIEPSLRFSTVAHPTASIAASARLGDGAVVLAGASVGPQ
ASVGAHALLGAQANLDHDTVLSEGASLGPGALIGGAARIGCASVVGIGAVVRHGLTIGNH
SVLGAGAVLTRDLPDGVVAWGAPARIQRSREPGERYL
Of course, here is the full CDS from prodigal, which I annotated against prokka and eggnog mapper.
Thanks, we will look into some examples.
I just ran eggnog-mapper online with your three example sequences (default params), and it seems to produce KEGG and EC number annotations: http://eggnog-mapper.embl.de/job_status?jobname=MM_7ref7n89
On Tue, 10 Sep 2019 at 20:58, Pedro Queirós notifications@github.com wrote:
Microthrix_parvicella.txt https://github.com/eggnogdb/eggnog-mapper/files/3597315/Microthrix_parvicella.txt
Thanks, I just tried as well and it seems to be working. Actually, I was trying to convert the current eggnog from Python 2.7 to 3.7 (to get everything working in the same environment). Since I mostly changed the syntax of the print function, and the run completes all the way, I assumed the annotation would still include the EC numbers. I read through the code, but could you advise me which part of it could be causing the missing ECs?
It is not the code, but the lack of EC numbers associated with the inferred orthologs of each query (which come from KEGG modules). This has to do with the phylogenetic trees scanned and the inferred duplication events. Of the three cases you sent, I manually checked the last one (without EC) by blasting it against KEGG. Indeed, the top 10 best hits in KEGG do not have an EC number associated, so that would explain the result.
It is not the code, but the set of orthologs inferred from the precomputed trees, plus their annotations from the KEGG database.
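The idea that a query only receives an EC number when its inferred orthologs carry one can be pictured with a toy sketch. This is not eggnog-mapper's actual code: `transfer_ec`, the ortholog IDs and the annotation table are all made up for illustration.

```python
def transfer_ec(orthologs, ec_by_protein):
    """Collect the EC numbers annotated on any of a query's orthologs."""
    ecs = set()
    for ortholog in orthologs:
        # Orthologs missing from the table contribute nothing.
        ecs.update(ec_by_protein.get(ortholog, ()))
    return sorted(ecs)


# Hypothetical annotation table: protein ID -> EC numbers known for it.
ec_by_protein = {
    "orthA": {"4.2.1.17"},
    "orthB": set(),
    "orthC": set(),
}

# One ortholog carries an EC number, so the query inherits it.
print(transfer_ec(["orthA", "orthB"], ec_by_protein))
# No ortholog carries an EC number, so there is nothing to transfer.
print(transfer_ec(["orthB", "orthC"], ec_by_protein))
```

Under this picture, a query whose orthologs all lack EC annotations ends up with an empty EC field even though the run itself completed without errors.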
I tried running eggnog and am not obtaining EC numbers for any of the aligned queries (more than 1000). The command I ran:
./emapper.py -i proteins.faa -o annotation_output --scratch_dir /output_eggnog_mapper/ --temp_dir /output_eggnog_mapper/ --keep_mapping_files -m diamond
Here's a sample of the output.annotations file: