Closed jamesabbott closed 4 years ago
Hi James, could you share a couple of query sequences that lead to that problem? That would help us with debugging.
On Wed, 6 May 2020 at 22:32, James Abbott notifications@github.com wrote:
I've been using eggnog_mapper to carry out functional annotation of predicted metagenomic proteins, which are primarily prokaryotic in origin, however a number of the terms which my analysis identifies as being enriched in particular conditions are clearly from higher eukaryotes.
As an example, a protein which is identified as fliC (Flagellin) seems to have the GO terms appropriate from the eggnog entry http://eggnog5.embl.de/#/app/results#COG1344_datamenu which all make sense. It is also annotated as GO:0035681 (toll-like receptor 15 signaling pathway) which is decidedly eukaryotic, and not referenced in the above eggnog entry. As far as I can see, this term is being introduced since there is evidence of an interaction between bacterial flagella and the TOLL signalling pathway, but this term does not apply directly to the annotated protein. As a result of this kind of annotation, I am finding 'enrichment' of various eukaryotic receptor pathways (TOLL, WNT, ERBB) in my metagenomic samples.
Is there any way to restrict the terms reported to those applied directly to the protein function (i.e. in this case, those from COG1344), and not include those which seem to be inferred due to interaction? Many thanks, James
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/eggnogdb/eggnog-mapper/issues/200, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABH6SXU4RWVUKNHS2F72XTRQHCOPANCNFSM4M2YG2OA .
Hi Jaime,
Many thanks for the quick response. I’ve done some further tests comparing the output of my standalone installation with the online version, and it seems the online version is behaving differently.
As an example, the following sequence is annotated as fliC:
2006_NODE_1409982_length_265_cov_0.904762_1 DIDLKKIDSTSLKLNSLTVSSNALNVSGTIDTVVAASAGSGSQVVSFAAAEVTKLNTANGTSLTASDLSLHEVQNASGAGTGTFVVKA
The output from our local installation with this sequence is as follows:
2006_NODE_1409982_length_265_cov_0.904762_1 381666.H16_B2360 7.8e-16 88.2 FLIC GO:0002218,GO:0002221,GO:0002224,GO:0002253,GO:0002376,GO:0002682,GO:0002684,GO:0002755,GO:0002757,GO:0002758,GO:0002764,GO:0005575,GO:0005576,GO:0005618,GO:0005622,GO:0005623,GO:0005886,GO:0006950,GO:0006952,GO:0006955,GO:0007154,GO:0007165,GO:0008150,GO:0009288,GO:0009987,GO:0016020,GO:0023052,GO:0030312,GO:0031347,GO:0031349,GO:0034134,GO:0034142,GO:0034146,GO:0035681,GO:0042597,GO:0043226,GO:0043228,GO:0043229,GO:0043232,GO:0044424,GO:0044464,GO:0044699,GO:0044700,GO:0044763,GO:0045087,GO:0045088,GO:0045089,GO:0048518,GO:0048583,GO:0048584,GO:0050776,GO:0050778,GO:0050789,GO:0050794,GO:0050896,GO:0051716,GO:0055040,GO:0065007,GO:0071944,GO:0080134 K02406 bactNOG[38] 05I03@bactNOG,0BBTQ@bproNOG,16PW1@proNOG,COG1344@NOG NA|NA|NA N Flagellin domain-containing protein
Whereas having run the sequence through the online version I get:
2006_NODE_1409982_length_265_cov_0.904762_1 381666.H16_B2360 3.7e-12 77.4 Burkholderiaceae fliC GO:0005575,GO:0005576,GO:0005623,GO:0009288,GO:0042995,GO:0043226,GO:0043228,GO:0044464 ko:K02406 ko02020,ko02040,ko04621,ko04626,ko05132,ko05134,map02020,map02040,map04621,map04626,map05132,map05134 ko00000,ko00001,ko02035 Bacteria 1K01X@119060,1MV1N@1224,2VJTA@28216,COG1344@1,COG1344@2 NA|NA|NA N Flagellin is the subunit protein which polymerizes to form the filaments of bacterial flagella
So the problem GO term (GO:0035681) is not present using the online version. I’ve added the additional supported arguments used by the online version to try to replicate the analysis as closely as possible (although --query-cover and --subject-cover do not seem to be supported):
2006_NODE_1409982_length_265_cov_0.904762_1 381666.H16_B2360 7.8e-16 88.2 FLIC GO:0002218,GO:0002221,GO:0002224,GO:0002253,GO:0002376,GO:0002682,GO:0002684,GO:0002755,GO:0002757,GO:0002758,GO:0002764,GO:0005575,GO:0005576,GO:0005618,GO:0005622,GO:0005623,GO:0005886,GO:0006950,GO:0006952,GO:0006955,GO:0007154,GO:0007165,GO:0008150,GO:0009288,GO:0009987,GO:0016020,GO:0023052,GO:0030312,GO:0031347,GO:0031349,GO:0034134,GO:0034142,GO:0034146,GO:0035681,GO:0042597,GO:0043226,GO:0043228,GO:0043229,GO:0043232,GO:0044424,GO:0044464,GO:0044699,GO:0044700,GO:0044763,GO:0045087,GO:0045088,GO:0045089,GO:0048518,GO:0048583,GO:0048584,GO:0050776,GO:0050778,GO:0050789,GO:0050794,GO:0050896,GO:0051716,GO:0055040,GO:0065007,GO:0071944,GO:0080134 K02406 bactNOG[38] 05I03@bactNOG,0BBTQ@bproNOG,16PW1@proNOG,COG1344@NOG NA|NA|NA N Flagellin domain-containing protein
This still reports the additional GO terms. The most significant difference I can see in the outputs is the database version.
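The difference between the two GO lists quoted above can be made explicit with a small set comparison. This is only an illustrative sketch: the helper name `go_set` is hypothetical, and the two strings are copied from the local (DB 4.5.1) and online outputs shown in this thread.

```python
def go_set(field: str) -> set:
    """Split a comma-separated GO field into a set of term IDs."""
    return {term.strip() for term in field.split(",") if term.strip()}


# GO field from the local installation output quoted in this thread.
local_go = go_set(
    "GO:0002218,GO:0002221,GO:0002224,GO:0002253,GO:0002376,GO:0002682,"
    "GO:0002684,GO:0002755,GO:0002757,GO:0002758,GO:0002764,GO:0005575,"
    "GO:0005576,GO:0005618,GO:0005622,GO:0005623,GO:0005886,GO:0006950,"
    "GO:0006952,GO:0006955,GO:0007154,GO:0007165,GO:0008150,GO:0009288,"
    "GO:0009987,GO:0016020,GO:0023052,GO:0030312,GO:0031347,GO:0031349,"
    "GO:0034134,GO:0034142,GO:0034146,GO:0035681,GO:0042597,GO:0043226,"
    "GO:0043228,GO:0043229,GO:0043232,GO:0044424,GO:0044464,GO:0044699,"
    "GO:0044700,GO:0044763,GO:0045087,GO:0045088,GO:0045089,GO:0048518,"
    "GO:0048583,GO:0048584,GO:0050776,GO:0050778,GO:0050789,GO:0050794,"
    "GO:0050896,GO:0051716,GO:0055040,GO:0065007,GO:0071944,GO:0080134"
)

# GO field from the online version output quoted in this thread.
online_go = go_set(
    "GO:0005575,GO:0005576,GO:0005623,GO:0009288,"
    "GO:0042995,GO:0043226,GO:0043228,GO:0044464"
)

only_local = local_go - online_go    # terms reported only by the local run
only_online = online_go - local_go   # terms reported only by the online run
shared = local_go & online_go        # terms reported by both

print("only in local output:", sorted(only_local))
print("only in online output:", sorted(only_online))
print("GO:0035681 only in local output:", "GO:0035681" in only_local)
```

The same comparison could be run over whole annotation files to see how widespread the extra terms are, rather than checking a single sequence by eye.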
Any suggestions on how to resolve this would be greatly appreciated.
Best Regards James
From: Jaime Huerta-Cepas notifications@github.com Reply to: eggnogdb/eggnog-mapper reply@reply.github.com Date: Wednesday, 6 May 2020 at 21:39 To: eggnogdb/eggnog-mapper eggnog-mapper@noreply.github.com Cc: "James Abbott (Staff)" j.abbott@dundee.ac.uk, Author author@noreply.github.com Subject: Re: [eggnogdb/eggnog-mapper] Confusing GO annotations (#200)
The University of Dundee is a registered Scottish Charity, No: SC015096
Dear James,
I have tested your sequence with a local current version and I got:
--output_dir tmp --data_dir . -m diamond --cpu 12 --go_evidence non-electronic --target_orthologs all --seed_ortholog_evalue 0.001 --seed_ortholog_score 60 ...
2006_NODE_1409982_length_265_cov_0.904762_1 381666.H16_B2360 3.7e-12 77.4 Bacteria Burkholderiaceae N Flagellin is the subunit protein which polymerizes to form the filaments of bacterial flagella COG1344@1|root,COG1344@2|Bacteria,1MV1N@1224|Proteobacteria,2VJTA@28216|Betaproteobacteria,1K01X@119060|Burkholderiaceae fliC GO:0005575,GO:0005576,GO:0005623,GO:0009288,GO:0042995,GO:0043226,GO:0043228,GO:0044464 ko:K02406 ko02020,ko02040,ko04621,ko04626,ko05132,ko05134,map02020,map02040,map04621,map04626,map05132,map05134 ko00000,ko00001,ko02035
It looks equivalent to the one you got with the online version:
2006_NODE_1409982_length_265_cov_0.904762_1 381666.H16_B2360 3.7e-12 77.4 Burkholderiaceae fliC GO:0005575,GO:0005576,GO:0005623,GO:0009288,GO:0042995,GO:0043226,GO:0043228,GO:0044464 ko:K02406 ko02020,ko02040,ko04621,ko04626,ko05132,ko05134,map02020,map02040,map04621,map04626,map05132,map05134 ko00000,ko00001,ko02035 Bacteria 1K01X@119060,1MV1N@1224,2VJTA@28216 ,COG1344@1,COG1344@2 NA|NA|NA N Flagellin is the subunit protein which polymerizes to form the filaments of bacterial flagella
I guess the difference is, as you said, due to the DB being used.
Also, --query-cov and --subject-cov seem to be working for me with this example. I got results with --query-cov 10 --subject-cov 10 but no results with --query-cov 100 --subject-cov 100
Best, Carlos
On Thu, 7 May 2020 at 14:33, James Abbott (notifications@github.com) wrote:
Hi Jaime,
Many thanks for the quick response. I’ve done some further tests comparing the output of my standalone installation with the online version, and it seems the online version is behaving differently.
As an example, the following sequence is annotated as fliC:
2006_NODE_1409982_length_265_cov_0.904762_1 DIDLKKIDSTSLKLNSLTVSSNALNVSGTIDTVVAASAGSGSQVVSFAAAEVTKLNTANGTSLTASDLSLHEVQNASGAGTGTFVVKA
The output from our local installation with this sequence is as follows:
emapper version: emapper-1.0.3 emapper DB: 4.5.1
command: ./emapper.py -i /cluster/db/jabbott/GO_enrichment/barley_filter/test.fa -o test --output_dir eggnog_annotations --scratch_dir /tmp/418.1.all.q --data_dir /media/ramdisk/418.1.all.q --database bact -m diamond --cpu 12 --go_evidence non-electronic
2006_NODE_1409982_length_265_cov_0.904762_1 381666.H16_B2360 7.8e-16 88.2 FLIC GO:0002218,GO:0002221,GO:0002224,GO:0002253,GO:0002376,GO:0002682,GO:0002684,GO:0002755,GO:0002757,GO:0002758,GO:0002764,GO:0005575,GO:0005576,GO:0005618,GO:0005622,GO:0005623,GO:0005886,GO:0006950,GO:0006952,GO:0006955,GO:0007154,GO:0007165,GO:0008150,GO:0009288,GO:0009987,GO:0016020,GO:0023052,GO:0030312,GO:0031347,GO:0031349,GO:0034134,GO:0034142,GO:0034146,GO:0035681,GO:0042597,GO:0043226,GO:0043228,GO:0043229,GO:0043232,GO:0044424,GO:0044464,GO:0044699,GO:0044700,GO:0044763,GO:0045087,GO:0045088,GO:0045089,GO:0048518,GO:0048583,GO:0048584,GO:0050776,GO:0050778,GO:0050789,GO:0050794,GO:0050896,GO:0051716,GO:0055040,GO:0065007,GO:0071944,GO:0080134 K02406 bactNOG[38] 05I03@bactNOG,0BBTQ@bproNOG,16PW1@proNOG,COG1344@NOG NA|NA|NA N Flagellin domain-containing protein
Whereas having run the sequence through the online version I get:
emapper version: emapper-1.0.3-35-g63c274b emapper DB: 2.0
command: ./emapper.py --cpu 10 -i /data/shared/emapper_jobs/user_data/MM_lfrf1v9s/query_seqs.fa --output query_seqs.fa --output_dir /data/shared/emapper_jobs/user_data/MM_lfrf1v9s -m diamond -d none --tax_scope auto --go_evidence non-electronic --target_orthologs all --seed_ortholog_evalue 0.001 --seed_ortholog_score 60 --query-cover 20 --subject-cover 0 --override --temp_dir /data/shared/emapper_jobs/user_data/MM_lfrf1v9s
2006_NODE_1409982_length_265_cov_0.904762_1 381666.H16_B2360 3.7e-12 77.4 Burkholderiaceae fliC GO:0005575,GO:0005576,GO:0005623,GO:0009288,GO:0042995,GO:0043226,GO:0043228,GO:0044464 ko:K02406 ko02020,ko02040,ko04621,ko04626,ko05132,ko05134,map02020,map02040,map04621,map04626,map05132,map05134 ko00000,ko00001,ko02035 Bacteria 1K01X@119060,1MV1N@1224,2VJTA@28216 ,COG1344@1,COG1344@2 NA|NA|NA N Flagellin is the subunit protein which polymerizes to form the filaments of bacterial flagella
So the problem GO term (GO:0035681) is not present using the online version. I’ve added the additional supported arguments used by the online version to try to replicate the analysis as closely as possible (although --query-cover and --subject-cover do not seem to be supported):
emapper version: emapper-1.0.3 emapper DB: 4.5.1
command: ./emapper.py -i /cluster/db/jabbott/GO_enrichment/barley_filter/test.fa -o test --output_dir eggnog_annotations --scratch_dir /tmp/420.1.all.q --data_dir /media/ramdisk/420.1.all.q --database none -m diamond --cpu 12 --go_evidence non-electronic --target_orthologs all --seed_ortholog_evalue 0.001 --seed_ortholog_score 60 --override
2006_NODE_1409982_length_265_cov_0.904762_1 381666.H16_B2360 7.8e-16 88.2 FLIC GO:0002218,GO:0002221,GO:0002224,GO:0002253,GO:0002376,GO:0002682,GO:0002684,GO:0002755,GO:0002757,GO:0002758,GO:0002764,GO:0005575,GO:0005576,GO:0005618,GO:0005622,GO:0005623,GO:0005886,GO:0006950,GO:0006952,GO:0006955,GO:0007154,GO:0007165,GO:0008150,GO:0009288,GO:0009987,GO:0016020,GO:0023052,GO:0030312,GO:0031347,GO:0031349,GO:0034134,GO:0034142,GO:0034146,GO:0035681,GO:0042597,GO:0043226,GO:0043228,GO:0043229,GO:0043232,GO:0044424,GO:0044464,GO:0044699,GO:0044700,GO:0044763,GO:0045087,GO:0045088,GO:0045089,GO:0048518,GO:0048583,GO:0048584,GO:0050776,GO:0050778,GO:0050789,GO:0050794,GO:0050896,GO:0051716,GO:0055040,GO:0065007,GO:0071944,GO:0080134 K02406 bactNOG[38] 05I03@bactNOG,0BBTQ@bproNOG,16PW1@proNOG,COG1344@NOG NA|NA|NA N Flagellin domain-containing protein
This still reports the additional GO terms. The most significant difference I can see in the outputs is the database version.
Any suggestions on how to resolve this would be greatly appreciated.
Best Regards James
-- Carlos P. Cantalapiedra Post-doctoral researcher Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA) Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA) Campus de Montegancedo-UPM 28223-Pozuelo de Alarcón (Madrid) Spain
Thanks @Cantalapiedra for looking into this!
Besides DB updates, I don't think we can solve this kind of misannotation easily, as they seem to be inherited from sequence-based annotations in other DBs.
Note that, unless specifically requested, eggnog-mapper restricts the annotations so that there are no cross-domain transfers. That is, a bacterial gene will never get annotations from eukaryotic orthologs. The problem you found is that a eukaryotic GO term was bound to a bacterial sequence, so eggnog-mapper cannot tell the difference.
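Since the mapper itself cannot tell such terms apart, one possible workaround is to post-filter the GO field of each annotation line before running an enrichment analysis. The sketch below is not an eggnog-mapper feature: the function name `drop_go_terms` and the blocklist are hypothetical, and deciding which terms belong on the blocklist is left to the user.

```python
def drop_go_terms(go_field: str, blocklist: set) -> str:
    """Return the comma-separated GO field with blocklisted terms removed."""
    kept = [term for term in go_field.split(",") if term and term not in blocklist]
    return ",".join(kept)


# Hypothetical blocklist: the eukaryotic term discussed in this thread.
blocked = {"GO:0035681"}

# Shortened example field, using GO IDs that appear in the outputs above.
example = "GO:0009288,GO:0035681,GO:0044464"
filtered = drop_go_terms(example, blocked)
print(filtered)
```

A blocklist only removes terms that have already been spotted, so it treats the symptom; running against an updated database addresses the source of the annotation.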
As far as I can see the earliest version of the database available is 4.5.0 - would you be able to make available an earlier database which hasn't pulled in these associated terms? The version 2.0 used by the online version doesn't have the problem annotation for this particular sequence, so may be a good starting point, and would make the output equivalent to having used the online version. Whether this would solve the wider problem or just this particular annotation I guess we have no way of knowing without running and comparing the full dataset. My dataset is pretty big, so would require >200 submissions of 100k sequences to actually run online, which is not really practical.
Thanks, James
I'm not sure whether there is some confusion here regarding database versions. The online version uses eggnog-mapper v2.0, and the database it uses is eggNOG v5.0.
I’m happy to admit I’m confused! I was going by what appears to be the database version reported by eggnogmapper:
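Standalone: emapper version: emapper-1.0.3 emapper DB: 4.5.1
Online: emapper version: emapper-1.0.3-35-g63c274b emapper DB: 2.0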
So they both look to be emapper 1.0.3 (without the git hash for the standalone version), but the DB versions reported are different, and neither of them reports 5.0. I used the conda build, which reports itself as ‘emapper-dd753fc’ with emapper.py --version. I’m not sure all this versioning is correct, however. If I run
emapper.py --query-cov 10 --subject-cov 10
it reports:
emapper.py: error: unrecognized arguments: --query-cov 10 --subject-cov 10
so it doesn’t look like we have equivalent software. My version was installed from bioconda:
(eggnog) login1:GO_enrichment $ which emapper.py
/cluster/gjb_lab/jabbott/miniconda3/envs/eggnog/bin/emapper.py
(eggnog) login1:GO_enrichment $ conda list|grep eggnog
eggnog-mapper 1.0.3 py_3 bioconda
I’ll try doing a ‘native’ installation and see how that compares….
If I can replicate the setup which produces the output of the standalone version I think I should have something I can work with.
Many thanks James
OK...so installing directly from github seems to give me a version which produces the correct output:
`# emapper version: emapper-2.0.1 emapper DB: 2.0
2006_NODE_1409982_length_265_cov_0.904762_1 381666.H16_B2360 3.7e-12 77.4 Burkholderiaceae fliC GO:0005575,GO:0005576,GO:0005623,GO:0009288,GO:0042995,GO:0043226,GO:0043228,GO:0044464 ko:K02406 ko02020,ko02040,ko04621,ko04626,ko05132,ko05134,map02020,map02040,map04621,map04626,map05132,map05134 ko00000,ko00001,ko02035 Bacteria 1K01X@119060,1MV1N@1224,2VJTA@28216,COG1344@1,COG1344@2 NA|NA|NA N Flagellin is the subunit protein which polymerizes to form the filaments of bacterial flagella`
The DB is showing as 2.0, and a much shorter list of GO terms is reported, so the bioconda-installed version looks to be the problem. For some reason it seems to have given me 1.0.3 even though I installed into a new, clean environment, so it shouldn't have had any dependency issues leading to it using an old version. I'll do some more playing to try to work out what has caused this, but I am able to produce the correct output now.
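To catch this kind of version mismatch before comparing results, the version header can be read programmatically. This is a rough sketch that assumes the annotation output carries a header line in the form quoted in this thread (`emapper version: ... emapper DB: ...`); the function name `parse_version_header` is hypothetical.

```python
import re

# Pattern for the header line format quoted in this thread.
HEADER_RE = re.compile(r"emapper version:\s*(\S+)\s+emapper DB:\s*(\S+)")


def parse_version_header(line: str):
    """Return (emapper_version, db_version) from a header line, or None if absent."""
    match = HEADER_RE.search(line)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Header lines as they appear in this thread.
github_install = parse_version_header("# emapper version: emapper-2.0.1 emapper DB: 2.0")
conda_install = parse_version_header("emapper version: emapper-1.0.3 emapper DB: 4.5.1")

print("GitHub install:", github_install)
print("bioconda install:", conda_install)
print("same versions:", github_install == conda_install)
```

Comparing these tuples across runs makes it obvious when two outputs come from different software or database versions.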
Many thanks James
I've been using eggnog_mapper to carry out functional annotation of predicted metagenomic proteins, which are primarily prokaryotic in origin, however a number of the terms which my analysis identifies as being enriched in particular conditions are clearly from higher eukaryotes.
As an example, a protein which is identified as fliC (Flagellin) seems to have the GO terms appropriate from the eggnog entry http://eggnog5.embl.de/#/app/results#COG1344_datamenu which all make sense. It is also annotated as GO:0035681 (toll-like receptor 15 signaling pathway) which is decidedly eukaryotic, and not referenced in the above eggnog entry. As far as I can see, this term is being introduced since there is evidence of an interaction between bacterial flagella and the TOLL signalling pathway, but this term does not apply directly to the annotated protein. As a result of this kind of annotation, I am finding 'enrichment' of various eukaryotic receptor pathways (TOLL, WNT, ERBB) in my metagenomic samples.
Is there any way to restrict the terms reported to those applied directly to the protein function (i.e. in this case, those from COG1344), and not include those which seem to be inferred due to interaction?
Many thanks, James