eggnogdb / eggnog-mapper

Fast genome-wide functional annotation through orthology assignment
http://eggnog-mapper.embl.de
GNU Affero General Public License v3.0

query with multiple hits interpretation #350

Open sapuizait opened 3 years ago

sapuizait commented 3 years ago

Dear all

This is not an issue, it's more of a question about the results I get when I run eggNOG-mapper (emapper). In the output table, when I look at the KEGG annotations, sometimes there is more than one KEGG ID in the same row (1 row = the result of 1 query). For example, even though most of the results look like this:

27_641729_18        1410674.JNKU01000016_gene554    4e-34   150.2     COG1476@1|root,COG1476@2|Bacteria,1VEGF@1239|Firmicutes,4HNVM@91061|Bacilli,3F8PR@33958|Lactobacillaceae     3F8PR@33958|Lactobacillaceae    K       Cro/C1-type HTH DNA-binding domain      4HNVM@91061|Bacilli     K       Transcriptional -       -       -       ko:K07729       -       -       -   -ko00000,ko03000 -       -       -       HTH_11,HTH_19,HTH_24,HTH_25,HTH_26,HTH_3,HTH_31

meaning there is only one KEGG ID assigned to the query (K07729),

sometimes I get results like this:

27_21786_6 1517682.HW49_11325      3.3e-68 265.0   COG0652@1|root,COG0652@2|Bacteria,4NMKP@976|Bacteroidetes,2G31W@200643|Bacteroidia,22XR7@171551|Porphyromonadaceae   22XR7@171551|Porphyromonadaceae M       PPIases accelerate the folding of proteins. It catalyzes the cis-trans isomerization of proline imidic peptide bonds in oligopeptides   4NMKP@976|Bacteroidetes      M       PPIases accelerate the folding of proteins. It catalyzes the cis-trans isomerization of proline imidic peptide bonds in oligopeptides   ppiA    -       5.2.1.8 ko:K01802,ko:K03768     -   --       -       ko00000,ko01000,ko03110 -       -       -       Pro_isomerase

which has two KOs assigned (K01802 and K03768).

And even though these KOs are usually orthologs or have very similar functions, they kind of mess up my pathway reconstruction: either KEGG Mapper does not recognize the comma-separated KOs, or, if I split them, I feel like I am pushing it, because I assume the similarities weren't good enough to give me a single ID in the first place...

Thus I am leaning towards the "get rid of them" solution, but what do you think? How do you usually deal with these types of situations?
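For anyone who wants to try the "split them" route, here is a minimal Python sketch of how the comma-separated KO field could be turned into one query/KO pair per line. The function names `parse_ko_field` and `to_pairs` are made up for illustration, and the sketch assumes the field looks like the values in the two rows above (`ko:`-prefixed K numbers separated by commas, `-` for no hit). Whether a downstream tool accepts one pair per line is something to check in that tool's documentation.

```python
def parse_ko_field(field):
    """Return the K numbers found in a KO field such as 'ko:K01802,ko:K03768'.

    Assumption: '-' (or an empty string) means no KO was assigned.
    """
    field = field.strip()
    if field in ("", "-"):
        return []
    kos = []
    for item in field.split(","):
        item = item.strip()
        if item.startswith("ko:"):
            item = item[3:]  # drop the 'ko:' prefix
        if item:
            kos.append(item)
    return kos


def to_pairs(query, field):
    """Expand one annotation row into (query, KO) pairs, one per KO."""
    return [(query, ko) for ko in parse_ko_field(field)]


# The two example rows from this thread, reduced to query name and KO field.
rows = [
    ("27_641729_18", "ko:K07729"),
    ("27_21786_6", "ko:K01802,ko:K03768"),
]

for query, field in rows:
    for q, ko in to_pairs(query, field):
        print(f"{q}\t{ko}")
```

The second query simply appears twice in the output, once per KO, so nothing is silently dropped or merged.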

Cantalapiedra commented 3 years ago

Hi @sapuizait ,

I hope someone with more expertise on this can also comment. I will just give my 2 cents. I guess that you get 2 KOs because the relationship between eggNOG OGs and KEGG OGs is unfortunately not one-to-one.

Depending on your data, analysis, and goals, you could either discard those, use only one of them, or use both. Personally, I would submit both to KEGG Mapper, although I have very little experience with this.
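The three options could be sketched in Python roughly as follows. This is only an illustration: `select_kos` and the strategy names are hypothetical, and picking the first listed KO for the "use only one" case is an arbitrary assumption, since nothing in the output says which KO is the better match.

```python
def select_kos(kos, strategy="all"):
    """Pick which KOs to keep for one query.

    kos      -- list of K numbers already parsed for the query
    strategy -- 'all'     : keep every KO
                'first'   : keep only the first listed KO (arbitrary choice)
                'discard' : keep the KO only if it is the single one assigned
    """
    if strategy == "all":
        return list(kos)
    if strategy == "first":
        return list(kos[:1])
    if strategy == "discard":
        return list(kos) if len(kos) == 1 else []
    raise ValueError(f"unknown strategy: {strategy!r}")


multi = ["K01802", "K03768"]
single = ["K07729"]

for strategy in ("all", "first", "discard"):
    print(strategy, select_kos(multi, strategy), select_kos(single, strategy))
```

Queries with a single KO come out the same under all three strategies, so the choice only matters for the multi-KO rows.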

Good luck!

Best, Carlos

sapuizait commented 3 years ago

Hey Carlos

Thanks a lot. I have ended up using two versions: one where I include everything, and one where I eliminate any query that has more than 1 KO (pretty rare). At the end of the day it makes no huge difference, as pathways that are not complete continue to be incomplete...
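In case it is useful to someone, a quick way to see how much the two versions differ is to compare the KO sets they produce. A rough Python sketch, where `ko_sets` is a made-up helper and the input is assumed to be a dict mapping each query to its already-parsed list of KOs:

```python
def ko_sets(assignments):
    """Return (all_kos, single_only) for a dict of query -> list of KOs.

    all_kos     -- every KO from every query (the "include everything" version)
    single_only -- KOs from queries with exactly one KO (the "eliminate" version)
    """
    all_kos = {ko for kos in assignments.values() for ko in kos}
    single_only = {kos[0] for kos in assignments.values() if len(kos) == 1}
    return all_kos, single_only


# The two example queries from this thread.
assignments = {
    "27_641729_18": ["K07729"],
    "27_21786_6": ["K01802", "K03768"],
}

all_kos, single_only = ko_sets(assignments)

# KOs that are lost when multi-KO queries are eliminated.
print(sorted(all_kos - single_only))
```

If the difference is small and none of the lost KOs belong to the pathways of interest, the two versions should give the same picture.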

Thanks AGAIN P

Cantalapiedra commented 3 years ago

I think it was a good idea.

Thanks to you! Feel free to leave the issue open just in case someone else could give advice.

Best, Carlos