YuLab-SMU / clusterProfiler

:bar_chart: A universal enrichment tool for interpreting omics data
https://yulab-smu.top/biomedical-knowledge-mining-book/

Kegg annotation #99

Closed. Sisov closed this issue 6 years ago.

Sisov commented 6 years ago

Dear author: clusterProfiler is a very useful tool for GO and KEGG annotation. I would like to use it for KEGG enrichment, but I only have KO numbers, so I want to convert the KO numbers to pathway functions. Is there any function or method in the package that can do this? Any help will be appreciated.

Thanks

GuangchuangYu commented 6 years ago

A ko ID is actually a pathway map. I think you are talking about mapping K numbers to ko pathways.

> bitr_kegg("K00844", "kegg", "Path", "ko")
     kegg    Path
1  K00844 ko00010
2  K00844 ko00051
3  K00844 ko00052
4  K00844 ko00500
5  K00844 ko00520
6  K00844 ko00521
7  K00844 ko00524
8  K00844 ko01100
9  K00844 ko01110
10 K00844 ko01120
11 K00844 ko01130
12 K00844 ko01200
13 K00844 ko04066
14 K00844 ko04910
15 K00844 ko04930
16 K00844 ko04973
17 K00844 ko05230

Sisov commented 6 years ago

Yes, sorry, I do mean the K number. I want to obtain the pathway for each K number, something like this: 1 K00799 Drug metabolism - cytochrome P450. Is there any method to achieve this?

Thanks

GuangchuangYu commented 6 years ago

I just wrote a ko2name function for this purpose.

> bitr_kegg("K00799", "kegg", "Path", "ko") -> x
> ko2name(x$Path) -> y
> merge(x, y, by.x='Path', by.y='ko')
     Path   kegg                                         name
1 ko00480 K00799                       Glutathione metabolism
2 ko00980 K00799 Metabolism of xenobiotics by cytochrome P450
3 ko00982 K00799            Drug metabolism - cytochrome P450
4 ko01524 K00799                     Platinum drug resistance
5 ko05204 K00799                      Chemical carcinogenesis
6 ko05418 K00799       Fluid shear stress and atherosclerosis
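The merge() call above joins the K-number-to-pathway table returned by bitr_kegg() with the pathway-name table returned by ko2name(). As a sketch that runs without network access, the two tables below are rebuilt by hand from the output printed above (the column names kegg, Path, ko and name are taken from that output); in real use they would come from the two function calls.

```r
# Stand-in for bitr_kegg("K00799", "kegg", "Path", "ko"): one row per K-number/pathway pair
x <- data.frame(
  kegg = "K00799",
  Path = c("ko00480", "ko00980", "ko00982", "ko01524", "ko05204", "ko05418"),
  stringsAsFactors = FALSE
)

# Stand-in for ko2name(x$Path): pathway ID in column `ko`, description in `name`
y <- data.frame(
  ko = c("ko00480", "ko00980", "ko00982", "ko01524", "ko05204", "ko05418"),
  name = c("Glutathione metabolism",
           "Metabolism of xenobiotics by cytochrome P450",
           "Drug metabolism - cytochrome P450",
           "Platinum drug resistance",
           "Chemical carcinogenesis",
           "Fluid shear stress and atherosclerosis"),
  stringsAsFactors = FALSE
)

# Join on the pathway ID; the key column keeps the name used in `x` (Path)
res <- merge(x, y, by.x = "Path", by.y = "ko")
print(res)
```

Because one K number can belong to several pathways, the joined table has one row per pathway, each carrying the same K number.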
Sisov commented 6 years ago

Unfortunately, running x <- bitr_kegg("K00799", "kegg", "Path", "ko") gives an error: Error in match.arg(toType, id_types) : 'arg' should be one of “ncbi-proteinid”, “ncbi-geneid”, “uniprot”, “kegg”

GuangchuangYu commented 6 years ago

See the Prerequisites section at https://github.com/GuangchuangYu/clusterProfiler/issues/new.

GuangchuangYu commented 6 years ago

BTW, you can also use enrichKEGG with K numbers by specifying organism="ko".

Sisov commented 6 years ago

Thanks, it works well. The software is so good!!!

liuxianghui commented 6 years ago

Dear Guangchuang: Thank you very much for making it possible to use ko to analyse organisms that are not among the KEGG organisms. This is cool, and I like it very much! I work on bacteria, some of which are not KEGG organisms and have barely any annotation: no GO and no KEGG. Even so, I can use this for KEGG pathway enrichment analysis, and biologists like it. The only limitation is when I try to plot a KEGG pathway with pathview: I am unable to put the correct fold-change data on the map. I guess this is because we use K numbers, and multiple genes can have the same K number. Do you have a solution for that?
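One possible workaround, offered as a sketch rather than anything confirmed in this thread: collapse the per-gene fold changes to a single value per K number before building the named vector for pathview. The table gene2k, its column names and values, and the choice of the mean as the summary are all hypothetical; the median, or the value with the largest absolute change, would be equally defensible.

```r
# Hypothetical per-gene results: several genes share a K number
gene2k <- data.frame(
  gene   = c("g1", "g2", "g3", "g4"),
  k      = c("K00844", "K00844", "K00799", "K01810"),
  log2fc = c(1.2, 0.8, -0.5, 2.0),
  stringsAsFactors = FALSE
)

# One value per K number: the mean log2 fold change of the genes annotated with it
fc_by_k <- tapply(gene2k$log2fc, gene2k$k, mean)

# Convert the 1-d array to a plain named numeric vector (names = K numbers)
fc_by_k <- setNames(as.numeric(fc_by_k), names(fc_by_k))
print(fc_by_k)
```

A named vector keyed by K number is the general shape of gene data pathview works with; check the pathview documentation for the options that apply to KO-level data.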

MichaelFokinNZ commented 5 years ago

BIG-BIG-GREAT THANK YOU!!!!

ShenTTT commented 4 years ago

Hi @GuangchuangYu, I am working with a non-model organism. First I used KAAS to annotate the genome with K numbers, which gave me a list of genes vs K numbers. To do KEGG pathway analysis, I need to translate the K numbers to ko numbers. In this case, should I use enrichKEGG or enricher?

If I use enricher, I need to translate all K numbers to pathways first and eventually build a pathway-to-gene list as the TERM2GENE, right?

If I use enrichKEGG, according to your reply I can set organism='ko'. How can this be achieved? There is no way for me to input the gene vs K number list, right?

I appreciate it if you can clarify this.

Thank you so much

Stepmata commented 4 years ago

Hi! Did you solve your problem? I'm doing a KEGG enrichment analysis, also with a non-model organism. I used the enrichKEGG() function but I get this error message:

ca_kegg <- enrichKEGG(ca_list, organism = 'ko', keyType = 'kegg', universe = BBRB_KEGG, pAdjustMethod = "BH")
--> No gene can be mapped....
--> Expected input gene ID: K00895,K01810,K21622,K16370,K15779,K01218
--> return NULL...

In this case, ca_list is my list of DE gene IDs and BBRB_KEGG is a two-column data frame of gene IDs and the KEGG annotations I got from Trinotate.

How could I solve this problem, and what does "No gene can be mapped" mean? Thank you!
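The message reports that none of the supplied IDs matched the IDs enrichKEGG() expects for organism = 'ko'; the examples it lists (K00895, K01810, ...) are K numbers, whereas ca_list holds gene IDs. A minimal sketch, assuming a two-column gene-to-K table like BBRB_KEGG (the column names gene_id and k and all rows are hypothetical): translate both the DE list and the background into character vectors of K numbers, since universe takes a vector of IDs rather than a data frame.

```r
# Hypothetical gene-to-K annotation table (NA = gene without a K number)
gene2k <- data.frame(
  gene_id = c("TRINITY_1", "TRINITY_2", "TRINITY_3", "TRINITY_4"),
  k       = c("K00895", "K01810", "K00895", NA),
  stringsAsFactors = FALSE
)
de_genes <- c("TRINITY_1", "TRINITY_3", "TRINITY_4")  # hypothetical DE gene IDs

# Keep only annotated genes
annotated <- gene2k[!is.na(gene2k$k), ]

# DE genes translated to K numbers (duplicates collapse to one ID)
de_k <- unique(annotated$k[annotated$gene_id %in% de_genes])

# Background as a character vector of K numbers, not a data frame
universe_k <- unique(annotated$k)

# These vectors would then be passed on, e.g.:
# enrichKEGG(de_k, organism = "ko", keyType = "kegg", universe = universe_k)
print(de_k)
print(universe_k)
```

Note that genes sharing a K number collapse into one ID here, so the test counts K numbers rather than genes.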

ShenTTT commented 4 years ago

Hi, I guess you used K numbers instead of ko numbers. I am not familiar with Trinotate, but can you check its output? There should be another column with ko numbers (koxxxxx). Use those instead of Kxxxxx.

Stepmata commented 4 years ago

Actually, I'm using KO numbers (KO:xxxx), but I removed the "KO:" prefix from the KEGG terms; that's why they look like that.
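A value such as KO:K00895 is still a K number (a KEGG Orthology entry), and stripping the prefix leaves the bare Kxxxxx form that appears in the error message above. A sketch, assuming the prefix is written as "KO:" or "ko:" (the sample values reuse IDs from that error message):

```r
# Annotation values with and without a "KO:"/"ko:" prefix
ids <- c("KO:K00895", "ko:K01810", "K21622")

# Drop a leading "KO:" in any case; values without the prefix pass through unchanged
k_numbers <- sub("^ko:", "", ids, ignore.case = TRUE)

# Every cleaned value should now be a bare K number: "K" followed by five digits
stopifnot(all(grepl("^K[0-9]{5}$", k_numbers)))
print(k_numbers)
```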

ShenTTT commented 4 years ago

Actually, enrichKEGG with organism='ko' never worked in my case, so I switched to the enricher function (setting everything up manually):

enricher(gene_list, TERM2GENE = background, TERM2NAME = kegg2name_data, pvalueCutoff = 1, qvalueCutoff = 1, pAdjustMethod = "BH")

gene_list is my genes of interest; background is equivalent to your BBRB_KEGG but with ko numbers as the first column; kegg2name_data is a two-column data frame mapping ko numbers to their descriptions (this can be skipped if you only want the enriched ko numbers rather than textual descriptions).
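For context, enricher() performs an over-representation analysis based on a one-sided hypergeometric test per term, followed by multiple-testing correction (pAdjustMethod = "BH" in the call above). The base-R sketch below computes that test for a toy TERM2GENE table, with the background taken to be all annotated genes; it only illustrates what the arguments feed into and does not reproduce clusterProfiler's full output (for example, it skips the gene-set size filters minGSSize and maxGSSize). All IDs are hypothetical.

```r
# Toy TERM2GENE: pathway ID in column 1, gene ID in column 2
term2gene <- data.frame(
  term = c(rep("ko00010", 4), rep("ko00980", 3)),
  gene = c("g1", "g2", "g3", "g4", "g5", "g6", "g7"),
  stringsAsFactors = FALSE
)
universe <- unique(term2gene$gene)   # background: all annotated genes
de <- c("g1", "g2", "g3")            # genes of interest

N <- length(universe)                # background size
n <- length(intersect(de, universe)) # genes of interest present in the background

sets <- split(term2gene$gene, term2gene$term)
ora <- do.call(rbind, lapply(names(sets), function(term) {
  genes <- sets[[term]]
  M <- length(genes)                 # genes annotated to this term
  k <- length(intersect(de, genes))  # genes of interest annotated to this term
  # P(X >= k) for X ~ Hypergeometric(M annotated, N - M not annotated, n drawn)
  data.frame(term = term, M = M, k = k,
             pvalue = phyper(k - 1, M, N - M, n, lower.tail = FALSE),
             stringsAsFactors = FALSE)
}))
ora$p.adjust <- p.adjust(ora$pvalue, method = "BH")
print(ora)
```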

Stepmata commented 4 years ago

Oh, I see. Is there a way to get the description for every ko number?

ShenTTT commented 4 years ago

Just a reminder: I feel that you are still using K numbers instead of KEGG pathways. A KEGG KO (ko:Kxxxxx) is just the enzyme (ortholog group) in the pathway, and normally you get one such KO per gene. Here we actually want the pathway ID: koxxxxx (without ':') or mapxxxxx, where the 'xxxxx' is the same for ko and map. One KEGG KO can be mapped to zero or more pathways, so you should get zero or more koxxxxx or mapxxxxx IDs per gene. I used eggNOG for annotation, so I got both KO and pathway columns. Do check your annotation to see whether you got such pathway IDs (koxxxxx or mapxxxxx); that is what you want.

I am not sure whether your 'KO:xxxx' is a KEGG KO or a pathway. If you got multiple terms per gene, you can use them directly, since I assume those are already pathway IDs. If you got only one such term per gene, it is more likely just the K number.

Once you have the pathway IDs, install the KEGG.db package; KEGGPATHID2NAME gives you a list mapping all pathway numbers to names. The pathway numbers are the xxxxx in your pathway IDs (koxxxxx or mapxxxxx), NOT KEGG KO IDs (ko:Kxxxxx).

If you only have K numbers (Kxxxxx), map them to pathways using the method @GuangchuangYu described earlier in this thread.

Hope this is helpful. It took me a long time to figure all this out...

For more information on the different KO and pathway ID formats, see https://www.genome.jp/kegg/ko.html and https://www.genome.jp/kegg/pathway.html
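The distinctions above can be checked mechanically. A sketch using regular expressions, assuming K numbers are "K" plus five digits (optionally prefixed "ko:") and pathway IDs are "ko" or "map" plus a five-digit number; the helper name kegg_id_type() and the sample IDs are hypothetical. The five-digit part is what a pathway-name lookup such as KEGGPATHID2NAME is keyed on.

```r
# Hypothetical helper: classify KEGG-style identifiers by their format
kegg_id_type <- function(ids) {
  ifelse(grepl("^(ko:)?K[0-9]{5}$", ids), "KO (K number)",
         ifelse(grepl("^(ko|map)[0-9]{5}$", ids), "pathway", "unknown"))
}

ids <- c("K00799", "ko:K00844", "ko00982", "map00982", "ko00010")
types <- kegg_id_type(ids)

pathways <- ids[types == "pathway"]
# ko00982 and map00982 share the map number; normalise to the ko form
ko_ids <- unique(sub("^map", "ko", pathways))
# The five-digit pathway number alone, e.g. for a name lookup in KEGG.db (not run here)
path_num <- sub("^ko", "", ko_ids)

print(data.frame(id = ids, type = types))
print(path_num)
```

A gene annotated with exactly one ID of the "KO (K number)" type still needs the K-to-pathway mapping before pathway enrichment.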

Stepmata commented 4 years ago

I can imagine; this is a bit confusing. I'll check that. Thank you very much for all the information.

Stepmata commented 4 years ago

Thank you so much for taking the time to give me all this information; it was very helpful. My analysis is already done! I will share the information in case someone else has the same problem! n_n

ShenTTT commented 4 years ago

@Stepmata Glad to hear that :)

edlopez78 commented 4 years ago

Hi. I'm working with a non-model species and Trinotate. Could you please share your solution for setting up the Trinotate KEGG output for use with enricher?

Stepmata commented 4 years ago

Hi! To use the enricher function with my KEGG annotation, I first got the pathway IDs (ko numbers) by mapping all my KEGG terms (K numbers) against the KEGG database with the bitr_kegg function. Once I had the pathway IDs, I got the pathway names with the ko2name function. These two functions are from the clusterProfiler package. To run enricher, I then made two two-column data frames: one, which I called "term2gene", with ko numbers in the first column and annotated gene IDs in the second; the other, "term2name", with ko numbers in the first column and pathway names in the second. You also need a vector with all your differentially expressed gene IDs. That's all the information you need to run your KEGG enrichment test! n_n
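The recipe above can be sketched in base R. Here gene2k stands in for the gene-to-K table from the annotation, k2path for the output of bitr_kegg(..., "kegg", "Path", "ko"), and path2name for the output of ko2name() (clusterProfiler); the gene IDs are hypothetical, and the pathway rows are a small illustrative subset. Because one K number can belong to several pathways, the merge fans each gene out to every pathway of its K number.

```r
# Hypothetical gene-to-K annotation (e.g. parsed from Trinotate output)
gene2k <- data.frame(
  gene = c("g1", "g2", "g3"),
  kegg = c("K00799", "K00799", "K00844"),
  stringsAsFactors = FALSE
)

# Stand-in for bitr_kegg(unique(gene2k$kegg), "kegg", "Path", "ko"): columns kegg, Path
k2path <- data.frame(
  kegg = c("K00799", "K00799", "K00844"),
  Path = c("ko00480", "ko00982", "ko00010"),
  stringsAsFactors = FALSE
)

# Stand-in for ko2name(unique(k2path$Path)): columns ko, name
path2name <- data.frame(
  ko   = c("ko00480", "ko00982", "ko00010"),
  name = c("Glutathione metabolism", "Drug metabolism - cytochrome P450",
           "Glycolysis / Gluconeogenesis"),
  stringsAsFactors = FALSE
)

# term2gene: pathway ID first, gene ID second (one row per gene-pathway pair)
m <- merge(gene2k, k2path, by = "kegg")
term2gene <- unique(m[, c("Path", "gene")])

# term2name: pathway ID first, description second
term2name <- path2name[, c("ko", "name")]

# Then, with a vector of DE gene IDs `de_genes` (hypothetical):
# enricher(de_genes, TERM2GENE = term2gene, TERM2NAME = term2name, pAdjustMethod = "BH")
print(term2gene)
```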

edlopez78 commented 4 years ago

Hi! My analysis is done. Thank you so much for your help and your time! 👍

Stepmata commented 4 years ago

That's nice! You're welcome! n_n

Esteban-Escobar commented 3 years ago

Hi, I did the ORA analysis on my organism's data with K numbers and it worked, but it reported human disease pathways, and I'm working with Physcomitrella (a moss). Is there any way to get species-specific IDs for the ORA analysis from the K numbers, or some other way to obtain them? Thanks.

tobytaogla commented 3 years ago

Hi, I'm having difficulty getting KEGG annotations into the Trinotate annotation file. What software did you use to generate the KEGG annotation? Can you help with this? Many thanks!

Stepmata commented 3 years ago

Hi, I used Trinotate to generate all my KEGG annotations. What kind of problem are you having running Trinotate?

tobytaogla commented 3 years ago

Thanks for the quick reply. Trinotate does not produce a KEGG annotation by default, so I assume you generated the KEGG file yourself. What software did you run to get this file? Sorry for the silly question.

Stepmata commented 3 years ago

Well, I got the K numbers (KEGG terms) in the Trinotate output file, then used that information for the mapping to get the ko numbers (pathway IDs).