YuLab-SMU / DOSE

:mask: Disease Ontology Semantic and Enrichment analysis
https://yulab-smu.top/biomedical-knowledge-mining-book/

visualizing GSEA results with 'dotplot' #20

Closed guidohooiveld closed 2 years ago

guidohooiveld commented 7 years ago

Hi Guangchuang, Maybe a somewhat naive question (feature request?), but is it somehow possible to visualize the results of a GSE run (of a single but also multiple runs) in a compareCluster-dotplot-like figure? I am asking because it would be really cool and helpful if one could represent the top up- and down-regulated genesets, ideally from multiple runs, in a single graph. For example, like you implemented for the compareCluster() function. I was triggered by this idea after reading your online vignette, specifically section 13.2 here, and knowing of the function merge_result() here.

I have something in mind like this picture (from the link to section 13.2 above): cc_updown ... which, 'translated' to GSE results, should rather show the up- and down-regulated gene sets from GSE analyses A and B (sorted either by significance or NES), with color coding representing significance and node size representing gene set size. (??)

Thanks for considering, Guido

As expected, the function dotplot() doesn't work with the output of a gene set enrichment analysis performed with DOSE.

> library(DOSE)
> data(geneList)
> x <- gseDO(geneList)
preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...

> dotplot(x)
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘dotplot’ for signature ‘"gseaResult"’

crazyhottommy commented 7 years ago

Hi, I had the same question. You can just plot it using ggplot2.

library(ggplot2)
library(dplyr)
library(stringr)

## count the gene number
gene_count<- x@result %>% group_by(ID) %>% summarise(count = sum(str_count(core_enrichment, "/")) + 1)

## merge with the original dataframe
dot_df<- left_join(x@result, gene_count, by = "ID") %>% mutate(GeneRatio = count/setSize)

## plot
library(forcats) ## for reordering the factor
ggplot(dot_df, aes(x = GeneRatio, y = fct_reorder(Description, GeneRatio))) +
        geom_point(aes(size = GeneRatio, color = p.adjust)) +
        theme_bw(base_size = 14) +
        scale_colour_gradient(limits=c(0, 0.10), low="red") +
        ylab(NULL) +
        ggtitle("GO pathway enrichment")

Hope it helps!

Tommy

GuangchuangYu commented 7 years ago

thanks @crazyhottommy.

dot_df = dot_df[1:50,] ## small dataset
dot_df$type = "upregulated"
dot_df$type[dot_df$NES < 0] = "downregulated"

## from Tommy's code
p <- ggplot(dot_df, aes(x = GeneRatio, y = fct_reorder(Description, GeneRatio))) +
        geom_point(aes(size = GeneRatio, color = p.adjust)) +
        theme_bw(base_size = 14) +
        scale_colour_gradient(limits=c(0, 0.10), low="red") +
        ylab(NULL) +
        ggtitle("GO pathway enrichment")

p + facet_grid(.~type)

[Image: screen shot 2016-12-21 at 5 26 03 pm]

GuangchuangYu commented 7 years ago

I will add a dotplot method for GSEA result.

Any idea to improve?

guidohooiveld commented 7 years ago

Thanks to both of you! @crazyhottommy and @GuangchuangYu.

I played a bit with the code above to visualize the results of two GSE analyses. Since I am not really a programmer, the code is somewhat clumsy...

Although the code works, I would appreciate some refinement:

dose_merged_gsea_up_down

Code:
library(DOSE)
library(ggplot2)
library(dplyr)
library(stringr)
library(forcats)

data(geneList)
x.1 <- gseDO(geneList)
x.2 <- gseDO(geneList, by="DOSE", nPerm=1000) 

## count the gene number for both results
gene_count.x1 <- x.1@result %>% group_by(ID) %>% summarise(count = sum(str_count(core_enrichment, "/")) + 1)
gene_count.x2 <- x.2@result %>% group_by(ID) %>% summarise(count = sum(str_count(core_enrichment, "/")) + 1)

## merge with the original dataframes
dot_df.x1<- left_join(x.1@result, gene_count.x1, by = "ID") %>% mutate(GeneRatio = count/setSize)
dot_df.x2<- left_join(x.2@result, gene_count.x2, by = "ID") %>% mutate(GeneRatio = count/setSize)

## merge the two results
library(clusterProfiler)
merged.res <- as.data.frame(merge_result(list(fgsea=dot_df.x1, dose=dot_df.x2)))

## merged.res <- rbind(dot_df.x1, dot_df.x2) #This merging works but it does **not** include source of results (i.e. 'fgsea' or 'dose')

## Set up/downregulation
merged.res$type = "upregulated"
merged.res$type[merged.res$NES < 0] = "downregulated"

p <- ggplot(merged.res, aes(x = GeneRatio, y = fct_reorder(Description, GeneRatio))) +
        geom_point(aes(size = GeneRatio, color = p.adjust)) +
        theme_bw(base_size = 14) +
        scale_colour_gradient(limits=c(0, 0.10), low="red") +
        ylab(NULL) +
        ggtitle("Disease Ontology enrichment")

p + facet_grid(.~Cluster+type) #Cluster and type are columns to 'split' on

ggsave("merged_GSE.png")

GuangchuangYu commented 7 years ago

For the first issue, of course the showCategory parameter will work for it. For the second issue, it should go to clusterProfiler.

guidohooiveld commented 7 years ago

Thanks for your continuous feedback! However, I got lost by your 2 comments... Why: 1st issue: the showCategory parameter is indeed used by the various plotting functions, including dotplot. However, dotplot doesn't accept a gseaResult object yet.... I assume you mean it will work after you have updated the dotplot function?? (https://github.com/GuangchuangYu/DOSE/issues/20#issuecomment-268476805).

2nd issue: using formula interface (as described here). Isn't it correct that you can only use the formula/grouping interface for over-representation (enrichment) analyses? i.e.: which pathways/ontologies are enriched in 'group' or 'othergroup'. But how to apply formula interface for GSE analysis, that uses a full, ranked dataset? In other words, how to use formula interface if you have e.g. 2 ranked lists of genes? Sorry if I am missing something obvious....

FYI: for now I got 1st issue 'working' by manual ordering and selecting on the merged dataframes (dot_df.x1 and dot_df.x2) before merging.

GuangchuangYu commented 7 years ago

issue 1 solved.

[Image: 2016-12-22-233207_1280x800_scrot]

Now dotplot supports gseaResult, and showCategory and the other parameters we are familiar with from the dotplot method for enrichResult all work for gseaResult as well.

You can also pass the split parameter, which applies showCategory separately within each group obtained by splitting the results on the specified variable. Here .sign is reserved for the sign of NES (activated for >0 and suppressed for <0). So in this example, we plot 30 activated and 30 suppressed enriched terms.
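The call that produced the figure is not shown in the thread. A minimal sketch of how such a plot can be made, assuming a recent installation in which dotplot() for gseaResult objects is provided by the enrichplot package (an assumption about your setup, not the exact code behind the screenshot):

```r
library(DOSE)
library(enrichplot)  # assumed: provides dotplot() for gseaResult in current releases
library(ggplot2)

data(geneList)
x <- gseDO(geneList)

# showCategory is applied within each group defined by `split`;
# ".sign" groups terms by the sign of NES (activated vs. suppressed)
p <- dotplot(x, showCategory = 30, split = ".sign") +
  facet_grid(. ~ .sign)
print(p)
```

Faceting on .sign puts activated and suppressed terms in separate panels, so up to 30 terms of each kind are shown.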

For issue 2, I will elaborate more details when I have time to work it out.

crazyhottommy commented 7 years ago

nice work!

guidohooiveld commented 7 years ago

Indeed, very nice! Thanks Guangchuang!

alnf commented 7 years ago

I've also been trying to merge the results of two different GSE runs. I did something very similar to what @guidohooiveld did, but I just copy-pasted and slightly changed the DOSE fortify function. Then I realized I can change the order and by arguments, which are normally hidden, since dotplot calls fortify internally. I think accessing these arguments might be useful for both issues, i.e. dotplot and clusterProfiler.

Here is what I got using my own data:
order=TRUE, by="Count": LCHP_gseGO_BP_count.pdf
order=TRUE, by="GeneRatio": LCHP_gseGO_BP_gr.pdf
order=FALSE: LCHP_gseGO_BP.pdf

Update: if I understood the source code correctly, the by argument is implemented in the clusterProfiler version of dotplot, but not in the DOSE dotplot. Was there a specific reason for that?

saisaitian commented 6 years ago

Hi Guangchuang, when I run KEGG analysis, I want to show the cnetplot, but it only shows Entrez IDs. How can I show gene symbols? image

GuangchuangYu commented 6 years ago

@wodetianxia1 You can use setReadable if the organism has a corresponding OrgDb.

saisaitian commented 6 years ago

@GuangchuangYu Here is my code:

kk3 <- gseKEGG(geneList = geneList, organism = 'hsa', nPerm = 10000, pvalueCutoff = 0.05, verbose = FALSE)
kk4 <- setReadable(kk3, OrgDb = org.Hs.eg.db, keytype = "auto")

It just shows the output below. How can I fix it?

image

saisaitian commented 6 years ago

@GuangchuangYu Hi Guangchuang, the species is human. When I use this code, it shows the output below. How can I make it work?

kk <- enrichKEGG(gene = names(geneList), organism = 'hsa', pvalueCutoff = 0.05)
kk444 <- setReadable(kk, OrgDb = org.Hs.eg.db, keytype = "auto")

image

GuangchuangYu commented 6 years ago

@wodetianxia1 keytype = "ENTREZID" should work.
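A minimal sketch of that suggestion. Two assumptions are made here: gseDO() is swapped in for the KEGG calls in the question so the example needs no network access, and the key type is passed positionally because the argument has been spelled both keytype and keyType across versions (check ?setReadable for your installed version):

```r
library(DOSE)
library(org.Hs.eg.db)

data(geneList)        # names of geneList are Entrez gene IDs
x <- gseDO(geneList)  # stand-in for the gseKEGG()/enrichKEGG() results above

# State the ID type explicitly instead of relying on "auto";
# third positional argument = key type of the input gene IDs
x_readable <- setReadable(x, org.Hs.eg.db, "ENTREZID")

# core_enrichment should now list gene symbols rather than Entrez IDs
head(x_readable@result$core_enrichment, 3)
```

Plots such as cnetplot() drawn from x_readable should then label genes by symbol.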

saisaitian commented 6 years ago

Hi Guangchuang, Thank you! It really works! image Here is another question: image As you can see, there are too many gene names. How can I show only the top 30?

saisaitian commented 6 years ago

Dear Guangchuang.

I'm sorry, but I have another question: how can I filter by GO level for gseGO output, just as for enrichGO output? The code is below.

ego3 <- gseGO(geneList = geneList, OrgDb = org.Hs.eg.db, ont = "BP", nPerm = 10000, minGSSize = 100, maxGSSize = 500, pvalueCutoff = 0.01, verbose = FALSE)
ego4_filter <- gofilter(ego3, level=4)

image

sghoshuc commented 6 years ago

Dear Guangchuang,

I was trying to use cnetplot and experienced an error

up <- kk2CIMPsub$Description[order(kk2CIMPsub$NES, decreasing=TRUE)][1:3]

head(up)
[1] "Ribosome" "Parkinson's disease" "AMPK signaling pathway"
cnetplot(kk2CIMPsub, showCategory = up)
Warning message:
In if (nrow(x) < n) { : the condition has length > 1 and only the first element will be used

does showCategory accept only numbers? However you have used the following commands in this link https://guangchuangyu.github.io/2016/07/leading-edge-analysis/

GuangchuangYu commented 6 years ago

@sghoshuc this feature will be available with enrichplot v >= 1.0.1.

sghoshuc commented 6 years ago

where do I get it? Can you please share the link?

thanks

--

Assistant Project Scientist

Pediatrics-Neonatology

Sherin Devaskar Lab

University of California Los Angeles

sghoshuc commented 6 years ago

I mean, when will it be released?

sghoshuc commented 6 years ago

How should I plot upregulated genes using the current version?

thanks

sghoshuc commented 6 years ago

Sorry, I misunderstood. Yes, I tried with enrichplot 1.1.0 and it seems to be working. However, it shows a warning:

Warning message:
In if (nrow(x) < n) { : the condition has length > 1 and only the first element will be used

Also, the plot takes forever to show up (maybe there are too many genes), and even if I select two of the five upregulated pathways, it seems to plot all five. Is there a bug, or am I doing something incorrectly?

thanks

GuangchuangYu commented 6 years ago

@sghoshuc you should try 1.1.1 if you are using devel branch.

BioLeon0209 commented 6 years ago

Hi Guangchuang,

I have a problem with dotplot, as shown by the figure below

image

Currently, I know that I can modify the font.size parameter. Do you have any better methods to solve the problem?

Thanks very much for your help.

Best Regards, Leon.

GuangchuangYu commented 6 years ago

A Google search for 'wrap text in ggplot' will give you a solution.
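One common approach is to wrap the axis labels with stringr::str_wrap() inside scale_y_discrete(). A minimal sketch on made-up toy data (the data frame and the width of 30 are illustrative assumptions, not values from this thread):

```r
library(ggplot2)
library(stringr)

# Toy data standing in for an enrichment result (made-up values)
df <- data.frame(
  Description = c(
    "a very long pathway description that would otherwise squeeze the plotting area",
    "short term"
  ),
  GeneRatio = c(0.30, 0.15),
  p.adjust  = c(0.01, 0.04)
)

p <- ggplot(df, aes(x = GeneRatio, y = Description, color = p.adjust)) +
  geom_point(size = 4) +
  # wrap y-axis labels at roughly 30 characters per line
  scale_y_discrete(labels = function(x) str_wrap(x, width = 30)) +
  ylab(NULL)
print(p)
```

Since dotplot() returns a ggplot object, the same scale_y_discrete() layer can probably be added to its output as well, although that is untested here.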

sghoshuc commented 5 years ago

Hi Guangchuang, I am trying to use cnetplot and it's not showing any edges. It worked before.

GuangchuangYu commented 5 years ago

@sghoshuc if this is reproducible with latest version, post reproducible example in a new issue.

mohammedkhalfan commented 5 years ago

We can split the GSEA result into enriched and suppressed pathways using the example provided above, that's great, but how can we do this for over-representation analysis? Thank you.

JoshuaCumming commented 4 years ago

Hej,

Is there a way I can use dotplot to select specific pathways from the results list? showCategory seems to only show the pathways from 1-x where showCategory = x. If you try showCategory = 1:10 it will only show 1 pathway.

Is there a way to pick out a mix of pathways to select for display in the dotplot?

Thanks for any help!

Joshua

adspit commented 4 years ago

> I will add a dotplot method for GSEA result.
>
> Any idea to improve?

Hi Guangchuang, I find plotting the gene ratio both on the x-axis and as dot size redundant. I think if you just keep it as a dot size, you could offer the users the possibility of plotting multiple groups on the x-axis and eliminate the facets in ggplot.

isaacnathoo commented 4 years ago

@BioLeon0209 Hi Leon, I am running into the same problem with dotplot as you showed above. How did you fix this issue? I could not figure out how to wrap the text. Thank you so much!

BioLeon0209 commented 4 years ago

Hi Nathoo,

I solved the problem by defining a new function that uses ggplot2 and stringr to draw the dot plot.

The function is shown below:

library(ggplot2)
library(stringr)

# define a new dotplot function
# x is the variable of the enrichment results
# width is used to set the width of the labels of y-axis
# top is used to set how many top terms will be shown in the figure
dotplot_ylab <- function(x, width, top) {
  x.df <- as.data.frame(x)
  if (nrow(x.df) > top) {
    x.df <- x.df[1:top, ]
  }
  # GeneRatio is stored as text such as "10/200"; evaluate it to a number
  x.df$GeneRatio <- unlist(lapply(as.list(x.df$GeneRatio), function(x) eval(parse(text = x))))
  x.df$Description <- factor(x.df$Description, levels = x.df$Description[order(x.df$Count, decreasing = F)])
  #x.df <- x.df[order(x.df$Count,decreasing = F),]
  p <- ggplot(x.df, aes(x = GeneRatio, y = Description, color = p.adjust)) +
    geom_point(aes(size = Count)) +
    scale_color_gradient(low = "red", high = "blue") +
    scale_y_discrete(labels = function(x) str_wrap(x, width = width))
  p  # return the plot so that it prints when called at the console
}

You could check whether it is ok for you.

Best Wishes, Leon.

NicNikoloutsos commented 2 years ago

Could you make it so that whether a gene set is up- or downregulated is indicated by the shape of its point in dotplot? Example: 2 shapes, circle for upregulated, triangle for downregulated.

GuangchuangYu commented 2 years ago

open a new issue if your question is unsolved.