Closed guidohooiveld closed 2 years ago
Hi, I had the same question. You can just plot it using ggplot2.
library(ggplot2)
library(dplyr)
library(stringr)
## count the gene number
gene_count<- x@result %>% group_by(ID) %>% summarise(count = sum(str_count(core_enrichment, "/")) + 1)
## merge with the original dataframe
dot_df<- left_join(x@result, gene_count, by = "ID") %>% mutate(GeneRatio = count/setSize)
## plot
library(forcats) ## for reordering the factor
ggplot(dot_df, aes(x = GeneRatio, y = fct_reorder(Description, GeneRatio))) +
geom_point(aes(size = GeneRatio, color = p.adjust)) +
theme_bw(base_size = 14) +
scale_colour_gradient(limits=c(0, 0.10), low="red") +
ylab(NULL) +
ggtitle("GO pathway enrichment")
Hope it helps!
Tommy
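The counting step above relies on core_enrichment holding the core-enrichment genes as one "/"-separated string per gene set, so the number of genes is the number of separators plus one. A minimal base-R sketch on made-up data (the toy data frame and all its values are assumptions, not real GSEA output) shows the same arithmetic without dplyr or stringr:

```r
# Toy stand-in for x@result; IDs, gene IDs and set sizes are invented.
res <- data.frame(
  ID = c("DOID:1", "DOID:2", "DOID:3"),
  core_enrichment = c("1001/1002/1003", "2001", "3001/3002"),
  setSize = c(10, 4, 8),
  stringsAsFactors = FALSE
)

# Split each string on "/" and count the pieces; for a non-empty string
# this equals (number of "/" characters) + 1.
res$count <- lengths(strsplit(res$core_enrichment, "/", fixed = TRUE))

# Fraction of each gene set that is part of the core enrichment.
res$GeneRatio <- res$count / res$setSize

print(res[, c("ID", "count", "GeneRatio")])
```

Because each ID occurs once in the result table, the group_by/summarise step and this row-wise count should give the same numbers.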
thanks @crazyhottommy.
dot_df = dot_df[1:50,] ## small dataset
dot_df$type = "upregulated"
dot_df$type[dot_df$NES < 0] = "downregulated"
## from Tommy's code
p <- ggplot(dot_df, aes(x = GeneRatio, y = fct_reorder(Description, GeneRatio))) +
geom_point(aes(size = GeneRatio, color = p.adjust)) +
theme_bw(base_size = 14) +
scale_colour_gradient(limits=c(0, 0.10), low="red") +
ylab(NULL) +
ggtitle("GO pathway enrichment")
p + facet_grid(.~type)
I will add a dotplot method for GSEA result.
Any idea to improve?
Thanks to both of you! @crazyhottommy and @GuangchuangYu.
I played a bit with the code above to visualize the results of 2 GSE analysis. Since I am not really a programmer, the code is somewhat clumsy...
Although the code works, I would appreciate some refinement:
[gseDO(), default settings: pvalueCutoff = 0.05, pAdjustMethod = "BH"]. This makes the graph unreadable. How to best select the (let's say) top 15 significant gene sets of both GSE results?
merge_result?
Code:
library(DOSE)
library(ggplot2)
library(dplyr)
library(stringr)
library(forcats)
data(geneList)
x.1 <- gseDO(geneList)
x.2 <- gseDO(geneList, by="DOSE", nPerm=1000)
## count the gene number for both results
gene_count.x1 <- x.1@result %>% group_by(ID) %>% summarise(count = sum(str_count(core_enrichment, "/")) + 1)
gene_count.x2 <- x.2@result %>% group_by(ID) %>% summarise(count = sum(str_count(core_enrichment, "/")) + 1)
## merge with the original dataframes
dot_df.x1<- left_join(x.1@result, gene_count.x1, by = "ID") %>% mutate(GeneRatio = count/setSize)
dot_df.x2<- left_join(x.2@result, gene_count.x2, by = "ID") %>% mutate(GeneRatio = count/setSize)
## merge the two results
library(clusterProfiler)
merged.res <- as.data.frame(merge_result(list(fgsea=dot_df.x1, dose=dot_df.x2)))
## merged.res <- rbind(dot_df.x1, dot_df.x2) #This merging works but it does **not** include source of results (i.e. 'fgsea' or 'dose')
## Set up/downregulation
merged.res$type = "upregulated"
merged.res$type[merged.res$NES < 0] = "downregulated"
p <- ggplot(merged.res, aes(x = GeneRatio, y = fct_reorder(Description, GeneRatio))) +
geom_point(aes(size = GeneRatio, color = p.adjust)) +
theme_bw(base_size = 14) +
scale_colour_gradient(limits=c(0, 0.10), low="red") +
ylab(NULL) +
ggtitle("Disease Ontology enrichment")
p + facet_grid(.~Cluster+type) #Cluster and type are columns to 'split' on
ggsave("merged_GSE.png")
For the first issue, of course the showCategory parameter will work for it.
For the second issue, it should go to clusterProfiler.
Thanks for your continuous feedback! However, I got lost by your 2 comments...
Why:
1st issue: the showCategory parameter is indeed utilized with the various plotting functions, including dotplot. However, dotplot doesn't accept a gseaResult object yet... I assume you mean it will work after you have updated the dotplot function? (https://github.com/GuangchuangYu/DOSE/issues/20#issuecomment-268476805).
2nd issue: using formula interface (as described here). Isn't it correct that you can only use the formula/grouping interface for over-representation (enrichment) analyses? i.e.: which pathways/ontologies are enriched in 'group' or 'othergroup'. But how to apply formula interface for GSE analysis, that uses a full, ranked dataset? In other words, how to use formula interface if you have e.g. 2 ranked lists of genes? Sorry if I am missing something obvious....
FYI: for now I got 1st issue 'working' by manual ordering and selecting on the merged dataframes (dot_df.x1 and dot_df.x2) before merging.
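The manual ordering and selecting mentioned here can be sketched in base R. This is a minimal illustration on invented data: top_n_sets is a hypothetical helper, and the toy data frames only stand in for dot_df.x1 and dot_df.x2. Adding the label by hand also keeps the source of each row, which a plain rbind would lose.

```r
# Hypothetical helper: keep the n rows with the smallest adjusted p-value
# and record which run they came from.
top_n_sets <- function(df, n, label) {
  df <- df[order(df$p.adjust), , drop = FALSE]
  df <- head(df, n)
  df$Cluster <- rep(label, nrow(df))
  df
}

# Toy stand-ins for dot_df.x1 and dot_df.x2; all values are invented.
run1 <- data.frame(Description = c("A", "B", "C", "D"),
                   p.adjust = c(0.04, 0.001, 0.02, 0.03),
                   NES = c(1.5, -2.0, 1.8, -1.2),
                   stringsAsFactors = FALSE)
run2 <- data.frame(Description = c("A", "B", "E"),
                   p.adjust = c(0.01, 0.03, 0.002),
                   NES = c(1.7, -1.9, 2.1),
                   stringsAsFactors = FALSE)

# Select first, then stack the two runs.
merged <- rbind(top_n_sets(run1, 2, "fgsea"),
                top_n_sets(run2, 2, "dose"))

# Direction of enrichment, as in the code earlier in the thread.
merged$type <- ifelse(merged$NES < 0, "downregulated", "upregulated")
print(merged)
```

With real results, n would be 15 and the resulting data frame could be passed to the same ggplot call with facet_grid(.~Cluster+type).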
Issue 1 solved.
Now dotplot supports gseaResult, and showCategory and the other parameters we are familiar with from the dotplot method for enrichResult all work for gseaResult as well.
You can also pass the split parameter, which applies showCategory after splitting the results by the specified variable. Here .sign is reserved for the sign of NES (activated for >0 and suppressed for <0). So in this example, we plot 30 activated and 30 suppressed enriched terms.
For issue 2, I will elaborate more details when I have time to work it out.
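The example call itself is not preserved in this thread. Judging from the description it was presumably along the lines of dotplot(x, showCategory = 30, split = ".sign") + facet_grid(. ~ .sign), but that is an assumption rather than a quote. What such a split does can be sketched in base R on invented data; split_top is a hypothetical helper and this is not the package's actual implementation:

```r
# Hypothetical helper: label each term by the sign of NES, then keep the
# n most significant terms within each sign group.
split_top <- function(df, n) {
  # NES == 0 is lumped with "suppressed" here purely for simplicity.
  df$.sign <- ifelse(df$NES > 0, "activated", "suppressed")
  df <- df[order(df$p.adjust), , drop = FALSE]
  parts <- lapply(split(df, df$.sign), head, n)
  out <- do.call(rbind, parts)
  rownames(out) <- NULL
  out
}

# Toy GSEA-like table; all values are invented.
res <- data.frame(Description = c("A", "B", "C", "D", "E"),
                  NES = c(2.1, -1.8, 1.4, -2.5, 1.9),
                  p.adjust = c(0.01, 0.02, 0.04, 0.001, 0.03),
                  stringsAsFactors = FALSE)

out <- split_top(res, 2)
print(out)
```

The point of the sketch is that the cutoff is applied per group, so showCategory = 30 can yield up to 60 terms in total.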
nice work!
Indeed, very nice! Thanks Guangchuang!
I've also been trying to merge the results of two different GSE runs. I did something very similar to what @guidohooiveld did, but I just copy-pasted and slightly changed the DOSE fortify function. Then I realized I can change the order and by arguments, which are normally hidden, since dotplot calls fortify internally. I think accessing these arguments might be useful for both issues, i.e. dotplot and clusterProfiler.
Here is what I got using my own data:
order=TRUE, by="Count"
LCHP_gseGO_BP_count.pdf
order=TRUE, by="GeneRatio"
LCHP_gseGO_BP_gr.pdf
order=FALSE
LCHP_gseGO_BP.pdf
Update: if I understood the source code correctly, the by argument is implemented in the clusterProfiler version of dotplot, but not in the DOSE dotplot. Was there a specific reason for that?
Hi Guangchuang, when I run KEGG, I want to show the cnetplot, but it just shows Entrez IDs. How can I show gene symbols?
@wodetianxia1 You can use setReadable if the organism has a corresponding OrgDb.
@GuangchuangYu Here is my code:
kk3 <- gseKEGG(geneList = geneList, organism = 'hsa', nPerm = 10000, pvalueCutoff = 0.05, verbose = FALSE)
kk4 <- setReadable(kk3, OrgDb = org.Hs.eg.db, keytype = "auto")
It just shows the output below. How can I fix it?
@GuangchuangYu Hi Guangchuang, the species is human. When I use this code, it shows the output below, so how can I make it work?
kk <- enrichKEGG(gene = names(geneList), organism = 'hsa', pvalueCutoff = 0.05)
kk444 <- setReadable(kk, OrgDb = org.Hs.eg.db, keytype = "auto")
@wodetianxia1 keytype = "ENTREZID" should work.
Hi Guangchuang, thank you! It really works! Here is another question: as you can see, there are too many gene names. How can I show just the top 30?
Dear Guangchuang,
I'm sorry, but I have to ask you another question: how can I filter by GO level for gseGO output, just like for enrichGO output? The code is below.
ego3 <- gseGO(geneList = geneList, OrgDb = org.Hs.eg.db, ont = "BP", nPerm = 10000, minGSSize = 100, maxGSSize = 500, pvalueCutoff = 0.01, verbose = FALSE)
ego4_filter <- gofilter(ego3, level = 4)
Dear Guangchuang,
I was trying to use cnetplot and experienced an error:
up <- kk2CIMPsub$Description[order(kk2CIMPsub$NES, decreasing=TRUE)][1:3]
head(up)
[1] "Ribosome" "Parkinson's disease" "AMPK signaling pathway"
cnetplot(kk2CIMPsub, showCategory = up)
Warning message: In if (nrow(x) < n) { : the condition has length > 1 and only the first element will be used
Does showCategory accept only numbers? However, you have used the following commands in this link: https://guangchuangyu.github.io/2016/07/leading-edge-analysis/
@sghoshuc this feature will be available with enrichplot v >= 1.0.1.
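The warning quoted above comes from R itself: an if() condition must be a single TRUE or FALSE, and comparing nrow(x) with a vector of three names yields three values (from R 4.2.0 on this is an error rather than a warning). A minimal sketch of how a selector could accept either a number or a vector of descriptions follows; pick_categories is a hypothetical helper for illustration, not the package's code, and the toy table is invented:

```r
# Hypothetical helper: accept either a single number (top n rows) or a
# character vector of Description values (rows returned in the order given).
pick_categories <- function(df, showCategory) {
  if (is.numeric(showCategory) && length(showCategory) == 1) {
    return(head(df, showCategory))
  }
  idx <- match(showCategory, df$Description)
  # Silently drop names that are not present in the table.
  df[idx[!is.na(idx)], , drop = FALSE]
}

# Toy result table; NES values are invented.
res <- data.frame(
  Description = c("Ribosome", "Parkinson's disease",
                  "AMPK signaling pathway", "Cell cycle"),
  NES = c(2.4, 2.1, 1.9, -1.7),
  stringsAsFactors = FALSE
)

# Same selection as in the question: the three terms with the highest NES.
up <- res$Description[order(res$NES, decreasing = TRUE)][1:3]
sel <- pick_categories(res, up)
print(sel)
```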
where do I get it? Can you please share the link?
thanks
--
Assistant Project Scientist
Pediatrics-Neonatology
Sherin Devaskar Lab
University of California Los Angeles
I mean, when will it be released?
How should I plot upregulated genes using the current version?
thanks
Sorry, I misunderstood. Yes, I tried with enrichplot 1.1.0 and it seems to work. However, it shows a warning:
Warning message: In if (nrow(x) < n) { : the condition has length > 1 and only the first element will be used
Also, the plot takes forever to show up (maybe there are too many genes), and even if I select two of the 5 upregulated pathways it seems to plot all 5. Is there a bug, or am I doing something incorrectly?
thanks
@sghoshuc you should try 1.1.1 if you are using devel branch.
Hi Guangchuang,
I have a problem with dotplot, as shown by the figure below
Currently, I know that I can modify the font.size parameter. Do you have any better methods to solve the problem?
Thanks very much for your help.
Best Regards, Leon.
A Google search for 'wrap text in ggplot' will give you a solution.
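One common approach is to insert line breaks into long labels before they are drawn on the axis. A minimal base-R sketch follows; wrap_label is a hypothetical helper name (stringr::str_wrap does a similar job), and the example labels are invented. Assuming the plot is an ordinary ggplot object, the helper could be supplied as scale_y_discrete(labels = function(x) wrap_label(x, 30)).

```r
# Hypothetical helper: break each label at whitespace so that no line
# reaches `width` characters, joining the pieces with newlines.
wrap_label <- function(x, width = 30) {
  vapply(x,
         function(s) paste(strwrap(s, width = width), collapse = "\n"),
         character(1), USE.NAMES = FALSE)
}

# Invented example labels: one short, one too long for a single line.
labels <- c("regulation of cell cycle",
            "very long gene set description that would not fit on one line")

wrapped <- wrap_label(labels, width = 30)
cat(wrapped, sep = "\n---\n")
```

Wrapping keeps the font size readable, at the cost of taller rows when many long terms are shown.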
Hi Guangchuang, I am trying to use cnetplot and it's not showing any edges. It worked before.
@sghoshuc if this is reproducible with latest version, post reproducible example in a new issue.
We can split the GSEA result into enriched and suppressed pathways using the example provided above, that's great, but how can we do this for over-representation analysis? Thank you.
Hej,
Is there a way I can use dotplot to select specific pathways from the results list? showCategory seems to only show the pathways from 1-x where showCategory = x. If you try showCategory = 1:10 it will only show 1 pathway.
Is there a way to pick out a mix of pathways to select for display in the dotplot?
Thanks for any help!
Joshua
"I will add a dotplot method for GSEA result. Any idea to improve?"
Hi Guangchuang, I find plotting the gene ratio both on the x-axis and as dot size redundant. I think if you just keep it as a dot size, you could offer the users the possibility of plotting multiple groups on the x-axis and eliminate the facets in ggplot.
@BioLeon0209 Hi Leon, I am running into the same problem with dotplot as you showed above. How did you fix this issue? I could not figure out how to wrap the text. Thank you so much!
Hi Nathoo,
I solved the problem by defining a new function using ggplot2 and stringr to fulfill the dotplot.
The function is shown below:
library(ggplot2)
library(stringr)
## x: enrichment result; width: label wrap width; top: maximum number of terms
dotplot_ylab <- function(x, width, top) {
  x.df <- as.data.frame(x)
  if (nrow(x.df) > top) {
    x.df <- x.df[1:top, ]
  }
  ## GeneRatio is stored as text such as "3/40"; evaluate it to a number
  x.df$GeneRatio <- unlist(lapply(as.list(x.df$GeneRatio), function(x) eval(parse(text = x))))
  ## order the terms on the y-axis by Count
  x.df$Description <- factor(x.df$Description, levels = x.df$Description[order(x.df$Count, decreasing = FALSE)])
  p <- ggplot(x.df, aes(x = GeneRatio, y = Description, color = p.adjust)) +
    geom_point(aes(size = Count)) +
    scale_color_gradient(low = "red", high = "blue") +
    scale_y_discrete(labels = function(x) str_wrap(x, width = width))
  p  ## return the plot so that it is displayed when the function is called
}
You could check whether it works for you.
Best Wishes, Leon.
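The function above evaluates each GeneRatio string because that column is stored as text such as "3/40". As an alternative sketch that avoids eval(parse()), the two numbers can be split and divided directly. ratio_to_numeric is a hypothetical helper, and the example ratios are invented:

```r
# Hypothetical helper: turn "k/n" strings into numeric ratios without
# evaluating them as R code.
ratio_to_numeric <- function(x) {
  parts <- strsplit(x, "/", fixed = TRUE)
  vapply(parts,
         function(p) as.numeric(p[1]) / as.numeric(p[2]),
         numeric(1))
}

# Invented example values in the "k/n" text form.
ratios <- c("3/40", "10/200", "7/35")
print(ratio_to_numeric(ratios))
```

Splitting the string also means that malformed input produces NA with a warning instead of running arbitrary code.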
Could you make it so that whether a gene is up- or down-regulated can be identified by the shape of its point in dotplot? Example: two shapes, a circle for upregulated and a triangle for downregulated.
open a new issue if your question is unsolved.
Hi Guangchuang, Maybe a somewhat naive question (feature request?), but is it somehow possible to visualize the results of a GSE run (of a single but also multiple runs) in a compareCluster-dotplot-like figure? I am asking because it would be really cool and helpful if one could represent the top up- and down-regulated gene sets, ideally from multiple runs, in a single graph. For example, like you implemented for the compareCluster() function. I was triggered by this idea after reading your online vignette, specifically section 13.2 here, and knowing of the function merge_result() here.
I have something in mind like this picture (from the link to section 13.2 above): ... which, 'translated' to GSE results, should rather show the up- and down-regulated gene sets from GSE analysis A and B (sorted either by significance or NES), with color coding representing significance and node size equaling gene set size. (??)
Thanks for considering, Guido
As expected, the function dotplot() doesn't work with the output of a gene set enrichment analysis performed with DOSE.