YuLab-SMU / clusterProfiler

:bar_chart: A universal enrichment tool for interpreting omics data
https://yulab-smu.top/biomedical-knowledge-mining-book/

Formatting details of heatplot function #150

Closed deziahuja closed 6 years ago

deziahuja commented 6 years ago

Hello everyone. I'm working with the heatplot function from enrichplot (as suggested by Guangchuang) to display the results of KEGG and GO over-representation tests. I was wondering if I could get some help with the following issues:

1.) There are 72 genes in my gene list; however, each time I print a heatplot, not all of them show, and the number that do show varies every time (from 30 to 48 genes). How can I get all 72 to show at once? Could it be that there is a maximum and I'm exceeding it?

2.) Is it possible to switch the axis of my heatplot? I'd prefer to have my genes along the Y-axis, and my annotations along the X-axis.

GuangchuangYu commented 6 years ago
  1. any reproducible example?

  2. can be done by p + ggplot2::coord_flip().
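
A minimal sketch of the flip, assuming ggplot2 is installed. The toy table below (its gene names, term names, and values) is a placeholder, not output from heatplot; it only stands in for any tile-style plot object `p`.

```r
library(ggplot2)

# Toy gene-by-term table; all names and values are placeholders.
d <- data.frame(
  Gene  = c("gene1", "gene2", "gene2", "gene3"),
  Term  = c("term_A", "term_A", "term_B", "term_B"),
  value = c(1.5, -0.7, -0.7, 2.1)
)

# Genes on the x aesthetic, terms on the y aesthetic.
p <- ggplot(d, aes(x = Gene, y = Term, fill = value)) + geom_tile()

# coord_flip() swaps the drawn axes: genes are now drawn vertically.
p_flipped <- p + coord_flip()
```

Note that `xlab()` and `ylab()` still refer to the x and y aesthetics after the flip, so `xlab("Gene Symbols")` labels the gene axis even though it is now drawn vertically.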

deziahuja commented 6 years ago

Hello Guangchuang, apologies for the late response. Here is code similar to what I have been using to create my heatmaps. Note that the code here, as well as the heatmaps I posted, use a subset of 30 genes from the DOSE geneList, not my PI's data. Also, I am new to R (I started learning it within the last 3 weeks), so my code may seem very unsophisticated. These are the issues I have faced since my last post:

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
rm(list = ls())
library("clusterProfiler")
library("enrichplot")
library("org.Hs.eg.db")
library("RColorBrewer")
library("ggplot2")

# Calling upon geneList and activating color package

data(geneList, package = "DOSE")
gene <- names(geneList)
gene.df <- bitr(gene, fromType = "ENTREZID", toType = c("ENSEMBL", "SYMBOL"), OrgDb = org.Hs.eg.db)
de <- names(geneList)[1:30]

hm.palette <- colorRampPalette(rev(brewer.pal(9, 'YlOrRd')), space = 'Lab')

# 1.) Running of GO Over-representation test for biological processes & creation of custom heatplot

ego <- enrichGO(de, OrgDb = "org.Hs.eg.db", ont = "BP", readable = TRUE)

ego2 <- simplify(ego)

final_product_GO_BP_Over_rep <- heatplot(ego2, foldChange = geneList, showCategory = 20000) +
  ggplot2::coord_flip() +
  ggplot2::scale_fill_gradientn(colours = hm.palette(100)) +
  ggplot2::ylab('Annotations of Biological Processes') +
  ggplot2::xlab('Gene Symbols') +
  ggplot2::ggtitle('GO Over-representation for Biological Processes') +
  ggplot2::theme(panel.background = element_rect(fill = "black", colour = "black")) +
  ggplot2::guides(fill = guide_legend(title = "LOD (pg/ml)"))

final_product_GO_BP_Over_rep

# 2.) Running of KEGG Over-representation test for biological processes & creation of custom heatplot

kk <- enrichKEGG(de, organism = 'hsa', pvalueCutoff = 0.05)

head(kk)

final_productKEGG_Overrep <- heatplot(kk, foldChange = geneList, showCategory = 20000) +
  ggplot2::coord_flip() +
  ggplot2::scale_fill_gradientn(colours = hm.palette(100)) +
  ggplot2::ylab('Annotations of Biological Processes') +
  ggplot2::xlab('Gene Symbols') +
  ggplot2::ggtitle('KEGG Over-representation for Biological Processes') +
  ggplot2::theme(panel.background = element_rect(fill = "black", colour = "black")) +
  ggplot2::guides(fill = guide_legend(title = "LOD (pg/ml)"))

final_productKEGG_Overrep

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

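[Figure 1: dose_sample_go_bp_over_heatmap (GO biological process over-representation heatplot)]

[Figure 2: dose_sample_kegg_over_plot (KEGG over-representation heatplot)]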

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

1.) How do we display ALL gene symbols from the gene list on both heatmaps? The GO biological process over-representation heatmap that I created displays only 22 of the 30 genes in the list, and the KEGG over-representation heatmap displays only 5 of the 30.

2.) As you'll note with the KEGG over-representation heatmap, the genes it does display are shown as gene IDs rather than official gene symbols. This has happened for every KEGG over-representation heatmap I have made (including with other datasets). How do we get the KEGG over-representation heatmap to display gene symbols instead of gene IDs?

3.) As you'll also note with the KEGG over-representation heatmap, it displays only a single biological process. When running the head function on the enrichKEGG results, one can clearly see that the genes are involved in far more biological processes than just the "IL-17 signaling pathway". I've tried setting the "showCategory =" argument to an absurdly high number to see if it would make a difference, and it has not. Is there a way to display all of the biological processes on a KEGG over-representation heatmap?

GuangchuangYu commented 6 years ago
> geneInCategory(ego2)[as.data.frame(ego2)$ID] %>% unlist %>% unique
 [1] "CDCA8"  "CDC20"  "KIF23"  "CENPE"  "MYBL2"  "NDC80"  "TOP2A"  "NCAPH"
 [9] "ASPM"   "DLGAP5" "S100A9" "S100A8" "S100A7" "CDC45"  "E2F8"   "CXCL10"
[17] "RRM2"   "MCM10"  "MELK"   "BCL2A1" "MARCO"  "LAMP3"

For figure 1, there are indeed only 22 genes in the enriched terms, so only those 22 can appear on the plot.
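
The same check can be sketched in base R on a toy table. This is only an illustration: the term IDs, the grouping of genes into terms, and the placeholder gene "GENE_X" are hypothetical, and the data frame merely imitates the "/"-separated geneID column of an enrichment result.

```r
# Toy stand-in for as.data.frame(ego2); term IDs and groupings are hypothetical.
enriched <- data.frame(
  ID     = c("term_A", "term_B"),
  geneID = c("CDCA8/CDC20/KIF23", "CDC20/S100A9/S100A8"),
  stringsAsFactors = FALSE
)

# Hypothetical input list; "GENE_X" is a placeholder that falls in no enriched term.
input_genes <- c("CDCA8", "CDC20", "KIF23", "S100A9", "S100A8", "GENE_X")

# Genes that belong to at least one enriched term: the only ones a heatplot can draw.
genes_in_terms <- unique(unlist(strsplit(enriched$geneID, "/", fixed = TRUE)))

# Input genes that will be absent from the plot.
not_shown <- setdiff(input_genes, genes_in_terms)

print(length(genes_in_terms))
print(not_shown)
```

Genes from the input list that do not fall into any enriched term have nothing to be plotted against, which is why they are missing.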


> kk
#
# over-representation test
#
#...@organism    hsa
#...@ontology    KEGG
#...@keytype     kegg
#...@gene    chr [1:30] "4312" "8318" "10874" "55143" "55388" "991" "6280" "2305" ...
#...pvalues adjusted by 'BH' with cutoff <0.05
#...1 enriched terms found
'data.frame':   1 obs. of  9 variables:
 $ ID         : chr "hsa04657"
 $ Description: chr "IL-17 signaling pathway"
 $ GeneRatio  : chr "5/18"
 $ BgRatio    : chr "93/7430"
 $ pvalue     : num 2.08e-06
 $ p.adjust   : num 7.48e-05
 $ qvalue     : num 6.56e-05
 $ geneID     : chr "4312/6280/6279/6278/3627"
 $ Count      : int 5
#...Citation
  Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He.
  clusterProfiler: an R package for comparing biological themes among
  gene clusters. OMICS: A Journal of Integrative Biology
  2012, 16(5):284-287

For figure 2, there is indeed only one enriched pathway found at this cutoff, and it contains 5 of the 30 genes.
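
On the question of gene IDs versus symbols in the KEGG plot, one possible approach is to convert the result with `setReadable()` from DOSE before plotting. The sketch below assumes clusterProfiler, DOSE, and org.Hs.eg.db are installed and that `enrichKEGG()` can reach the KEGG server; the wrapper name `kegg_with_symbols` and its arguments are hypothetical, introduced only for illustration.

```r
# Hypothetical helper: run a KEGG over-representation test on human Entrez IDs
# and convert the gene IDs in the result to gene symbols.
# Nothing is executed until the function is called.
kegg_with_symbols <- function(entrez_ids, p_cutoff = 0.05) {
  # enrichKEGG() queries KEGG online, so this step needs network access.
  kk <- clusterProfiler::enrichKEGG(entrez_ids,
                                    organism = "hsa",
                                    pvalueCutoff = p_cutoff)
  # For human data the KEGG gene IDs are Entrez IDs, so they can be mapped
  # to symbols through the org.Hs.eg.db annotation package.
  DOSE::setReadable(kk, OrgDb = "org.Hs.eg.db", keyType = "ENTREZID")
}
```

With the script above, this would be called as `kk_readable <- kegg_with_symbols(de)` and the result passed to `heatplot()` in place of `kk`.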

stepanovacz commented 1 year ago

@deziahuja @GuangchuangYu How can one order the genes on the y axis so they are right next to each other for each functional group on the x axis? Thinking of a ladder style heatmap
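
One way to approach such an ordering, sketched in base R on a toy table: sort genes by the first functional group they appear in, then use that order as factor levels. The column names `categoryID` and `Gene` and all values are assumptions made for this illustration, not taken from a real heatplot object.

```r
# Toy long-format table of term/gene pairs; all values are placeholders.
d <- data.frame(
  categoryID = c("term_A", "term_A", "term_B", "term_B", "term_C"),
  Gene       = c("g3", "g1", "g1", "g4", "g2"),
  stringsAsFactors = FALSE
)

# Keep terms in their order of first appearance.
term_order <- unique(d$categoryID)

# For each gene, the position of the first term it belongs to.
first_term <- tapply(match(d$categoryID, term_order), d$Gene, min)

# Sort genes by that position, breaking ties alphabetically.
gene_order <- names(first_term)[order(first_term, names(first_term))]

# Fixing the factor levels fixes the axis order in ggplot2-style plots.
d$Gene <- factor(d$Gene, levels = gene_order)

print(gene_order)
```

On a ggplot object whose gene axis is discrete, the same vector could be applied with `ggplot2::scale_x_discrete(limits = gene_order)` (or `scale_y_discrete()`, depending on which aesthetic holds the genes). Whether this gives a clean ladder depends on how many genes are shared between groups.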