Open huerqiang opened 1 year ago
Question: is there an easy way to extract the clustering information from emapplot()? I've been struggling with this for days... 😢😢😢
In theory, how understandable a cluster is depends on the number of keywords displayed: the more keywords are shown, the easier the cluster is to interpret. So I think we can leave the choice to users and let them decide how many keywords to show. Here's my example:
rm(list = ls())
library(DOSE)
library(enrichplot)
library(reshape2)
library(igraph)
library(magrittr)
data(geneList)
de <- names(geneList)[1:100]
x <- enrichDO(de)
x2 <- pairwise_termsim(x)
#############################################
x3 <- as.data.frame(x2)
x4 <- x2@termsim[as.character(x3$Description),as.character(x3$Description)]
w <- melt(x4)
wd <- w[w[,1] != w[,2],] %>% na.omit()
wd <- wd[wd$value != 0,]
##
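## graph_from_data_frame() and as_adjacency_matrix() replace the deprecated
## graph.data.frame() and get.adjacency()
g <- graph_from_data_frame(wd[, -3], directed=FALSE)
E(g)$value <- wd[, 3]
## calculate the number of clusters
centers_g <- ceiling(sqrt(nrow(x4)))
## kmeans() starts from random centers, so fix the seed to keep cluster ids stable between runs
set.seed(123)
k_means <- kmeans(as_adjacency_matrix(g, sparse = FALSE), centers = centers_g)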
#### get the information of a certain cluster
info_n <- k_means$cluster[k_means$cluster==3] %>% names() # the 3rd cluster, for instance
## borrowing the word frequency function from @huerqiang
get_word_freq <- function(wordd){
    # split each term description into its words
    dada <- strsplit(wordd, " ")
    didi <- table(unlist(dada))
    didi <- didi[order(didi, decreasing = TRUE)]
    # count how many terms contain each word
    word_name <- names(didi)
    fun_num_w <- function(ww){
        sum(vapply(dada, function(w){ww %in% w}, FUN.VALUE = 1))
    }
    word_num <- vapply(word_name, fun_num_w, FUN.VALUE = 1)
    # return the counts, most frequent word first
    word_num[order(word_num, decreasing = TRUE)]
}
##
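#### how many keywords do you want to show? take the top 80% as an example
word_freq <- get_word_freq(info_n)  # compute once instead of calling the function twice
n_keep <- max(1, floor(0.8 * length(word_freq)))  # keep at least one keyword
info_cluster <- names(word_freq)[seq_len(n_keep)]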
It's still far from perfect, but it gives a clearer starting point for interpreting each cluster.
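If you want the membership and keywords for every cluster at once rather than one cluster at a time, the same idea can be wrapped in a small helper. This is only a minimal sketch in base R, not anything provided by enrichplot: `word_doc_freq()` and `cluster_keywords()` are hypothetical names, and the `cluster_assignment` vector is made-up toy data standing in for `k_means$cluster` (a named integer vector, with term descriptions as names and cluster ids as values).

```r
# Toy stand-in for `k_means$cluster` (made-up data, for illustration only):
# names are term descriptions, values are cluster ids.
cluster_assignment <- c(
  "lung cancer" = 1L,
  "breast cancer" = 1L,
  "ovarian cancer" = 1L,
  "heart disease" = 2L,
  "coronary artery disease" = 2L
)

# Membership table: one row per term with the cluster it was assigned to.
membership <- data.frame(
  term = names(cluster_assignment),
  cluster = unname(cluster_assignment),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each word, count how many terms contain it,
# and return the counts with the most frequent word first.
word_doc_freq <- function(terms) {
  words_per_term <- lapply(strsplit(terms, " ", fixed = TRUE), unique)
  freq <- table(unlist(words_per_term))
  freq[order(freq, decreasing = TRUE)]
}

# Hypothetical helper: keep the top `prop` share of words per cluster
# (always at least one word), returned as a list named by cluster id.
cluster_keywords <- function(cluster_assignment, prop = 0.8) {
  terms_by_cluster <- split(names(cluster_assignment), cluster_assignment)
  lapply(terms_by_cluster, function(terms) {
    freq <- word_doc_freq(terms)
    n_keep <- max(1L, floor(prop * length(freq)))
    names(freq)[seq_len(n_keep)]
  })
}

keywords <- cluster_keywords(cluster_assignment, prop = 0.8)
print(membership)
print(keywords)
```

With the real data you would pass `k_means$cluster` instead of the toy vector and adjust `prop` to show more or fewer keywords per cluster.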
We currently use a word cloud to generate the cluster names in emapplot_cluster().
But the result is not good enough: https://github.com/YuLab-SMU/enrichplot/issues/241#issuecomment-1543504229
Please suggest a better way to display cluster information. You can find the word cloud code here: https://github.com/YuLab-SMU/enrichplot/blob/master/R/wordcloud.R