Open Jwenyi opened 1 year ago
In fact, the clustering of GO terms in treeplot is based on pairwise_termsim, which calculates the similarity between GO terms by Jaccard index, i.e., the similarity of the set of genes enriched in two GO terms. However, pairwise_termsim can also calculate similarity based on GO semantics.
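To make the Jaccard idea concrete, here is a minimal base-R sketch. It is not the code inside pairwise_termsim: the gene sets, the `jaccard` helper, and the choice of average-linkage `hclust` are all assumptions made for illustration.

```r
# Hypothetical gene sets for three enriched GO terms (made-up IDs).
gene_sets <- list(
  "GO:A" = c("g1", "g2", "g3", "g4"),
  "GO:B" = c("g3", "g4", "g5"),
  "GO:C" = c("g6", "g7")
)

# Jaccard index: size of the intersection divided by size of the union.
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Pairwise similarity matrix between all terms.
terms <- names(gene_sets)
sim <- outer(
  seq_along(terms), seq_along(terms),
  Vectorize(function(i, j) jaccard(gene_sets[[i]], gene_sets[[j]]))
)
dimnames(sim) <- list(terms, terms)

# Cluster on dissimilarity (1 - similarity) and cut the tree into 2 groups.
# Average linkage is an illustrative choice, not necessarily what treeplot uses.
hc <- hclust(as.dist(1 - sim), method = "average")
clusters <- cutree(hc, k = 2)
print(round(sim, 2))
print(clusters)
```

In this toy data, GO:A and GO:B share two of five genes (similarity 0.4) and end up in the same cluster, while GO:C shares none and is separated.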
Thank you for your reply. I apologize if I didn't explain clearly, which led to some misunderstanding. I have read the source code of the treeplot module, and I did find that it first clusters GO terms based on semantic similarity and then provides a unique 'biological description' for each cluster. I would like to know how treeplot determines the 'biological description' for each cluster. Does it search for common parent nodes among these GO terms or use some other method?
It's just a word cloud, see here:
add_cladelab <- function(p, nWords, label_format_cladelab,
                         offset, roots,
                         fontsize, group_color, cluster_color,
                         pdata, extend, hilight, align) {
    # align <- getOption("enriplot.treeplot.align", default = "both")
    cluster_label <- sapply(cluster_color, get_wordcloud, ggData = pdata,
                            nWords = nWords)
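For readers who want a feel for what a word-cloud label means, here is a simplified base-R sketch that picks the most frequent words across the term descriptions in one cluster. It is not the get_wordcloud implementation, which may weight or filter words differently; the `top_words` helper, the stop-word list, and the example descriptions are hypothetical.

```r
# Hypothetical term descriptions belonging to one cluster (made-up data).
descriptions <- c(
  "T cell activation",
  "T cell proliferation",
  "regulation of T cell activation",
  "T helper differentiation"
)

# Small illustrative stop-word list (an assumption, not the package's list).
stop_words <- c("of", "in", "the", "and", "to", "by")

# Return the n_words most frequent words across all descriptions.
top_words <- function(descriptions, n_words = 4, stop_words = character()) {
  words <- unlist(strsplit(descriptions, "\\s+"))
  words <- words[nzchar(words) & !(words %in% stop_words)]
  freq <- sort(table(words), decreasing = TRUE)
  names(freq)[seq_len(min(n_words, length(freq)))]
}

label <- paste(top_words(descriptions, n_words = 3, stop_words), collapse = " ")
print(label)
```

A label built this way summarizes vocabulary shared by the cluster rather than pointing at a common ancestor term in the GO graph.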
I noticed that a better approach has been implemented in the {aPEAR} package; see https://doi.org/10.1101/2023.03.28.534514
Each cluster is assigned a biologically meaningful name. The most important pathway in each cluster is determined using either PageRank (Page et al. 1999) (default) or HITS (Kleinberg 1999) algorithm that examines the connectivity within the cluster and detects the most important pathway. The description of this pathway is used as the name of the cluster.
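As a rough illustration of the PageRank idea described in that quote, the following base-R sketch ranks the pathways in a single cluster by power iteration over a weighted similarity matrix and uses the top-ranked description as the cluster name. It is not aPEAR's code: the similarity values, the `pagerank` helper, and the damping factor of 0.85 are assumptions, and the sketch assumes every pathway has at least one non-zero similarity.

```r
# Hypothetical pairwise similarities within one cluster (made-up values).
pathways <- c("pathway A", "pathway B", "pathway C", "pathway D")
sim <- matrix(c(
  0.0, 0.8, 0.7, 0.6,
  0.8, 0.0, 0.2, 0.1,
  0.7, 0.2, 0.0, 0.1,
  0.6, 0.1, 0.1, 0.0
), nrow = 4, byrow = TRUE, dimnames = list(pathways, pathways))

# Weighted PageRank by power iteration.
# Assumes no column of `w` sums to zero (no isolated pathways).
pagerank <- function(w, damping = 0.85, tol = 1e-10, max_iter = 1000) {
  n <- nrow(w)
  # Column-normalise so each column holds transition probabilities.
  trans <- sweep(w, 2, colSums(w), "/")
  rank <- rep(1 / n, n)
  for (i in seq_len(max_iter)) {
    new_rank <- (1 - damping) / n + damping * as.vector(trans %*% rank)
    converged <- sum(abs(new_rank - rank)) < tol
    rank <- new_rank
    if (converged) break
  }
  setNames(rank, rownames(w))
}

scores <- pagerank(sim)
cluster_name <- names(which.max(scores))
print(round(scores, 3))
print(cluster_name)
```

Here "pathway A" is the most strongly connected member, so it receives the highest score and would lend its description to the cluster.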
We are trying to extract the information from the cluster more appropriately. But the direct use of the most significant pathway name as the cluster name may lose a lot of information. If you have any better suggestions, you are welcome to discuss them with us, and we will use them to improve treeplot, emapplot_cluster, and so on. Thanks.
Hi, I'm a PhD student from CityU, HK. Your 'treeplot' is of great interest to us, since we need to gain in-depth insight into a large number of enriched GO terms. However, I'm curious about the method 'treeplot' uses to find the common ancestor of a group of GO terms, because we are trying to cluster genes based on their semantic similarity and then derive cluster-level terms that represent the common biological role of each cluster. So I'm wondering whether treeplot could be modified for this. Best, Wenyi