YuLab-SMU / clusterProfiler

:bar_chart: A universal enrichment tool for interpreting omics data
https://yulab-smu.top/biomedical-knowledge-mining-book/

Entire species designation is included in KEGG Pathway name #670

Closed npokorzynski closed 4 months ago

npokorzynski commented 4 months ago

Hi,

I'm trying to run an old chunk of code for a clusterProfiler analysis, but with the new package version, the names of differentially expressed pathways include the entire species and strain name for my organism from the KEGG database. For an example, see the attached emap plot. Is there a way to either indicate to gseKEGG to only include the pathway name, or, alternatively, edit the titles in the gseKEGG object so that the plots do not include all of this additional text?

My code is below:

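#load the packages that provide gseKEGG, pairwise_termsim and emapplot
library(clusterProfiler)
library(enrichplot)

#read data from DESeq2 output csv that has been annotated already
Mg10<-read.csv("SALT_Mg10_DESeq2_08172021_annotated.csv")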

#select log2foldchange data
Mg10list<-Mg10[,3]

#construct vector with ENSEMBL gene IDs to define fold change values
names(Mg10list)<-as.character(Mg10[,1])

#sort as decreasing list
Mg10list<-sort(Mg10list, decreasing=TRUE)

#perform gene set enrichment analysis using KEGG pathway enrichment
Mg10k <- gseKEGG(geneList = Mg10list, organism = 'seo', minGSSize = 20, pvalueCutoff = 0.05, verbose = FALSE)

#compute similarity between KEGG pathway terms
pMg10k <- pairwise_termsim(Mg10k, method = "JC", semData = NULL, showCategory = 400)

#emapplot of similarity between top 20 enriched KEGG terms
emapplot(pMg10k, showCategory = 20)

Rplot01.pdf

guidohooiveld commented 4 months ago

Yes, since release 4.10 (I believe) clusterProfiler adds additional information that is present in the KEGG database to the output of gseKEGG. This includes the category and subcategory of the pathway, and also the species name (although I am not sure whether this is actually due to changes made by KEGG themselves).

Anyway, to remove these from the results and subsequent plots, I simply use gsub (quick and dirty) on your Mg10k object, before computing the pairwise similarity:

Mg10k@result$Description <- gsub(pattern = " - Salmonella enterica subsp. enterica serovar Typhimurium 14028S",
                                 replacement = "",
                                 Mg10k@result$Description,
                                 fixed = TRUE)
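
As a slightly more defensive variant of the same idea, here is a minimal base R sketch that removes the species text only when a description actually ends with it. The helper name strip_suffix and the mock descriptions are made up for illustration; they are not part of clusterProfiler, and the real values would come from Mg10k@result$Description.

```r
# Hypothetical helper (not part of clusterProfiler): drop `suffix` only from
# strings that end with it; all other strings are returned unchanged.
strip_suffix <- function(x, suffix) {
  has_suffix <- endsWith(x, suffix)
  x[has_suffix] <- substr(x[has_suffix], 1,
                          nchar(x[has_suffix]) - nchar(suffix))
  x
}

# Mock descriptions standing in for Mg10k@result$Description
species_suffix <- " - Salmonella enterica subsp. enterica serovar Typhimurium 14028S"
descriptions <- c(
  paste0("Ribosome", species_suffix),
  paste0("Two-component system", species_suffix),
  "Flagellar assembly"
)

cleaned <- strip_suffix(descriptions, species_suffix)
print(cleaned)
```

On the real object this would presumably be used as Mg10k@result$Description <- strip_suffix(Mg10k@result$Description, species_suffix), again before calling pairwise_termsim.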

npokorzynski commented 4 months ago

That is very helpful, thanks! One related question - I was getting around this by manually adding y-axis labels [e.g., scale_y_discrete(labels = c(...))] and in that context it provides the labels as single lines of text, rather than the default which is to wrap the text. I find that the wrapping makes plot formatting very awkward because the word crowding is difficult to read. Is there a way to stop the text wrapping?

guidohooiveld commented 4 months ago

AFAIK it is not possible to switch off the text wrapping. However, with the argument label_format (default value = 30) you can set the number of characters after which the text is wrapped.

A simple way to effectively avoid text wrapping is to set label_format to the length of the longest description, something like:

n.char <- max(nchar(as.data.frame(pMg10k)$Description))
emapplot(pMg10k, showCategory = 10, label_format = n.char)
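
To illustrate why this works, here is a small base R sketch on mock descriptions. It uses strwrap as a stand-in for the wrapping done inside enrichplot, which is an assumption: the package's own wrapping routine may differ in detail. The helper wrap_label and the example descriptions are hypothetical.

```r
# Mock descriptions standing in for as.data.frame(pMg10k)$Description
descriptions <- c("Ribosome",
                  "Biosynthesis of secondary metabolites",
                  "Two-component system")

# Length of the longest label, as in the snippet above
n.char <- max(nchar(descriptions))

# Hypothetical helper: wrap one label and join the pieces with line breaks.
# strwrap() keeps each line shorter than `width` characters.
wrap_label <- function(x, width) {
  paste(strwrap(x, width = width), collapse = "\n")
}

# Wrapping at 30 characters splits the long label over two lines ...
wrapped_30 <- vapply(descriptions, wrap_label, character(1),
                     width = 30, USE.NAMES = FALSE)

# ... while a width just above the longest label leaves every label intact
wrapped_max <- vapply(descriptions, wrap_label, character(1),
                      width = n.char + 1, USE.NAMES = FALSE)

print(wrapped_30)
print(wrapped_max)
```

Note that n.char in the thread's snippet is computed over all results, not only the categories shown, so it is at least as large as needed for the plotted labels.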

npokorzynski commented 4 months ago

Works perfectly, thank you!