YuLab-SMU / clusterProfiler

📊 A universal enrichment tool for interpreting omics data
https://yulab-smu.top/biomedical-knowledge-mining-book/

Get an error when running compareCluster for enrichDO #273

Open · gsx-ucas opened this issue 4 years ago

gsx-ucas commented 4 years ago

Hi Uncle Y (Y叔),

I got an error when running enrichDO through compareCluster:

Error in compareCluster(geneClusters = GeneList, fun = "enrichDO", ont = "DO",  : 
  No enrichment found in any of gene cluster, please check your input...

My command was:

library(clusterProfiler)
library(DOSE)  # enrichDO is provided by DOSE

compareCluster(geneClusters = GeneList,
               fun = "enrichDO",
               ont = "DO",
               pvalueCutoff = 0.05,
               readable = TRUE)

GeneList looks like this (only the first four genes of each cluster are shown):

$`1`
[1] "100093631" "2534"      "222865"    "5605"     

$`4`
[1] "6505" "3157" "1959" "7292"

$`6`
[1] "51704"  "23092"  "149603" "51621" 

$`9`
[1] "25864" "2201"  "688"   "51084"

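The error occurs because nothing was enriched. Other methods such as enrichGO and enrichKEGG do not raise this error, though. Could this be changed to return an empty value instead of an error?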

huerqiang commented 4 years ago

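@goushixue Hi, I tested your data and code. Inside compareCluster, enrichDO, enrichGO and enrichKEGG all raise the same error when there is no significant enrichment: No enrichment found in any of gene cluster, please check your input... You mentioned cases where nothing was enriched but no error was raised. Did you mean calling these three functions directly: clusterProfiler::enrichGO, clusterProfiler::enrichKEGG and DOSE::enrichDO?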
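Following this distinction, one way to get per-cluster behaviour is to skip compareCluster and call the enrichment function on each cluster yourself, keeping only the clusters that return something. The sketch below is a minimal version of that idea. `run_per_cluster` is a hypothetical helper, not a clusterProfiler or DOSE function. It assumes the enrichment function returns either NULL or an object that `as.data.frame()` can convert, which enrichResult objects can be.

```r
# Hypothetical helper (not part of clusterProfiler/DOSE): run an
# enrichment function on each gene cluster separately and keep only
# the clusters that produced at least one enriched term.
run_per_cluster <- function(gene_clusters, enrich_fun, ...) {
  results <- lapply(gene_clusters, function(genes) enrich_fun(genes, ...))
  # TRUE when a cluster returned a non-NULL result with at least one row
  has_terms <- vapply(results, function(res) {
    !is.null(res) && nrow(as.data.frame(res)) > 0
  }, logical(1))
  results[has_terms]
}

# Example usage (assumes DOSE is installed and GeneList is the list above):
# do_results <- run_per_cluster(GeneList, DOSE::enrichDO,
#                               pvalueCutoff = 0.05, readable = TRUE)
# names(do_results)  # clusters with at least one enriched DO term
```

Clusters without results are simply absent from the returned list, so an empty list means nothing was enriched anywhere, and no error is raised.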

gsx-ucas commented 4 years ago

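@huerqiang Hi, thank you very much! I may have overlooked some details. In my case, enrichGO and enrichKEGG found no enrichment for some of the samples, whereas enrichDO found none for any sample. Still, as you said, raising an error whenever nothing at all is enriched does not seem quite appropriate, because it makes the R process stop right away. Returning an empty object would be better. For now I use tryCatch to skip over the error, but I hope this can be improved in the future. Thanks 🙏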
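A tryCatch workaround along these lines might look like the minimal sketch below. `safe_enrich` is a hypothetical wrapper, not part of clusterProfiler. It assumes the error text keeps starting with "No enrichment found", as in the message quoted above. Any other error is re-raised, so genuine problems (wrong IDs, missing packages) are not silently swallowed.

```r
# Hypothetical wrapper (not part of clusterProfiler): call an enrichment
# function and return NULL instead of stopping the whole script when
# compareCluster() reports that nothing was enriched.
safe_enrich <- function(fun, ...) {
  tryCatch(
    fun(...),
    error = function(e) {
      # Assumption: the "nothing enriched" error message contains this text
      if (grepl("No enrichment found", conditionMessage(e), fixed = TRUE)) {
        message("No enrichment found; returning NULL")
        return(NULL)
      }
      stop(e)  # re-signal any other, unrelated error
    }
  )
}

# Example usage (assumes clusterProfiler and DOSE are installed):
# res <- safe_enrich(clusterProfiler::compareCluster,
#                    geneClusters = GeneList, fun = "enrichDO",
#                    pvalueCutoff = 0.05, readable = TRUE)
# if (is.null(res)) message("nothing to plot for this comparison")
```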

pmt39 commented 4 years ago

May I please ask how I can solve the problem in #274? I'm new to R and would really appreciate any advice on the error. Thank you.

gsx-ucas commented 4 years ago

@pmt39 Hello, since I couldn't get your original data, I used my own data to simulate yours. It looks like this:

   Entrez        FC         group othergroup
1  653635 -2.241298 downregulated          B
2  374677 -1.315366 downregulated          A
3   93164 -1.770867 downregulated          A
4  286016 -1.435405 downregulated          A
5  266625  5.103488   upregulated          B
6  348013 -1.807979 downregulated          A
7    8284 -1.415521 downregulated          A
8   11096  2.001239   upregulated          B
9    8204 -1.099100 downregulated          A
10    396 -1.146883 downregulated          A

Then I ran this command:

results <- compareCluster(Entrez~group+othergroup,
                          data=df,
                          organism = "hsa",
                          fun = "enrichKEGG",
                          pvalueCutoff = 0.05)

It worked:

#
# Result of Comparing 4 gene clusters 
#
#.. @fun     enrichKEGG 
#.. @geneClusters   List of 4
 $ downregulated.A: chr [1:556] "374677" "93164" "286016" "348013" ...
 $ downregulated.B: chr [1:109] "653635" "5730" "5228" "4804" ...
 $ upregulated.A  : chr [1:389] "51301" "4233" "55076" "5605" ...
 $ upregulated.B  : chr [1:88] "266625" "11096" "5671" "51084" ...
 - attr(*, "split_type")= chr "data.frame"
 - attr(*, "split_labels")='data.frame':    4 obs. of  2 variables:
  ..$ group     : chr [1:4] "downregulated" "downregulated" "upregulated" "upregulated"
  ..$ othergroup: chr [1:4] "A" "B" "A" "B"
#...Result  'data.frame':   6 obs. of  12 variables:
 $ Cluster    : Factor w/ 4 levels "downregulated.A",..: 1 1 1 1 3 3
 $ group      : chr  "downregulated" "downregulated" "downregulated" "downregulated" ...
 $ othergroup : chr  "A" "A" "A" "A" ...
 $ ID         : chr  "hsa04520" "hsa04659" "hsa04978" "hsa05130" ...
 $ Description: chr  "Adherens junction" "Th17 cell differentiation" "Mineral absorption" "Pathogenic Escherichia coli infection" ...
 $ GeneRatio  : chr  "10/256" "12/256" "8/256" "16/256" ...
 $ BgRatio    : chr  "71/8041" "107/8041" "58/8041" "202/8041" ...
 $ pvalue     : num  7.48e-05 1.41e-04 4.56e-04 6.94e-04 2.88e-04 ...
 $ p.adjust   : num  0.0187 0.0187 0.0403 0.046 0.0457 ...
 $ qvalue     : num  0.0179 0.0179 0.0386 0.044 0.0408 ...
 $ geneID     : chr  "6615/1499/87/7048/5797/7414/7525/10163/4008/81" "5534/3566/5603/4793/7040/3569/7048/3716/861/4772/3091/6777" "261729/4502/4501/538/4494/9843/4493/79901" "4642/347733/22989/10427/7280/84617/5603/4793/3569/4430/2149/203068/10383/10381/10163/7277" ...
 $ Count      : int  10 12 8 16 13 4
#.. number of enriched terms found for each gene cluster:
#..   downregulated.A: 4 
#..   downregulated.B: 0 
#..   upregulated.A: 2 
#..   upregulated.B: 0 
#
#...Citation
  Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He.
  clusterProfiler: an R package for comparing biological themes among
  gene clusters. OMICS: A Journal of Integrative Biology 2012,
  16(5):284-287 

And the dotplot worked fine!
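The cluster names in the output above (downregulated.A, upregulated.B, ...) come from every combination of group and othergroup in the formula Entrez~group+othergroup. The base-R sketch below uses the ten sample rows shown earlier to illustrate where those names come from. It only mimics the grouping and is not clusterProfiler's internal code. With just these ten rows the upregulated.A combination is empty, and `drop = TRUE` removes it here.

```r
# The ten sample rows shown above
df <- data.frame(
  Entrez = c("653635", "374677", "93164", "286016", "266625",
             "348013", "8284", "11096", "8204", "396"),
  FC = c(-2.241298, -1.315366, -1.770867, -1.435405, 5.103488,
         -1.807979, -1.415521, 2.001239, -1.099100, -1.146883),
  group = c("downregulated", "downregulated", "downregulated", "downregulated",
            "upregulated", "downregulated", "downregulated", "upregulated",
            "downregulated", "downregulated"),
  othergroup = c("B", "A", "A", "A", "B", "A", "A", "B", "A", "A"),
  stringsAsFactors = FALSE
)

# Split Entrez IDs by every observed group/othergroup combination.
# split() joins the factor levels with "." (e.g. "downregulated.A"),
# matching the cluster names in the compareCluster output above.
clusters <- split(df$Entrez, list(df$group, df$othergroup), drop = TRUE)
lengths(clusters)  # number of genes per combination
```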

I think one thing you didn't notice is that enrichKEGG needs the species to be specified. You didn't specify it in your code, so it defaults to human ("hsa").

You can find the species code for the enrichKEGG parameter organism (for example, human is hsa and mouse is mmu). You can look up the one you need here.
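To look up a code from within R, clusterProfiler provides search_kegg_organism(). The short sketch below searches by scientific name. The search string is only an example. Results depend on the organism list available to your clusterProfiler version, so treat the output as a starting point and confirm the code on the KEGG site.

```r
# Look up KEGG organism entries for E. coli K-12 strains by scientific name.
# Guarded so the snippet does nothing if clusterProfiler is not installed.
if (requireNamespace("clusterProfiler", quietly = TRUE)) {
  hits <- clusterProfiler::search_kegg_organism("Escherichia coli K-12",
                                                by = "scientific_name")
  print(hits)  # one row per matching organism, including its KEGG code
}
```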

pmt39 commented 4 years ago

sift.txt

Hi again :). I attached the text file with the gene set I wanted to assess. I still get the error message below. If it is fine with you, could you please cross-check this on your end? Maybe something is wrong with my R setup, or something strange is going on.

I used this final code (since I am working with the E. coli K-12 strain):

results <- compareCluster(Entrez~group+othergroup,
                          data=mydf,
                          organism = "ece",
                          fun = "enrichKEGG",
                          pvalueCutoff = 0.05)
Error in compareCluster(Entrez ~ group + othergroup, data = mydf, organism = "ece",  : 
  No enrichment found in any of gene cluster, please check your input...

gsx-ucas commented 4 years ago

@pmt39 Hi,

I found two things in your code and data that need attention:

I ran the command with organism = "eco" instead of "ece", but got the same result as you. So your data are simply not enriched at this cutoff!

pmt39 commented 4 years ago

Oh, so maybe that is why the enrichment fails; a small sample size, I reckon. Yes, you are right, eco is the correct code; thank you for noticing that. In that case, I suppose there is simply no enrichment.

I really appreciate the time and effort you put into going through my data on such short notice, as I am still finding my way around R as a new user. A million thanks again :)