YuLab-SMU / clusterProfiler

:bar_chart: A universal enrichment tool for interpreting omics data
https://yulab-smu.top/biomedical-knowledge-mining-book/

Error in ans[ypos] <- rep(yes, length.out = len)[ypos] : replacement has length zero #510

Open mick42-star opened 2 years ago

mick42-star commented 2 years ago

I am using clusterProfiler version 4.5.1.902 for a GSE analysis. I get a gse result and the corresponding dotplot, but I cannot generate a ridgeplot.

gse <- gseGO(geneList=gene_list, ont ="ALL", keyType = "SYMBOL", nPerm = 10000, minGSSize = 3, maxGSSize = 800, pvalueCutoff = 0.05, verbose = TRUE, OrgDb = "org.Mm.eg.db", pAdjustMethod = "none")


dotplot(gse, showCategory=10, split=".sign") + facet_grid(.~.sign)


ridgeplot(gse)

Error in ans[ypos] <- rep(yes, length.out = len)[ypos] : replacement has length zero

guidohooiveld commented 1 year ago

You did not provide your input data, but I suspect this is because you use gene symbols as input. In my experience gene symbols are regularly duplicated (= the same gene symbol is used for 2 different genes; see below for examples), and as a result the ridgeplot fails, because it plots all genes belonging to a gene set. The dotplot works because it plots only the gene-set-level results; it needs no information on the constituent genes.

I suggest you use unique identifiers in all your analyses, such as Entrez or Ensembl IDs.


```r
> library(org.Hs.eg.db)
> ids <- keys(org.Hs.eg.db)
> 
> mapping <- select(org.Hs.eg.db, keys=ids, columns=c('ENTREZID','SYMBOL'), keytype='ENTREZID')
'select()' returned 1:1 mapping between keys and columns
> mapping[duplicated(mapping[,"SYMBOL"]),] 
       ENTREZID    SYMBOL
4795       6052      RNR1
4796       6053      RNR2
11493     51072     MEMO1
28771 100124696       TEC
30812 100187828       HBD
35761 100505381      MMD2
53263 107648861  DEL11P13
54541 107985615 TRNAV-CAC
54642 107985753 TRNAV-CAC
63864 122405565    SMIM44
64999 123670537   DEL1P36
> 
> select(org.Hs.eg.db, keys="HBD", columns=c('ENTREZID','SYMBOL'), keytype='SYMBOL')
'select()' returned 1:many mapping between keys and columns
  SYMBOL  ENTREZID
1    HBD      3045
2    HBD 100187828
> 
> select(org.Hs.eg.db, keys="MEMO1", columns=c('ENTREZID','SYMBOL'), keytype='SYMBOL')
'select()' returned 1:many mapping between keys and columns
  SYMBOL ENTREZID
1  MEMO1     7795
2  MEMO1    51072
> 
> select(org.Hs.eg.db, keys="TRNAV-CAC", columns=c('ENTREZID','SYMBOL'), keytype='SYMBOL')
'select()' returned 1:many mapping between keys and columns
     SYMBOL  ENTREZID
1 TRNAV-CAC 107985614
2 TRNAV-CAC 107985615
3 TRNAV-CAC 107985753
> 
```

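
To check whether your own ranked list is affected, something like the following might help. It is only a base-R sketch: the `gene_list` values are made up for illustration, and keeping the entry with the largest absolute score per symbol is just one assumed way to resolve duplicates, not something clusterProfiler does for you.

```r
# Hypothetical ranked gene list keyed by symbol; "HBD" occurs twice
gene_list <- c(HBD = 2.1, MEMO1 = 1.4, TEC = 0.3, HBD = -0.7, RNR1 = -1.9)

# Symbols that appear more than once
dup_names <- unique(names(gene_list)[duplicated(names(gene_list))])
print(dup_names)

# Assumed strategy: per symbol, keep the entry with the largest absolute score
by_abs <- gene_list[order(abs(gene_list), decreasing = TRUE)]
dedup  <- by_abs[!duplicated(names(by_abs))]

# GSEA functions expect the list sorted in decreasing order
dedup <- sort(dedup, decreasing = TRUE)
print(dedup)
```

Whether dropping or averaging duplicated entries is appropriate depends on your data; converting to unique identifiers before ranking avoids the question altogether.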
mick42-star commented 1 year ago

@guidohooiveld

Many thanks. It solved the problem when I converted all gene names to ENTREZID.

javifar commented 4 months ago

I had the same problem with this code:

kegg_gene_list <- c(7042= 0.365,10135= 0.218,3553= 0.175,5291= 0.167, 114548= 0.163,22861= 0.089,4780= 0.078,942= 0.061, 3902= -0.005,3586= -0.011,1029= -0.186,3458` = -0.282 )

gene set enrichment analysis

gse_res <- gseDO(kegg_gene_list, minGSSize = 5, pvalueCutoff = 0.2, pAdjustMethod = "BH", verbose = FALSE)

enrichplot::ridgeplot(gse_res)


guidohooiveld commented 4 months ago

You have, or had the same problem? What was the error message?

Anyway, make sure that the ids of your input are characters! The way you put it now the ids are considered numeric, and these are not recognized! Also note the presence of a back-tick (`) after the last id (3458).
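
If the ids come from a numeric column, one way to build the input and sanity-check the names could look like this. It is a sketch in base R only; the `ids` and `scores` vectors are illustrative (a subset of the values above), and the digit-only check is my own assumption about what a clean Entrez id looks like.

```r
# Illustrative input: ids held as numbers, e.g. after reading a table
ids    <- c(7042, 10135, 3553, 5291, 114548, 3458)
scores <- c(0.365, 0.218, 0.175, 0.167, 0.163, -0.282)

# Go via integer so large ids are never printed in scientific notation
gene_list <- setNames(scores, as.character(as.integer(ids)))
gene_list <- sort(gene_list, decreasing = TRUE)

# Checks before calling a gse* function: character names, no duplicates,
# and digits only (catches stray characters such as a back-tick)
stopifnot(is.character(names(gene_list)))
stopifnot(!anyDuplicated(names(gene_list)))
stopifnot(all(grepl("^[0-9]+$", names(gene_list))))
```

The `stopifnot()` calls fail early with a readable message, which is usually easier to interpret than an error raised later inside the plotting code.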

After correcting this, and setting the significance cutoff to 1, it is working in my hands:

> ## load libraries
> library(clusterProfiler)
> library(DOSE)
> library(enrichplot)
> 
> ## input, with ids as characters (and not numeric)
> kegg_gene_list <- c("7042"= 0.365,"10135"= 0.218,"3553"= 0.175,"5291"= 0.167, "114548"= 0.163,"22861"= 0.089,
+                     "4780"= 0.078,"942"= 0.061, "3902"= -0.005,"3586"= -0.011,"1029"= -0.186,"3458"= -0.282)
> 
> gse_res <- gseDO(kegg_gene_list,
+ minGSSize = 5,
+ pvalueCutoff = 1,
+ pAdjustMethod = "BH",
+ verbose = FALSE)
> 
> ## check
> gse_res 
#
# Gene Set Enrichment Analysis
#
#...@organism    Homo sapiens 
#...@setType     DO 
#...@keytype     ENTREZID 
#...@geneList    Named num [1:12] 0.365 0.218 0.175 0.167 0.163 0.089 0.078 0.061 -0.005 -0.011 ...
 - attr(*, "names")= chr [1:12] "7042" "10135" "3553" "5291" ...
#...nPerm        
#...pvalues adjusted by 'BH' with cutoff <1 
#...120 enriched terms found
'data.frame':   120 obs. of  11 variables:
 $ ID             : chr  "DOID:2531" "DOID:0070004" "DOID:1909" "DOID:4960" ...
 $ Description    : chr  "hematologic cancer" "myeloid neoplasm" "melanoma" "bone marrow cancer" ...
 $ setSize        : int  7 6 6 6 6 5 5 5 5 5 ...
 $ enrichmentScore: num  -0.781 -0.757 -0.757 -0.757 -0.757 ...
 $ NES            : num  -1.73 -1.62 -1.62 -1.62 -1.62 ...
 $ pvalue         : num  0.0502 0.0388 0.0388 0.0388 0.0388 ...
 $ p.adjust       : num  0.589 0.589 0.589 0.589 0.589 ...
 $ qvalue         : num  0.589 0.589 0.589 0.589 0.589 ...
 $ rank           : num  7 6 6 6 6 4 4 4 4 4 ...
 $ leading_edge   : chr  "tags=86%, list=58%, signal=86%" "tags=83%, list=50%, signal=83%" "tags=83%, list=50%, signal=83%" "tags=83%, list=50%, signal=83%" ...
 $ core_enrichment: chr  "4780/942/3902/3586/1029/3458" "942/3902/3586/1029/3458" "942/3902/3586/1029/3458" "942/3902/3586/1029/3458" ...
#...Citation
  Guangchuang Yu, Li-Gen Wang, Guang-Rong Yan, Qing-Yu He. DOSE: an
  R/Bioconductor package for Disease Ontology Semantic and Enrichment
  analysis. Bioinformatics 2015, 31(4):608-609 

> 
> ## plot
> ridgeplot(gse_res)
Picking joint bandwidth of 0.0757
> 
> 
> sessionInfo()
R version 4.4.0 Patched (2024-05-21 r86580 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: Europe/Amsterdam
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] enrichplot_1.24.0      DOSE_3.30.1            clusterProfiler_4.12.0

loaded via a namespace (and not attached):
  [1] DBI_1.2.2               shadowtext_0.1.3        gson_0.1.0             
  [4] gridExtra_2.3           rlang_1.1.3             magrittr_2.0.3         
  [7] ggridges_0.5.6          compiler_4.4.0          RSQLite_2.3.6          
 [10] png_0.1-8               vctrs_0.6.5             reshape2_1.4.4         
 [13] stringr_1.5.1           pkgconfig_2.0.3         crayon_1.5.2           
 [16] fastmap_1.2.0           XVector_0.44.0          labeling_0.4.3         
 [19] ggraph_2.2.1            utf8_1.2.4              HDO.db_0.99.1          
 [22] UCSC.utils_1.0.0        purrr_1.0.2             bit_4.0.5              
 [25] zlibbioc_1.50.0         cachem_1.1.0            aplot_0.2.2            
 [28] GenomeInfoDb_1.40.0     jsonlite_1.8.8          blob_1.2.4             
 [31] BiocParallel_1.38.0     tweenr_2.0.3            parallel_4.4.0         
 [34] R6_2.5.1                stringi_1.8.4           RColorBrewer_1.1-3     
 [37] GOSemSim_2.30.0         Rcpp_1.0.12             snow_0.4-4             
 [40] IRanges_2.38.0          Matrix_1.7-0            splines_4.4.0          
 [43] igraph_2.0.3            tidyselect_1.2.1        qvalue_2.36.0          
 [46] viridis_0.6.5           codetools_0.2-20        lattice_0.22-6         
 [49] tibble_3.2.1            plyr_1.8.9              Biobase_2.64.0         
 [52] treeio_1.28.0           withr_3.0.0             KEGGREST_1.44.0        
 [55] gridGraphics_0.5-1      scatterpie_0.2.2        polyclip_1.10-6        
 [58] Biostrings_2.72.0       pillar_1.9.0            ggtree_3.12.0          
 [61] stats4_4.4.0            ggfun_0.1.4             generics_0.1.3         
 [64] S4Vectors_0.42.0        ggplot2_3.5.1           munsell_0.5.1          
 [67] scales_1.3.0            tidytree_0.4.6          glue_1.7.0             
 [70] lazyeval_0.2.2          tools_4.4.0             data.table_1.15.4      
 [73] fgsea_1.30.0            fs_1.6.4                graphlayouts_1.1.1     
 [76] fastmatch_1.1-4         tidygraph_1.3.1         cowplot_1.1.3          
 [79] grid_4.4.0              tidyr_1.3.1             ape_5.8                
 [82] AnnotationDbi_1.66.0    colorspace_2.1-0        nlme_3.1-164           
 [85] GenomeInfoDbData_1.2.12 patchwork_1.2.0         ggforce_0.4.2          
 [88] cli_3.6.2               fansi_1.0.6             viridisLite_0.4.2      
 [91] dplyr_1.1.4             gtable_0.3.5            yulab.utils_0.1.4      
 [94] digest_0.6.35           BiocGenerics_0.50.0     ggrepel_0.9.5          
 [97] ggplotify_0.1.2         farver_2.1.2            memoise_2.0.1          
[100] lifecycle_1.0.4         httr_1.4.7              GO.db_3.19.1           
[103] bit64_4.0.5             MASS_7.3-60.2          
> 
>


javifar commented 4 months ago

It works. Perhaps the problem was due to using an old GitHub version of the package.
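
For anyone comparing versions, a quick check along these lines may be useful. The two version strings are the ones mentioned in this thread (4.5.1.902 from the original report, 4.12.0 from the sessionInfo above); treating the latter as a "known working" baseline is only an assumption.

```r
# Versions mentioned in this thread
reported <- package_version("4.5.1.902")
working  <- package_version("4.12.0")

# package_version compares component-wise, so 4.5 is older than 4.12
is_older <- reported < working
print(is_older)

# Show the locally installed version, if the package is available
if (requireNamespace("clusterProfiler", quietly = TRUE)) {
  print(utils::packageVersion("clusterProfiler"))
}
```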