YuLab-SMU / DOSE

:mask: Disease Ontology Semantic and Enrichment analysis
https://yulab-smu.top/biomedical-knowledge-mining-book/

fix bugs in `get_ont_info()` and `get_dose_data()`: wrong object name and wrong data type #80

Closed huerqiang closed 10 months ago

huerqiang commented 11 months ago

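(1) In `get_ont_info()`, the `ancmap` and `termmap` data for `ontology == "MDO"` are wrong: they should be taken from HDO, not from HPO.
(2) The object name `MPOMPMGI` was mistyped as `MPGMGIDO`.
(3) In `get_dose_data()`, `as.character()` should be applied to `EG2ALLTERM.df` to convert its factor columns to character. This avoids errors in `enricher()` and related functions.

Example: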
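Before the full session, point (3) can be sketched in isolation. The exact error raised inside `enricher()` is not shown in this thread, so the snippet below only illustrates one well-known way factor columns behave differently from character columns: indexing by a factor uses its integer codes, not its labels. The table `eg2term` and the lookup vector `term_name` are hypothetical stand-ins, not DOSE internals.

```r
# Toy gene-to-term table (hypothetical; not the real EG2ALLTERM.df).
# stringsAsFactors = TRUE mimics a table whose columns arrive as factors.
eg2term <- data.frame(
  gene = c("17390", "69327", "17390"),
  term = c("DOID:0110969", "DOID:0110969", "DOID:0060347"),
  stringsAsFactors = TRUE
)

# Named lookup vector, deliberately not in sorted (factor-level) order.
term_name <- c(
  "DOID:0110969" = "brachydactyly type B1",
  "DOID:0060347" = "acrorenal syndrome"
)

# Indexing by a factor uses the integer codes, so names are matched by
# position rather than by label and the descriptions come out swapped.
by_factor <- term_name[eg2term$term]

# Converting every column to character restores lookup by label.
eg2term[] <- lapply(eg2term, as.character)
by_char <- term_name[eg2term$term]

print(unname(by_factor))
print(unname(by_char))
```

Converting once, where the table is built, keeps every downstream lookup keyed by the ID strings rather than by factor level order.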

> library(DOSE)

DOSE v3.29.1.991  For help: https://yulab-smu.top/biomedical-knowledge-mining-book/

If you use DOSE in published research, please cite:
Guangchuang Yu, Li-Gen Wang, Guang-Rong Yan, Qing-Yu He. DOSE: an R/Bioconductor package for Disease Ontology Semantic and Enrichment analysis. Bioinformatics 2015, 31(4):608-609

> library(GOSemSim)
GOSemSim v2.28.0  For help: https://yulab-smu.top/biomedical-knowledge-mining-book/

If you use GOSemSim in published research, please cite:
- Guangchuang Yu. Gene Ontology Semantic Similarity Analysis Using GOSemSim. In: Kidder B. (eds) Stem Cell Transcriptional Networks. Methods in Molecular Biology, 2020, 2117:207-215. Humana, New York, NY. doi:10.1007/978-1-0716-0301-7_11
- Guangchuang Yu, Fei Li, Yide Qin, Xiaochen Bo, Yibo Wu, Shengqi Wang. GOSemSim: an R package for measuring semantic similarity among GO terms and gene products Bioinformatics 2010, 26(7):976-978. doi:10.1093/bioinformatics/btq064

Attaching package: ‘GOSemSim’

The following objects are masked from ‘package:DOSE’:

    clusterSim, geneSim, mclusterSim

> library(MPO.db)
MPO.db version 0.99.7
> library(HPO.db)
HPO.db version 0.99.2
> genes <- keys(MPO.db, "mgi")
> set.seed(123)
> gene <- sample(genes, 100)
> genelist <- runif(length(genes),-2,2)
> names(genelist) <- genes
> genelist <- sort(genelist, decreasing = TRUE)
> edo = enrichDO(gene, pvalueCutoff = 1, qvalueCutoff = 1, 
+       organism = "mmu", minGSSize = 1, maxGSSize = Inf)
> head(edo)
                       ID                                                         Description GeneRatio    BgRatio      pvalue  p.adjust    qvalue
DOID:0060783 DOID:0060783 ectrodactyly, ectodermal dysplasia, and cleft lip-palate syndrome 3     16/82 1065/12553 0.001273699 0.7994451 0.7994451
DOID:0060347 DOID:0060347                                                  acrorenal syndrome     12/82  715/12553 0.002236440 0.7994451 0.7994451
DOID:0110969 DOID:0110969                                               brachydactyly type B1      9/82  467/12553 0.003332270 0.7994451 0.7994451
DOID:0070061 DOID:0070061           autosomal dominant intellectual developmental disorder 31      2/82   15/12553 0.004188460 0.7994451 0.7994451
DOID:0111094 DOID:0111094                              Fanconi anemia complementation group N     14/82  994/12553 0.004759278 0.7994451 0.7994451
DOID:0050833 DOID:0050833                                                     orotic aciduria      6/82  244/12553 0.005187162 0.7994451 0.7994451
                                                                                                            geneID Count
DOID:0060783 110253/17390/13489/54635/69327/245884/64654/242022/414758/18992/387206/235504/67839/16206/23807/22287    16
DOID:0060347                            17390/54635/69327/71846/245884/64654/414758/387206/67839/23807/72043/22287    12
DOID:0110969                                              208647/17390/19726/286940/69327/242022/19822/22627/16206     9
DOID:0070061                                                                                           17390/18936     2
DOID:0111094                 17390/19726/29859/13489/286940/69327/64654/18992/18049/21406/19822/235504/22627/16206    14
DOID:0050833                                                                  19726/69327/242022/19822/22627/22287     6
> gdo = gseDO(genelist, pvalueCutoff = 1, 
+       organism = "mmu", minGSSize = 1, maxGSSize = Inf)
preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
> 
> head(gdo)
                       ID                                Description setSize enrichmentScore       NES     pvalue  p.adjust    qvalue rank                   leading_edge   core_enrichment
DOID:643         DOID:643 progressive multifocal leukoencephalopathy       2      -0.9133641 -1.456511 0.03103367 0.9540816 0.9540816  295 tags=100%, list=2%, signal=98%             50701
DOID:0050890 DOID:0050890                            synucleinopathy       5      -0.6306940 -1.394528 0.13133208 0.9540816 0.9540816 1609 tags=60%, list=11%, signal=53% 78309/50873/56424
DOID:14330     DOID:14330                        Parkinson's disease       5      -0.6306940 -1.394528 0.13133208 0.9540816 0.9540816 1609 tags=60%, list=11%, signal=53% 78309/50873/56424
DOID:0080001 DOID:0080001                               bone disease       2       0.8383321  1.350160 0.10139165 0.9540816 0.9540816 1552 tags=50%, list=11%, signal=45%             80752
DOID:65           DOID:65                  connective tissue disease       2       0.8383321  1.350160 0.10139165 0.9540816 0.9540816 1552 tags=50%, list=11%, signal=45%             80752
DOID:8283       DOID:8283                                peritonitis       1      -0.9799548 -1.312227 0.05220884 0.9540816 0.9540816  295 tags=100%, list=2%, signal=98%                  
> 
> # Mouse phenotype enrichment analysis
> empo = enrichMPO(gene, pvalueCutoff = 1, qvalueCutoff = 1, 
+       minGSSize = 1, maxGSSize = Inf)
> head(empo)
                   ID                           Description GeneRatio  BgRatio      pvalue  p.adjust    qvalue                    geneID Count
MP:0000166 MP:0000166       abnormal chondrocyte morphology     4/100 83/14617 0.002506417 0.5367257 0.5226013 208647/286940/64654/26562     4
MP:0006429 MP:0006429 abnormal hyaline cartilage morphology     3/100 46/14617 0.003810749 0.5367257 0.5226013        17390/286940/18936     3
MP:0000167 MP:0000167          decreased chondrocyte number     2/100 15/14617 0.004591140 0.5367257 0.5226013              208647/64654     2
MP:0008324 MP:0008324       abnormal melanotroph morphology     1/100  1/14617 0.006841349 0.5367257 0.5226013                     13489     1
MP:0008331 MP:0008331      increased lactotroph cell number     1/100  1/14617 0.006841349 0.5367257 0.5226013                     13489     1
MP:0008423 MP:0008423        decreased lactotroph cell size     1/100  1/14617 0.006841349 0.5367257 0.5226013                     13489     1
>  
> gmpo = gseMPO(genelist, pvalueCutoff = 1, 
+       minGSSize = 1, maxGSSize = Inf)  
preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
> 
> head(gmpo)
                   ID                            Description setSize enrichmentScore       NES       pvalue  p.adjust    qvalue rank                   leading_edge
MP:0000405 MP:0000405       abnormal auchene hair morphology      14       0.6754086  2.151384 0.0002616932 0.9810542 0.9810542 2276 tags=57%, list=16%, signal=48%
MP:0010762 MP:0010762    abnormal microglial cell activation      25       0.5616637  2.138701 0.0001743608 0.9810542 0.9810542 3303 tags=48%, list=23%, signal=37%
MP:0000229 MP:0000229 abnormal megakaryocyte differentiation      22      -0.5989617 -2.132420 0.0002949974 0.9810542 0.9810542 2367 tags=50%, list=16%, signal=42%
MP:0001224 MP:0001224        abnormal keratinocyte apoptosis      16       0.6354358  2.107630 0.0004873834 0.9810542 0.9810542 4544 tags=81%, list=31%, signal=56%
MP:0001195 MP:0001195                             flaky skin      22       0.5632225  2.052313 0.0016116417 0.9810542 0.9810542 1655 tags=41%, list=11%, signal=36%
MP:0003846 MP:0003846                            matted coat       8       0.7510336  1.988744 0.0021891237 0.9810542 0.9810542 1874 tags=50%, list=13%, signal=44%
                                                                          core_enrichment
MP:0000405                                23872/20674/20672/18426/14835/56460/18194/14176
MP:0010762 17184/100038570/19214/83433/216739/12774/237868/11516/18145/24088/245944/21929
MP:0000229              78933/69296/12394/14582/14460/64214/17886/16452/22145/22761/27260
MP:0001224 12608/21847/13983/17199/106025/56460/19015/22173/16664/16151/24102/12367/19698
MP:0001195                        20249/18992/14246/16661/13122/16905/77055/170720/225049
MP:0003846                                                        20249/68268/14034/56460
> 
> 
> # Human phenotype enrichment analysis
> data(geneList)
> gene <- sample(names(geneList), 100)    
> gene
  [1] "80146"     "55255"     "9796"      "9711"      "9093"      "10073"     "9788"      "2788"      "55620"     "6484"      "8814"      "10767"     "374354"    "9152"      "23646"     "27202"     "55112"     "2995"      "6905"     
 [20] "8848"      "10328"     "925"       "1789"      "23505"     "10370"     "10020"     "54457"     "10778"     "29907"     "813"       "64600"     "4594"      "22858"     "55113"     "51192"     "7867"      "6898"      "23111"    
 [39] "55282"     "3376"      "51734"     "28972"     "5256"      "22953"     "29940"     "287"       "25852"     "10439"     "10268"     "10733"     "3565"      "6617"      "79018"     "54718"     "57226"     "8639"      "23011"    
 [58] "51123"     "84193"     "23161"     "1101"      "57020"     "1186"      "79571"     "10785"     "4477"      "267"       "5027"      "5462"      "6386"      "5721"      "6233"      "100293516" "81621"     "6331"      "23471"    
 [77] "7015"      "56311"     "28983"     "170680"    "8793"      "3300"      "27236"     "4701"      "26586"     "53826"     "79166"     "6536"      "23186"     "84975"     "57830"     "23526"     "5311"      "79888"     "25909"    
 [96] "4486"      "79906"     "2286"      "794"       "7247"     
> 
> ehpo = enrichHPO(gene, pvalueCutoff = 1, qvalueCutoff = 1, 
+       minGSSize = 1, maxGSSize = Inf)
> head(ehpo)
                   ID                      Description GeneRatio  BgRatio       pvalue  p.adjust    qvalue                                     geneID Count
HP:0001279 HP:0001279                          Syncope      5/31 111/4885 0.0005867311 0.4670664 0.4419055                   9152/10370/287/6331/7015     5
HP:0001744 HP:0001744                     Splenomegaly      8/31 340/4885 0.0009677496 0.4670664 0.4419055 374354/2995/10020/4594/3376/5256/1186/7015     8
HP:0004445 HP:0004445                   Elliptocytosis      2/31   8/4885 0.0010657912 0.4670664 0.4419055                                374354/2995     2
HP:0004312 HP:0004312 Abnormal reticulocyte morphology      4/31  82/4885 0.0016436366 0.4670664 0.4419055                      374354/2995/1186/7015     4
HP:0001923 HP:0001923                  Reticulocytosis      3/31  48/4885 0.0032998637 0.4670664 0.4419055                           374354/2995/1186     3
HP:0002605 HP:0002605                 Hepatic necrosis      2/31  14/4885 0.0033825831 0.4670664 0.4419055                                374354/7015     2
>  
> ghpo = gseHPO(geneList, pvalueCutoff = 1, 
+       minGSSize = 1, maxGSSize = Inf)
preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
Warning message:
In fgseaMultilevel(pathways = pathways, stats = stats, minSize = minSize,  :
  For some pathways, in reality P-values are less than 1e-10. You can set the `eps` argument to zero for better estimation.
> 
> head(ghpo)
                   ID                                Description setSize enrichmentScore      NES       pvalue     p.adjust       qvalue rank                   leading_edge
HP:0011893 HP:0011893                   Abnormal leukocyte count     380       0.4119412 1.982576 1.000000e-10 4.939000e-07 4.681053e-07 2481 tags=35%, list=20%, signal=29%
HP:0032251 HP:0032251          Abnormal immune system morphology     550       0.3580838 1.783569 1.000000e-10 4.939000e-07 4.681053e-07 2487 tags=32%, list=20%, signal=27%
HP:0001881 HP:0001881              Abnormal leukocyte morphology     540       0.3562669 1.768988 2.061199e-10 5.090130e-07 4.824290e-07 2487 tags=31%, list=20%, signal=26%
HP:0010987 HP:0010987 Abnormal cellular immune system morphology     540       0.3562669 1.768988 2.061199e-10 5.090130e-07 4.824290e-07 2487 tags=31%, list=20%, signal=26%
HP:0040088 HP:0040088                  Abnormal lymphocyte count     187       0.4892985 2.152656 4.617016e-10 9.121377e-07 8.644998e-07 2508 tags=42%, list=20%, signal=34%
HP:0032169 HP:0032169                           Severe infection      59       0.6554646 2.401302 4.743272e-09 7.653309e-06 7.253602e-06 1358 tags=41%, list=11%, signal=36%
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                core_enrichment
HP:0011893                                                                                                                                                                                                                     55388/9837/29851/55215/81570/1503/5888/1493/3070/7037/2175/3932/5551/3559/6772/51311/3507/5645/4609/3561/84823/917/9401/641/3654/5698/3574/54892/3575/919/81693/4860/915/22806/55159/2178/4938/3458/959/1789/5336/11151/3930/3702/925/79650/64135/5557/28755/974/2120/6897/6916/1991/867/11330/1794/3689/5788/916/4068/23250/83990/3937/30009/2539/3394/10525/100/2072/6696/5052/2189/5880/4522/7128/4683/81622/4210/6789/930/2542/26191/204/6850/9056/10095/6427/8882/56244/83737/7852/613/7454/5692/64170/833/843/2213/1053/10125/8456/8625/3071/672/2000/8676/4478/1080/356/1380/7319/958/4700/1773/675/1041/2215/5591/25839/54440/55636/4893/3570/3978/5371/8651/10625/7036/5889/64858/29927
HP:0032251 55388/9837/3832/9319/29851/55215/701/81570/1503/5888/1493/3070/7037/2175/4173/1075/3932/3110/5551/3559/6772/51311/3507/7298/699/5645/4609/3561/84823/917/9401/641/10535/1029/3654/5698/3574/54892/3575/919/81693/4860/915/22806/55159/2178/4938/2821/1535/3458/959/1789/7112/5336/11151/3930/3702/925/4688/4436/79650/64135/5557/28755/974/8557/2120/6897/63976/6916/3514/1991/867/10507/25939/11330/1794/3689/5788/916/4068/23250/83990/3937/30009/2539/11200/3394/10525/100/2072/6696/940/26511/5052/2189/939/4689/5880/4522/7128/4683/81622/4210/6789/930/6573/4624/2542/1788/26191/204/6850/9056/10095/5604/6427/8882/56244/83737/7852/613/1410/7454/5692/348/64170/833/843/7273/2213/1053/1536/10125/8456/8625/3071/672/2000/8676/23092/4478/1080/356/1380/7319/7305/7133/958/4700/55505/1773/675/1041/2215/3587/5591/10430/88/5567/25839/54440/55636/4893/3570/3978/5371/8651/10625/7036/5889/64858/29927/5373
HP:0001881                     55388/9837/3832/9319/29851/55215/701/81570/1503/5888/1493/3070/7037/2175/4173/3932/5551/3559/6772/51311/3507/7298/699/5645/4609/3561/84823/917/9401/641/10535/1029/3654/5698/3574/54892/3575/919/81693/4860/915/22806/55159/2178/4938/2821/1535/3458/959/1789/7112/5336/11151/3930/3702/925/4688/4436/79650/64135/5557/28755/974/8557/2120/6897/63976/6916/1991/867/10507/25939/11330/1794/3689/5788/916/4068/23250/83990/3937/30009/2539/11200/3394/10525/100/2072/6696/940/26511/5052/2189/939/4689/5880/4522/7128/4683/81622/4210/6789/930/6573/4624/2542/1788/26191/204/6850/9056/10095/5604/6427/8882/56244/83737/7852/613/1410/7454/5692/348/64170/833/843/7273/2213/1053/1536/10125/8456/8625/3071/672/2000/8676/23092/4478/1080/356/1380/7319/7305/7133/958/4700/55505/1773/675/1041/2215/5591/10430/88/5567/25839/54440/55636/4893/3570/3978/5371/8651/10625/7036/5889/64858/29927/5373
HP:0010987                     55388/9837/3832/9319/29851/55215/701/81570/1503/5888/1493/3070/7037/2175/4173/3932/5551/3559/6772/51311/3507/7298/699/5645/4609/3561/84823/917/9401/641/10535/1029/3654/5698/3574/54892/3575/919/81693/4860/915/22806/55159/2178/4938/2821/1535/3458/959/1789/7112/5336/11151/3930/3702/925/4688/4436/79650/64135/5557/28755/974/8557/2120/6897/63976/6916/1991/867/10507/25939/11330/1794/3689/5788/916/4068/23250/83990/3937/30009/2539/11200/3394/10525/100/2072/6696/940/26511/5052/2189/939/4689/5880/4522/7128/4683/81622/4210/6789/930/6573/4624/2542/1788/26191/204/6850/9056/10095/5604/6427/8882/56244/83737/7852/613/1410/7454/5692/348/64170/833/843/7273/2213/1053/1536/10125/8456/8625/3071/672/2000/8676/23092/4478/1080/356/1380/7319/7305/7133/958/4700/55505/1773/675/1041/2215/5591/10430/88/5567/25839/54440/55636/4893/3570/3978/5371/8651/10625/7036/5889/64858/29927/5373
HP:0040088                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   55388/9837/29851/81570/1503/1493/3070/3932/3559/6772/51311/3507/4609/3561/84823/917/641/3654/5698/3574/54892/3575/919/4860/915/22806/1789/5336/11151/3930/3702/925/64135/5557/974/1991/1794/5788/916/4068/23250/3937/10525/100/6696/5880/4522/7128/4683/6789/930/204/6850/10095/56244/7852/7454/5692/64170/843/10125/8456/8625/3071/2000/4478/356/1380/5591/54440/55636/4893/3570/3978/8651/10625/7036/64858/51371
HP:0032169