YuLab-SMU / GOSemSim

:golf: GO-terms Semantic Similarity Measures
https://yulab-smu.top/biomedical-knowledge-mining-book/

Unexpected result when testing the BMA combine method #17

Closed: wangshun1121 closed this issue 6 years ago

wangshun1121 commented 6 years ago

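Hi Uncle Yu, I'm analyzing GO semantic similarity between human and mouse genes. When I compute the combined score between two genes with the BMA method, I get a result I don't understand: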

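First, I merged the human and mouse GO annotations into a single GOSemSim data object. Then, using human ENST00000429662 and mouse ENSMMUG00000047697 as an example, I computed their gene semantic similarity with the following command: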

X <- geneSim("ENST00000429662", "ENSMMUG00000047697", semData = GOdb, measure = "Wang", combine = "BMA", drop = NULL)  # GOdb is an annotation database I built myself; drop = NULL keeps all evidence codes

The result:

>X

$geneSim
[1] 0.956

$GO1
[1] "GO:0006351" "GO:0006357" "GO:0007005" "GO:0045944"

$GO2
[1] "GO:0006355"

This matches the results I get with max and rcmax. So next I computed the term-level GO similarities between the two genes with:

SimScores <- termSim(X$GO1, X$GO2, semData = GOdb)

Result:

> SimScores
           GO:0006355
GO:0006351 0.71102706
GO:0006357 0.95631920
GO:0007005 0.06089613
GO:0045944 0.79674836

Then I used lines 57–59 of your CombineMethod.R to compute the combined score of the two GO sets:

sum(apply(SimScores, 1, max, na.rm = TRUE),apply(SimScores, 2, max, na.rm = TRUE))/sum(dim(SimScores))

This gives 0.696262, which equals (0.71102706 + 0.95631920 + 0.06089613 + 0.79674836 + 0.95631920) / (4 + 1).
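To double-check the arithmetic, here is a minimal base-R sketch. It rebuilds the 4 x 1 matrix from the termSim() output and applies the BMA formula quoted from CombineMethod.R. The names `sim`, `row_best`, `col_best` and `bma` are illustrative. The scores are copied by hand from the output above, not recomputed from GO data.

```r
# Pairwise term similarities copied by hand from the termSim() output above
# (rows: the four GO1 terms, column: the single GO2 term).
sim <- matrix(
  c(0.71102706, 0.95631920, 0.06089613, 0.79674836),
  nrow = 4, ncol = 1,
  dimnames = list(
    c("GO:0006351", "GO:0006357", "GO:0007005", "GO:0045944"),
    "GO:0006355"
  )
)

# Best-match average: take the best match of every row and every column,
# sum them, and divide by the total number of rows plus columns.
row_best <- apply(sim, 1, max, na.rm = TRUE)
col_best <- apply(sim, 2, max, na.rm = TRUE)
bma <- sum(row_best, col_best) / sum(dim(sim))

round(bma, 6)  # [1] 0.696262
```

The single column contributes its best match (0.95631920) once more to the numerator and adds 1 to the denominator. That is where the "+ 0.95631920" and "(4 + 1)" in the hand calculation come from.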

However, the following call still returns 0.956:

combineScores(SimScores,"BMA")

Could you test this as well?

wangshun1121 commented 6 years ago

Adding my version info: GOSemSim v2.2.0, R version 3.4.2 (2017-09-28) -- "Short Summer"

wangshun1121 commented 6 years ago

I figured it out. The function contains this check:

if (is.vector(SimScores) || nrow(SimScores) == 1 || ncol(SimScores) == 1) {
    if (combine == "avg") {
        return(round(mean(SimScores, na.rm = TRUE), digits = 3))
    } else {
        return(round(max(SimScores, na.rm = TRUE), digits = 3))
    }
}

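Because of this early return, if either gene has only one GO term (so the similarity matrix has a single row or a single column), every combine method except avg falls back to max. Here the matrix is 4 x 1, so BMA returns max(SimScores) = 0.956 instead of 0.696.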
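The effect of that early return can be reproduced without GOSemSim. The sketch below uses a hypothetical helper, `combine_sketch()`, that copies the quoted check. This is not GOSemSim's code. Its full-matrix branch uses the BMA formula from lines 57–59. For max, avg and rcmax it assumes the usual definitions, with rcmax taken as the larger of the mean row-wise and mean column-wise best matches. The `shortcut` argument is also hypothetical and exists only to switch the early return off for comparison.

```r
# Hypothetical helper (not GOSemSim's implementation) mimicking the early
# return quoted above, plus a full-matrix branch for comparison.
combine_sketch <- function(SimScores, combine = c("BMA", "max", "avg", "rcmax"),
                           shortcut = TRUE) {
  combine <- match.arg(combine)
  # Early return copied from the snippet: a vector, or a matrix with a single
  # row or column, is reduced to its mean (avg) or its max (everything else).
  if (shortcut &&
      (is.vector(SimScores) || nrow(SimScores) == 1 || ncol(SimScores) == 1)) {
    if (combine == "avg") {
      return(round(mean(SimScores, na.rm = TRUE), digits = 3))
    }
    return(round(max(SimScores, na.rm = TRUE), digits = 3))
  }
  SimScores <- as.matrix(SimScores)
  row_best <- apply(SimScores, 1, max, na.rm = TRUE)
  col_best <- apply(SimScores, 2, max, na.rm = TRUE)
  # Assumed definitions for the full-matrix case.
  score <- switch(combine,
    avg   = mean(SimScores, na.rm = TRUE),
    max   = max(SimScores, na.rm = TRUE),
    rcmax = max(mean(row_best), mean(col_best)),
    BMA   = sum(row_best, col_best) / sum(dim(SimScores))
  )
  round(score, digits = 3)
}

# The 4 x 1 matrix from the termSim() output above, copied by hand.
sim <- matrix(c(0.71102706, 0.95631920, 0.06089613, 0.79674836),
              nrow = 4, ncol = 1)

combine_sketch(sim, "BMA")                      # [1] 0.956 (early return hits)
combine_sketch(sim, "BMA", shortcut = FALSE)    # [1] 0.696 (full BMA formula)
combine_sketch(sim, "avg")                      # [1] 0.631
combine_sketch(sim, "rcmax", shortcut = FALSE)  # [1] 0.956
```

With the shortcut on, BMA collapses to the single best score. With it off, the same matrix gives the 0.696 from the manual calculation. On this matrix, rcmax also gives 0.956 even without the shortcut, because the column-wise mean (0.956) beats the row-wise mean (0.631). That is consistent with BMA, max and rcmax all reporting the same value.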