YuLab-SMU / GOSemSim

:golf: GO-terms Semantic Similarity Measures
https://yulab-smu.top/biomedical-knowledge-mining-book/

error in mclusterSim when a gene-set has only a single GO term #48

Closed: ulo closed this issue 4 months ago

ulo commented 4 months ago

Hi, first of all, thank you for the great semantic similarity package!

I discovered an issue in the mclusterSim function: if a gene set (cluster) has only a single GO term, the following matrix indexing returns a vector instead of a similarity matrix, because R drops the dimension of length one by default: https://github.com/YuLab-SMU/GOSemSim/blob/26bb3f65cada0f1c9f240e1d84872531ca65f949/R/mclusterSim.R#L56

As a result, the combineScores function returns the maximum of this vector, even if BMA was selected as the combine method. This yields an incorrect similarity of 1 for any two gene sets where one has only a single GO term and that term is also among the other set's GO terms.

To fix it, add drop=FALSE to the matrix indexing above to force a matrix result: scores[i,j] <- combineScores(go_matrix[gos1, gos2, drop=FALSE], combine=combine)
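The dimension-dropping behaviour described above can be reproduced in base R without the package. The following is only a minimal sketch: the term names, the similarity values, and the toy_bma helper are made up for illustration. toy_bma is a hypothetical stand-in that mimics the vector fallback reported in this issue; it is not the package's combineScores.

```r
# Toy similarity matrix between three GO terms (all values are invented).
terms <- c("GO:A", "GO:B", "GO:C")
go_matrix <- matrix(
  c(1.0, 0.2, 0.4,
    0.2, 1.0, 0.6,
    0.4, 0.6, 1.0),
  nrow = 3, byrow = TRUE, dimnames = list(terms, terms)
)

gos1 <- "GO:A"                      # gene set with a single GO term
gos2 <- c("GO:A", "GO:B", "GO:C")   # gene set that also contains GO:A

dropped <- go_matrix[gos1, gos2]                # named numeric vector, dims lost
kept    <- go_matrix[gos1, gos2, drop = FALSE]  # stays a 1 x 3 matrix

# Hypothetical best-match-average combiner (assumption, not GOSemSim code):
# averages the row maxima and column maxima of a similarity matrix.
toy_bma <- function(x) {
  if (!is.matrix(x)) {
    return(max(x))  # mimics the fallback behaviour reported in this issue
  }
  (sum(apply(x, 1, max)) + sum(apply(x, 2, max))) / (nrow(x) + ncol(x))
}

print(is.matrix(dropped))  # FALSE
print(dim(kept))           # 1 3
print(toy_bma(dropped))    # 1, the misleading score
print(toy_bma(kept))       # 0.65, the best-match average on the 1 x 3 matrix
```

With the vector input the score collapses to 1 because the shared term matches itself, while the 1 x 3 matrix lets the lower similarities of the other two terms pull the average down.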

kind regards, Ulrich

GuangchuangYu commented 4 months ago

Incorporated, thanks.