YuLab-SMU / GOSemSim

GO-terms Semantic Similarity Measures
https://yulab-smu.top/biomedical-knowledge-mining-book/

Information content value is Inf #25

Closed. datduong closed this issue 5 years ago

datduong commented 5 years ago

Hi, I am using the following to get the information content:

library("GOSemSim")

goDatabase = godata('org.Dm.eg.db', ont='BP')

goDatabase@IC ## see IC for terms

I observe that many GO terms have IC values of Inf. Is this because the term is not used for this species, or is there some other reason that the IC is Inf?

Thanks.

qibaiqi commented 5 years ago

Dear datduong, the IC value is the information content of a GO term. It is calculated as "-log(pt/P)", where "pt" is the number of genes annotated with that term or any of its descendant terms, and "P" is the total number of gene annotations in that aspect of the ontology. When "Inf" appears, the term has no annotation. In other words, no gene in your database is related to this term, so the value is "-log(0/P)".
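To make the arithmetic concrete, here is a minimal base R sketch of the formula above on toy data. The term IDs, counts, and the total are made up for illustration, and this is not the package's internal code, only the same "-log(pt/P)" calculation.

```r
# Toy annotation counts per GO term (hypothetical IDs and numbers).
# Each count is assumed to already include genes annotated to descendant terms.
term_counts <- c("GO:A" = 50, "GO:B" = 10, "GO:C" = 0)

# P: total number of gene annotations in this aspect (assumed toy value).
total_annotations <- 100

# IC = -log(pt / P); a term with zero annotations gives -log(0), which R reports as Inf.
ic <- -log(term_counts / total_annotations)
print(ic)

# Names of the terms that have no annotation in this toy data set.
unannotated <- names(ic)[is.infinite(ic)]

# Keep only the terms with a finite IC value.
finite_ic <- ic[is.finite(ic)]
```

The same base R filter should work on the vector from the question, for example `goDatabase@IC[is.finite(goDatabase@IC)]`, assuming the IC slot is a named numeric vector as the output in the question suggests.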

datduong commented 5 years ago

Thanks.