YuLab-SMU / GOSemSim

:golf: GO-terms Semantic Similarity Measures
https://yulab-smu.top/biomedical-knowledge-mining-book/

Information Content/goSim could be calculated for more GO terms? #33

Open dorrenasun opened 3 years ago

dorrenasun commented 3 years ago

Dear authors,

I am trying to calculate the semantic similarity between some GO terms using the information content (IC) based methods:

library(GOSemSim)
library(org.At.tair.db)
library(GO.db)
# GO.db retrieved on 2020-09-10, Bioconductor 3.12
atGO <- godata('org.At.tair.db', keytype = "TAIR", ont = "BP")
goSim("GO:0120254", "GO:0120255", semData = atGO, measure = "Jiang")  # NA
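For reference, the IC-based measures score terms by how often they are used in the annotation corpus. In the form commonly given for these measures (a sketch, not copied from the GOSemSim source), let n(t) be the number of annotations made directly to term t, off(t) the set of offspring of t, N the total number of annotations in the ontology, and t_MICA the most informative common ancestor of t_1 and t_2:

$$
\mathrm{IC}(t) = -\ln p(t), \qquad p(t) = \frac{n(t) + \sum_{c \in \mathrm{off}(t)} n(c)}{N}
$$

$$
\mathrm{sim}_{\mathrm{Jiang}}(t_1, t_2) = 1 - \min\bigl(1,\; \mathrm{IC}(t_1) + \mathrm{IC}(t_2) - 2\,\mathrm{IC}(t_{\mathrm{MICA}})\bigr)
$$

Under this definition a term with n(t) = 0 still has p(t) > 0, and therefore a finite IC, as long as at least one of its offspring is annotated. That is the situation described in this issue.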

The goSim() call returns NA, as these GO terms are not directly annotated to any Arabidopsis gene. However, some of their descendant terms are annotated to genes in the database (GO.db retrieved on 2020-09-10, Bioconductor 3.12):

# 196 and 97 offspring terms respectively; count those with at least one TAIR annotation
sum(GOBPOFFSPRING[["GO:0120254"]] %in% keys(org.At.tair.db, "GO"))
sum(GOBPOFFSPRING[["GO:0120255"]] %in% keys(org.At.tair.db, "GO"))

so their IC values should, in principle, be computable. Would it be possible to include such GO terms in the current IC and goSim() calculation?

Thank you very much.
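A minimal base-R sketch of the propagation argument above, on a toy ontology. The term IDs (root, A, B, A1, A2, B1) and annotation counts are made up, not real GO data, and the helpers `ancestors_of()` and `jiang()` are hypothetical illustrations, not GOSemSim functions. Term A has no direct annotations, yet it still receives a finite IC from its annotated offspring.

```r
# Toy ontology with hypothetical term IDs: each term maps to ALL of its offspring
offspring <- list(
  root = c("A", "B", "A1", "A2", "B1"),
  A    = c("A1", "A2"),
  B    = "B1",
  A1   = character(0),
  A2   = character(0),
  B1   = character(0)
)

# Made-up direct annotation counts; term "A" has no direct annotations
direct <- c(root = 0, A = 0, B = 2, A1 = 3, A2 = 1, B1 = 4)

# Propagate: each term is credited with its own annotations plus its offspring's
propagated <- vapply(names(offspring), function(t) {
  direct[[t]] + sum(direct[offspring[[t]]])
}, numeric(1))

# p(t) relative to all annotations in the ontology, then IC(t) = -ln p(t)
p  <- propagated / sum(direct)
ic <- -log(p)

# Ancestors of a term (the term itself included), derived from the offspring map
ancestors_of <- function(t) {
  is_anc <- vapply(offspring, function(o) t %in% o, logical(1))
  c(t, names(offspring)[is_anc])
}

# Jiang-Conrath similarity using the most informative common ancestor (MICA)
jiang <- function(t1, t2) {
  common  <- intersect(ancestors_of(t1), ancestors_of(t2))
  mica_ic <- max(ic[common])
  1 - min(1, ic[[t1]] + ic[[t2]] - 2 * mica_ic)
}

ic[["A"]]         # -log(0.4): finite although "A" has no direct annotations
jiang("A1", "A")  # 1 - log(4/3), about 0.712
```

GOSemSim's own normalisation and MICA lookup may differ in detail; the sketch only shows that crediting offspring annotations gives an unannotated term a nonzero probability and hence a usable IC.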

dorrenasun commented 3 years ago

Dear Authors,

I figured out the cause of this problem: a mismatch between the pre-computed "gotbl.rda" in the package's data folder and the locally installed version of GO.db. It seems that prepare_relation_df() in the source code is not run locally during installation, so gotbl.rda stays the same as the version shipped with the package. It would be great if you have any plans to fix this.

Thank you!
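The mismatch described above can be illustrated with a hypothetical base-R sketch: if IC values are only computed for the terms listed in the table shipped with the package, a term added in a newer local GO.db has no entry, and looking it up yields NA instead of an error. The term IDs and IC values below are made up.

```r
# Hypothetical term lists: the table shipped with the package vs. a newer local GO.db
shipped_terms <- c("T1", "T2")
local_terms   <- c("T1", "T2", "T3")   # "T3" exists only in the newer GO release

# Made-up IC values computed only over the shipped term list
ic_shipped <- setNames(c(1.5, 2.0), shipped_terms)

# Terms known to the local GO.db but absent from the shipped table
stale <- setdiff(local_terms, shipped_terms)
stale                       # "T3"

# Looking such a term up returns NA rather than an error, so any
# similarity score that depends on it silently becomes NA as well
unname(ic_shipped[stale])   # NA
```

With Bioconductor installed, the local term list could be taken from `keys(GO.db)`; how the shipped table is loaded and which column holds the GO IDs depends on the GOSemSim version, so that part is not assumed here.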