gnn4dr / DRKG

A knowledge graph and a set of tools for drug repurposing
Apache License 2.0

Unreasonable to compute jaccard_scores in this way #15

Closed. DeepColin closed this issue 3 years ago.

DeepColin commented 4 years ago

For example, compare the edge types bioarx::HumGenHumGen:Gene:Gene and GNBR::Pr::Compound:Disease:

e1, e2 = keys[0], keys[40]
n1_d = node_dictionary[e1]
n2_d = node_dictionary[e2]

A number in n2_d may refer to a compound or a disease, while the same number in n1_d refers to a gene. The Jaccard score between these two sets is therefore meaningless.
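The concern can be illustrated with a minimal sketch. It assumes, hypothetically, that node IDs are integers assigned separately per entity type, so the same integer can denote a gene in one set and a compound or disease in another. The sets and the `jaccard` helper below are made up for illustration and are not taken from the DRKG notebook.

```python
def jaccard(a, b):
    """Jaccard score |a & b| / |a | b|; defined here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical per-type IDs: the integers 2 and 3 appear in both sets
# but denote different kinds of entities.
gene_ids = {0, 1, 2, 3}
compound_disease_ids = {2, 3, 4, 5}

# Comparing raw integers reports an overlap that is only an ID collision.
raw_score = jaccard(gene_ids, compound_disease_ids)

# Tagging each ID with its entity type removes the accidental overlap.
typed_genes = {("Gene", i) for i in gene_ids}
typed_others = {("Compound", 2), ("Compound", 3), ("Disease", 4), ("Disease", 5)}
typed_score = jaccard(typed_genes, typed_others)

print(raw_score, typed_score)
```

Whether this collision actually occurs depends on how node_dictionary assigns IDs, which the snippet above does not show.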

bioannidis commented 3 years ago

Thank you for your comment. The Jaccard score for these two sets is 0, and that result is meaningful. We are looking for pairs of sets with a high Jaccard score, which indicates overlapping edges; in this case, there is no overlap between the sets.
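As a rough sketch of the pairwise comparison described here, the snippet below scores every pair of edge sets and lists them from highest to lowest overlap. The relation names, entity names, and the `jaccard` helper are hypothetical placeholders, not the repository's actual data or code.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard score |a & b| / |a | b|; defined here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical edge sets keyed by relation name; each edge is a (head, tail) pair.
edge_sets = {
    "rel_A::Gene:Gene": {("g1", "g2"), ("g2", "g3"), ("g3", "g4")},
    "rel_B::Gene:Gene": {("g1", "g2"), ("g2", "g3"), ("g5", "g6")},
    "rel_C::Compound:Disease": {("c1", "d1"), ("c2", "d2")},
}

# Score every unordered pair of relations.
scores = {
    (r1, r2): jaccard(edge_sets[r1], edge_sets[r2])
    for r1, r2 in combinations(sorted(edge_sets), 2)
}

# Pairs with a high score share many edges; disjoint pairs score 0.
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

In this toy data the two Gene:Gene relations share two of four distinct edges, while the Compound:Disease relation shares none with either, so it scores 0 against both.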