g-simmons / persona-research-internship


Ques: How to fix white bars in hierarchy heatmaps for OBOFoundry ontologies? #145

Open ranr9131 opened 1 month ago

ranr9131 commented 1 month ago

Rows of 0's appear when computing the cosine similarity heatmap between child-parent vector differences. This happens because, for various synsets, the child and parent have the same set of lemmas (pronto synonyms), so they get the same vector representation and their difference is the zero vector. Dividing that difference by its norm (the norm of a zero vector is zero) then produces NaNs, which show up as the white bars.
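A minimal NumPy sketch of the failure mode described above. The vectors here are made-up placeholders, not the project's actual embeddings; the only point is that identical child and parent vectors give a zero difference, and normalizing it yields NaN.

```python
import numpy as np

# Hypothetical embeddings: the child and parent share identical lemmas,
# so they are assumed to receive the same vector.
parent = np.array([0.2, -0.5, 0.1])
child = parent.copy()

# Some other, non-degenerate child-parent difference to compare against.
other_diff = np.array([1.0, 0.0, 0.0])

diff = child - parent          # zero vector
norm = np.linalg.norm(diff)    # 0.0

# 0 / 0 is NaN in every component (NumPy would normally warn here).
with np.errstate(invalid="ignore", divide="ignore"):
    unit = diff / norm

# NaN propagates through the dot product, so the whole heatmap row is NaN.
cos_sim = unit @ other_diff
```

Plotting libraries typically leave NaN cells unfilled, which would be consistent with the white bars.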

g-simmons commented 1 month ago

Not sure if this is applicable here, but one hacky way to resolve division-by-zero issues is to add a small value to the denominator.

I haven't thought through how this would affect interpretation of the plots.

If we do this, we should report it in the Methods section of any resulting publications.
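A rough sketch of the small-denominator hack, assuming the differences are stacked as rows of a NumPy array. The function name `cosine_similarity_matrix` and the value `eps=1e-8` are illustrative choices, not anything taken from the repo.

```python
import numpy as np

def cosine_similarity_matrix(diffs, eps=1e-8):
    """Pairwise cosine similarity between the rows of `diffs`.

    `eps` is added to every row norm so that all-zero rows divide
    cleanly instead of producing NaN. Its value is an arbitrary assumption.
    """
    norms = np.linalg.norm(diffs, axis=1, keepdims=True)
    unit = diffs / (norms + eps)
    return unit @ unit.T

# Toy input: the middle row is a zero difference (child == parent).
diffs = np.array([[1.0, 0.0],
                  [0.0, 0.0],
                  [0.0, 2.0]])
sim = cosine_similarity_matrix(diffs)
```

With this change the zero rows come out as similarity 0 against everything, including themselves on the diagonal, so the white bars would become bars of zeros rather than disappearing. That is the interpretation question to settle before relying on it.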

g-simmons commented 1 month ago

@ranr9131 Can we try merging the parent and child synsets if they have identical lemmas?

If we go this route, consider a parent P and two children A and B. If P and A have identical sets of lemmas but P and B do not, then P and A should be merged. B should remain a child of P, and all children of A (if any) should become children of P.
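One way the merge rule above could be sketched, assuming the hierarchy is a tree stored as plain dicts (`children` maps a node to its child list, `lemmas` maps a node to a frozenset of lemmas). These structures and the function name are hypothetical and would need adapting to however the pronto terms are actually held.

```python
from collections import deque

def merge_identical_synsets(root, children, lemmas):
    """Collapse every child whose lemma set equals its parent's into that parent.

    Returns (new_children, merged_into). Assumes a tree: each node has
    exactly one parent.
    """
    new_children = {}
    merged_into = {}
    stack = [root]
    while stack:
        parent = stack.pop()
        kept = []
        pending = deque(children.get(parent, []))
        while pending:
            child = pending.popleft()
            if lemmas[child] == lemmas[parent]:
                # Merge the child into the parent and re-parent its children.
                merged_into[child] = parent
                pending.extend(children.get(child, []))
            else:
                kept.append(child)
        new_children[parent] = kept
        stack.extend(kept)
    return new_children, merged_into

# The P / A / B case from the comment, plus a child C of A.
lemmas = {
    "P": frozenset({"cell"}),
    "A": frozenset({"cell"}),
    "B": frozenset({"neuron"}),
    "C": frozenset({"glial cell"}),
}
children = {"P": ["A", "B"], "A": ["C"], "B": [], "C": []}
new_children, merged_into = merge_identical_synsets("P", children, lemmas)
```

In this toy case A is merged into P, B stays a child of P, and C moves up to become a child of P. Re-parented grandchildren are checked against P as well, so a chain of identical synsets collapses in one pass.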

g-simmons commented 1 month ago

@ranr9131 You also suggested shuffling the words in the synset -- apparently the results differ slightly when shuffling is applied.

Personally this gives me spooky feelings, and I wouldn't like to rely on it :) But if you have a strong intuition to try it, go ahead :)