Open ranr9131 opened 1 month ago
Not sure if applicable here, but one hacky way to resolve division-by-zero issues is to add a small value to the denominator.
I haven't thought through how this would affect interpretation of the plots.
If we do this, we should report it in the Methods of any resulting publications.
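For reference, a minimal sketch of the small-value idea, assuming the difference vectors are stacked as rows of a NumPy array. The name `safe_normalize` and the default `eps` are placeholders, not something from the existing code.

```python
import numpy as np

def safe_normalize(diffs, eps=1e-8):
    """Row-normalize difference vectors, adding eps to each norm to avoid 0/0."""
    norms = np.linalg.norm(diffs, axis=1, keepdims=True)
    return diffs / (norms + eps)

# Toy data: the second row is a zero difference (child and parent identical).
diffs = np.array([[3.0, 4.0], [0.0, 0.0]])
unit = safe_normalize(diffs)
sim = unit @ unit.T  # cosine similarity matrix, no NaNs
```

Note that a zero difference then gets similarity 0 with every other row (and with itself) instead of NaN, so the white bars would be replaced by zero rows rather than by meaningful values.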
@ranr9131 Can we try merging the parent and child synsets if they have identical lemmas?
If we go this route, consider a parent P and two children A and B. If P and A have identical sets of lemmas but P and B do not, then P and A should be merged. B should remain a child of P, and all children of A (if any) should become children of P.
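A rough sketch of that merge rule, assuming the hierarchy is available as a plain dict from each synset name to its list of children, and the lemmas as a dict from synset name to a frozenset. Both structures and the function name are hypothetical, and the sketch treats the hierarchy as a tree (one parent per synset), which may not hold for the real data.

```python
def merge_identical_children(children, lemmas, root):
    """Return a new child map where any child whose lemma set equals
    its parent's is merged into the parent and its children are promoted."""
    merged = {}
    stack = [root]
    while stack:
        parent = stack.pop()
        kept = []
        pending = list(children.get(parent, []))
        while pending:
            child = pending.pop(0)
            if lemmas[child] == lemmas[parent]:
                # Child is absorbed into the parent; its own children are
                # re-checked against the parent, since they may match too.
                pending.extend(children.get(child, []))
            else:
                kept.append(child)
        merged[parent] = kept
        stack.extend(kept)
    return merged

# Toy example mirroring the P / A / B case, with C as a child of A.
toy_lemmas = {
    "P": frozenset({"x", "y"}),
    "A": frozenset({"x", "y"}),
    "B": frozenset({"z"}),
    "C": frozenset({"w"}),
}
toy_children = {"P": ["A", "B"], "A": ["C"]}
result = merge_identical_children(toy_children, toy_lemmas, "P")
```

In the toy example A disappears, B stays a child of P, and C moves up to P.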
@ranr9131 You also suggested shuffling the words in the synset; apparently results differ slightly if shuffling is applied.
Personally this gives me spooky feelings, and I wouldn't like to rely on it :) But if you have a strong intuition to try it, go ahead :)
Rows of 0's appear when computing the cosine similarity heatmap between child-parent vector differences. This is because, for various synsets, the child and parent both have the same set of lemmas (pronto synonyms). This results in the same vector representation and a difference of 0. Then, when dividing by its norm (the norm of a zero vector is zero), we get NaNs and the white bars.
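A small reproduction of the failure mode, assuming NumPy arrays with one row per child-parent pair. The vectors are made up for illustration and `cosine_heatmap` is a stand-in name, not the function used in the actual pipeline.

```python
import numpy as np

def cosine_heatmap(diffs):
    """Cosine similarity between rows, with no guard against zero norms."""
    norms = np.linalg.norm(diffs, axis=1, keepdims=True)
    # Silence the runtime warnings so the NaNs can be inspected directly.
    with np.errstate(invalid="ignore", divide="ignore"):
        unit = diffs / norms
    return unit @ unit.T

# Second pair: child and parent share the same vector, so the difference is 0.
child = np.array([[1.0, 2.0], [0.5, 0.5]])
parent = np.array([[0.0, 1.0], [0.5, 0.5]])
heatmap = cosine_heatmap(child - parent)

# Rows that are entirely NaN correspond to the white bars.
nan_rows = np.isnan(heatmap).all(axis=1)
```

The NaN also spreads to the matching column of every other row, which is why the bars appear in both directions of the heatmap.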
Already tried including the synset name itself in the set of lemmas: did not work (don't know why yet).
We could manually alter a set of lemmas if it matches that of another synset (especially if they are child and parent). How would we do this? Is this a good solution?
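Before altering anything by hand, it may help to list which synsets actually collide. A minimal sketch, assuming the lemmas are available as a dict from synset name to a set of lemma strings (the dict layout and the function name are assumptions):

```python
from collections import defaultdict

def find_lemma_collisions(lemmas):
    """Group synset names that share exactly the same set of lemmas."""
    groups = defaultdict(list)
    for name, lemma_set in lemmas.items():
        groups[frozenset(lemma_set)].append(name)
    # Keep only groups with more than one synset.
    return [sorted(names) for names in groups.values() if len(names) > 1]

# Toy input: s1 and s2 have identical lemma sets, s3 does not.
toy = {"s1": {"a", "b"}, "s2": {"b", "a"}, "s3": {"c"}}
collisions = find_lemma_collisions(toy)
```

The resulting groups could then be checked against the hierarchy to see which collisions are child-parent pairs.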