Fixed typo in the max of the two path similarity
The path_similarity function in wordnet is not commmutative.
As said by @michael-aloys, there was a typo in
max(wn.path_similarity(sense1,sense2), wn.path_similarity(sense1,sense2))
The 2 arguments in the max were the same, it is corrected to
max(wn.path_similarity(sense1,sense2),wn.path_similarity(sense2,sense1))
Fixed None return
When no path between two senses is found, path_similarity and wup_similarity return None. This is the correct behavior of ntlk.wordnet and should not be changed.
This is a problem because:
when None is returned by path_similarity, it directly breaks the python max described above (in python3)
when None is returned by wup_similarity, it can break the python max operation performed in the pywsd max_similarity function.
In this fix, checks on None value are added. If such case happen, the function now returns no_path_value, which is by default 0. This 0 value follows the behavior that was already implemented in the Leacock-Chodorow similarity case when the similarity was not computable.
To discuss:
I put no_path_value in argument instead of hardcoding 0 because i thought that in some cases one may want this value not to be 0. For instance: to avoid zeroing in a multiplication or dividing by 0 (by setting it to a very small float), or to put it to None to catch it after.
Also, the default value 0 of no_path_value may be discussed.
It is important to notice the behavior I implemented for the path_similarity case:
Sometimes wn.path_similarity(sense1,sense2) would return a value and wn.path_similarity(sense2,sense1) would return None. If the case happen of one path_similarity returning a number and the other returning None, the path_similarity that gave a number will be kept and returned
This pull requests aims to fix the issue https://github.com/alvations/pywsd/issues/31 described by @michael-aloys. (and more marginally https://github.com/alvations/pywsd/issues/41)
Fixed typo in the max of the two path similarity The path_similarity function in wordnet is not commmutative.
As said by @michael-aloys, there was a typo in
max(wn.path_similarity(sense1,sense2), wn.path_similarity(sense1,sense2))
The 2 arguments in the max were the same, it is corrected tomax(wn.path_similarity(sense1,sense2),wn.path_similarity(sense2,sense1))
Fixed None return When no path between two senses is found, path_similarity and wup_similarity return None. This is the correct behavior of ntlk.wordnet and should not be changed. This is a problem because:
In this fix, checks on None value are added. If such case happen, the function now returns
no_path_value
, which is by default0
. This 0 value follows the behavior that was already implemented in the Leacock-Chodorow similarity case when the similarity was not computable.To discuss:
I put
no_path_value
in argument instead of hardcoding 0 because i thought that in some cases one may want this value not to be 0. For instance: to avoid zeroing in a multiplication or dividing by 0 (by setting it to a very small float), or to put it to None to catch it after. Also, the default value0
ofno_path_value
may be discussed.It is important to notice the behavior I implemented for the path_similarity case: Sometimes
wn.path_similarity(sense1,sense2)
would return a value andwn.path_similarity(sense2,sense1)
would return None. If the case happen of one path_similarity returning a number and the other returning None, the path_similarity that gave a number will be kept and returned