JherezTaylor / hatespeech_codewords

A contextual approach for detecting hate speech code words
MIT License

sim353 distance check #90

Closed JherezTaylor closed 7 years ago

JherezTaylor commented 7 years ago

Get the distance between the word pairs in the wordsim353 dataset using WordNet. Check whether that distance matches up with the variation in the individual annotator scores in sim353, to see if we can measure similarity or dissimilarity that way. This should also show whether any particular domain is being reflected in the scores.
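A rough sketch of that check, not code from this repo: it takes the per-pair annotator scores, computes their spread, and correlates it with whatever distance function is passed in. The names `annotator_spread`, `pearson`, and `distance_vs_spread` are hypothetical, and the `(word1, word2, scores)` row shape is an assumption about how the sim353 annotator files get loaded.

```python
from statistics import mean, pstdev


def annotator_spread(scores):
    """Population standard deviation of one pair's annotator scores."""
    return pstdev(scores)


def pearson(xs, ys):
    """Pearson correlation, or None when it is undefined."""
    if len(xs) < 2 or len(xs) != len(ys):
        return None
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / (var_x * var_y) ** 0.5


def distance_vs_spread(rows, distance_fn):
    """Correlate word-pair distance with annotator disagreement.

    rows: iterable of (word1, word2, [annotator scores]) tuples (assumed shape).
    distance_fn: callable returning a number, or None if no distance exists.
    """
    distances, spreads = [], []
    for word1, word2, scores in rows:
        d = distance_fn(word1, word2)
        if d is None:  # skip pairs with no usable distance
            continue
        distances.append(d)
        spreads.append(annotator_spread(scores))
    return pearson(distances, spreads)
```

A correlation near zero would suggest the annotator disagreement is not explained by WordNet distance; `distance_fn` can be any WordNet-based measure.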

An initial pass over sim353 shows that most of the variation in the scores comes down to subjective opinion on things like:

JherezTaylor commented 7 years ago

Look into this - https://github.com/alvations/pywsd/blob/master/pywsd/similarity.py#L76

JherezTaylor commented 7 years ago

http://www.nltk.org/howto/wordnet.html

http://stackoverflow.com/questions/22031968/how-to-find-distance-between-two-synset-using-python-nltk-in-wordnet-hierarchy
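For reference, a minimal sketch of a word-level distance built on NLTK's WordNet interface. It uses `wn.synsets` and `Synset.shortest_path_distance`, both of which exist in NLTK. Taking the minimum over all synset pairs is my own assumption about how to go from synsets to words, and `min_synset_distance` and its `synsets_fn` parameter are hypothetical names.

```python
from itertools import product


def min_synset_distance(word1, word2, synsets_fn=None):
    """Smallest shortest-path distance over all synset pairs of two words.

    Returns None when either word has no synsets or no pair is connected.
    synsets_fn is a hypothetical hook so the lookup can be swapped out;
    by default it is NLTK's wn.synsets.
    """
    if synsets_fn is None:
        # Needs nltk installed and the corpus fetched via nltk.download("wordnet").
        from nltk.corpus import wordnet as wn
        synsets_fn = wn.synsets

    best = None
    for s1, s2 in product(synsets_fn(word1), synsets_fn(word2)):
        d = s1.shortest_path_distance(s2)  # None if the synsets share no path
        if d is not None and (best is None or d < best):
            best = d
    return best
```

Something like `min_synset_distance("tiger", "cat")` would then give one number per sim353 pair. Pairs that mix parts of speech will often come back as `None`, since they do not share a hypernym hierarchy.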

JherezTaylor commented 7 years ago

Closing, I don't think I'll use this anymore.