Bootstrapping secondary code words

JherezTaylor / hatespeech_codewords

A contextual approach for detecting hate speech code words

MIT License

9 stars, 3 forks

Bootstrapping secondary code words #117

JherezTaylor closed this issue 7 years ago

JherezTaylor commented 7 years ago

[Image: office lens 20170622-113422]

For words that pass requirement 1 of #116 but fail requirement 2, we do the following:

Check whether these words correlate (in relatedness and similarity) with the code word (cw) list obtained in #116.
If there are matches above a support threshold, flag these words as secondary cw.
The idea is to keep bootstrapping and expanding the cw list.
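
A minimal sketch of how this loop might look. The issue does not say which correlation measure or threshold is used, so this assumes cosine similarity over word vectors and defines support as the fraction of known cw that a candidate matches. All names (`cosine`, `support`, `bootstrap_secondary`), the threshold values, and the toy vectors are hypothetical and are not taken from the repository.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def support(word, code_words, vectors, sim_threshold):
    """Fraction of code words whose similarity to `word` reaches sim_threshold."""
    if not code_words:
        return 0.0
    hits = sum(
        1 for cw in code_words
        if cosine(vectors[word], vectors[cw]) >= sim_threshold
    )
    return hits / len(code_words)


def bootstrap_secondary(candidates, code_words, vectors,
                        sim_threshold=0.8, support_threshold=0.5,
                        max_rounds=10):
    """Promote candidates supported by the current cw list, round by round."""
    known = set(code_words)
    remaining = set(candidates) - known
    secondary = []
    for _ in range(max_rounds):
        promoted = sorted(
            w for w in remaining
            if support(w, known, vectors, sim_threshold) >= support_threshold
        )
        if not promoted:
            break  # nothing new was flagged, so the list has stopped growing
        secondary.extend(promoted)
        known.update(promoted)  # promoted words support later candidates
        remaining.difference_update(promoted)
    return secondary


# Toy 2-d vectors standing in for real embeddings (assumed data).
vectors = {
    "cw_a": [1.0, 0.0],
    "cw_b": [0.9, 0.1],
    "cand_close": [0.95, 0.05],
    "cand_far": [0.0, 1.0],
}
result = bootstrap_secondary(["cand_close", "cand_far"], ["cw_a", "cw_b"], vectors)
print(result)
```

Here `candidates` stands for the words that passed requirement 1 only. Because promoted words join the known set, each round can surface candidates that were not supported by the original cw list alone, which is also why the thresholds would need tuning to keep the list from drifting.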

[ ] Keep track of words that pass requirement 1 only
[ ] Define a function to check for correlation with the words in cw
[ ] Define the support threshold

We may find that the expanded hate speech (HS) list contains patterns that combine with function words to create an HS inference.

JherezTaylor commented 7 years ago

Resolved in #122