JherezTaylor / hatespeech_codewords

A contextual approach for detecting hate speech code words
MIT License

Bootstrapping secondary code words #117

Closed JherezTaylor closed 7 years ago

JherezTaylor commented 7 years ago


For words that pass requirement 1 of #116 but not requirement 2, we do the following:

  1. Check whether these words correlate (in relatedness and similarity) with the code word (cw) list obtained in #116.
  2. If there are matches above a support threshold, flag these words as secondary cw.
  3. Repeat with the expanded list, so that we keep bootstrapping and growing the set of cw.
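
The loop above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code that resolved this issue: the function name `bootstrap_secondary`, the parameters `sim_threshold`, `min_support` and `max_rounds`, and the toy 2-d vectors are all hypothetical. Cosine similarity over word vectors stands in for the "relatedness and similarity" check, and "support" is read here as the number of known cw a candidate is similar to.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def bootstrap_secondary(candidates, code_words, vectors,
                        sim_threshold=0.8, min_support=2, max_rounds=5):
    """Promote candidates to secondary code words over repeated rounds.

    Assumption: a candidate is promoted when it is similar
    (cosine >= sim_threshold) to at least min_support known code words.
    Promoted words join the known set, so later rounds can promote
    candidates that only become supported after the list has grown.
    """
    known = set(code_words)
    secondary = []
    pending = [w for w in candidates if w not in known]
    for _ in range(max_rounds):
        promoted = []
        for word in pending:
            # Support is counted against the list as it stood at the
            # start of this round.
            support = sum(
                1 for cw in known
                if cosine(vectors[word], vectors[cw]) >= sim_threshold
            )
            if support >= min_support:
                promoted.append(word)
        if not promoted:
            break  # nothing new was flagged, so the list has stabilised
        known.update(promoted)
        secondary.extend(promoted)
        pending = [w for w in pending if w not in known]
    return secondary


# Toy data with placeholder tokens (hypothetical, for illustration only).
vectors = {
    "cw_a": (1.0, 0.0),
    "cw_b": (0.9, 0.1),
    "cand_1": (0.95, 0.05),
    "cand_2": (0.7, 0.7),
    "cand_3": (1.0, 0.78),
}
result = bootstrap_secondary(
    ["cand_1", "cand_2", "cand_3"], ["cw_a", "cw_b"], vectors
)
print(result)
```

With these toy vectors, `cand_3` is similar to only one seed cw in the first round and is picked up in the second round once `cand_1` has joined the list, which is the bootstrapping effect described in step 3. `cand_2` never reaches the support threshold and stays unflagged.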

We may find that the expanded hate speech (HS) list contains patterns that combine with function words to create an HS inference.

JherezTaylor commented 7 years ago

Resolved in #122