en-wl / wordlist

SCOWL (and friends).
http://wordlist.aspell.net

slurs #356

Open ConorSheehan1 opened 1 year ago

ConorSheehan1 commented 1 year ago

I didn't realize these word lists contained slurs. I used https://github.com/en-wl/wordlist/blob/master/alt12dicts/2of4brif.txt for a spelling game and got an email from a user, rightfully annoyed that "internet" was not a valid word but the n-word was!

I think it'd really help to have a clear warning in the README indicating which files have slurs in them. Having a 'clean' version of each file would also be very useful, in my opinion. I can see a variety of use cases for clean versions, e.g. spell-checkers, auto-correct/completers, games, etc.

Thanks for providing the lists! They are very useful, but I feel this is a potentially dangerous gotcha.
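
Until something like that exists, a 'clean' copy could be produced downstream. The following is only a minimal sketch, not anything SCOWL provides: `load_words`, `filter_words`, and the blocklist itself are hypothetical names, and it assumes the list is a UTF-8 text file with one word per line. The blocklist has to be supplied by whoever runs it.

```python
from pathlib import Path
from typing import Iterable, List, Set


def load_words(path: Path) -> List[str]:
    """Read one word per line, skipping blank lines (assumes UTF-8 text)."""
    with path.open(encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def filter_words(words: Iterable[str], blocklist: Set[str]) -> List[str]:
    """Keep the words whose lowercase form is not in the blocklist."""
    blocked = {entry.lower() for entry in blocklist}
    return [word for word in words if word.lower() not in blocked]


# Placeholder entries only; a real blocklist is the caller's responsibility.
sample = ["apple", "Badword", "pear"]
clean = filter_words(sample, {"badword"})
```

With a real file this would be `filter_words(load_words(Path("2of4brif.txt")), my_blocklist)`, where `my_blocklist` is a set the caller maintains. Matching is exact and case-insensitive, so inflected or variant spellings need their own blocklist entries.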

matkoniecz commented 1 year ago

Note that the README says

"SCOWL (Spell Checker Oriented Word Lists) is a database of English words that can be used to create word lists suitable for use in spell checkers of various sizes and dialects"

and does not mention anywhere that slurs or other categories of words are excluded, so it seems clear that various ugly words will also be present there.

"Also having a 'clean' version of each file would be very useful in my opinion."

Note also that whether a word is treated as a slur depends on context, and in many cases that is a political problem. A clean list may be better as a separate project, if someone is interested in running it.

See https://github.com/en-wl/wordlist/issues/345 for a specific case (where someone basically argues that "Kiev" is a slur).