Open ndvbd opened 5 years ago
I would prefer a version of the word list in text format that includes all possible punctuated forms, such as possessives and contractions. For example: Wayne, Wayne's, carrots, carrot's, etc. This would be really useful for cleaning text files for use with machine learning. I want to remove junk such as headers and ASCII art from text files, but I don't want to remove genuine English text.
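To make the use case concrete, here is a rough Python sketch of the kind of filter I have in mind. The names `english_ratio` and `keep_english_lines`, the 0.6 threshold, and the tiny inline vocabulary are all made up for illustration; a real run would load the vocabulary from a word list file that includes the apostrophe forms.

```python
import re

# Token pattern (an assumption for this sketch): runs of ASCII letters,
# optionally joined by straight apostrophes, so "isn't" and "Wayne's"
# stay whole instead of being split or stripped.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def english_ratio(line, vocabulary):
    """Return the fraction of tokens in `line` that appear in `vocabulary`."""
    tokens = [t.lower() for t in TOKEN_RE.findall(line)]
    if not tokens:
        # Lines with no word-like tokens (e.g. ASCII art) score zero.
        return 0.0
    known = sum(1 for t in tokens if t in vocabulary)
    return known / len(tokens)


def keep_english_lines(lines, vocabulary, threshold=0.6):
    """Keep only lines whose known-word ratio reaches `threshold`."""
    return [line for line in lines if english_ratio(line, vocabulary) >= threshold]


# Hypothetical stand-in for a word list that includes punctuated forms.
vocabulary = {
    "wayne", "wayne's", "carrots", "carrot's",
    "isn't", "are", "fresh", "the", "ready",
}

sample = [
    "Wayne's carrots are fresh",
    "+----====----+",
    "xq zzt kkp",
    "The carrot's isn't ready",
]

kept = keep_english_lines(sample, vocabulary)
for line in kept:
    print(line)
```

With a list that only has the apostrophe-free spellings, tokens like "isn't" or "Wayne's" would count as unknown and drag down the score of otherwise valid lines, which is why I'm asking for the punctuated forms.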
I see that the file words_alpha contains these (wrong) words: isnt, arent, wouldnt
and that these (right) words are not included: isn't, aren't, wouldn't
Is this intentional?