SupervisedStylometry / SuperStyl

Supervised Stylometry
GNU General Public License v3.0

Which word tokenizer to use? #65

Jean-Baptiste-Camps closed this issue 8 months ago

Jean-Baptiste-Camps commented 8 months ago

Should we stay with nltk.wordpunct_tokenize or switch to nltk.word_tokenize?

cf. https://stackoverflow.com/questions/50240029/nltk-wordpunct-tokenize-vs-word-tokenize

Jean-Baptiste-Camps commented 8 months ago

I vote to keep the low-tech one, nltk.wordpunct_tokenize (cf. the error on 'Hey in the example from the linked question).
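
For reference, a minimal sketch of what the low-tech option does. nltk.wordpunct_tokenize is a plain regular-expression tokenizer built on the pattern `\w+|[^\w\s]+`, so the snippet below reproduces it with the standard library only. The helper name `wordpunct_like` and the sample sentence are assumptions made for illustration; they are not taken from SuperStyl or from the linked question.

```python
import re

# Same pattern as nltk's WordPunctTokenizer: runs of word characters,
# or runs of characters that are neither word characters nor whitespace.
WORDPUNCT_PATTERN = re.compile(r"\w+|[^\w\s]+")


def wordpunct_like(text):
    """Hypothetical stand-in for nltk.wordpunct_tokenize (illustration only)."""
    return WORDPUNCT_PATTERN.findall(text)


# Illustrative sentence with a leading quote, chosen for this sketch.
tokens = wordpunct_like("'Hey, how's it going?'")
print(tokens)
# ["'", 'Hey', ',', 'how', "'", 's', 'it', 'going', "?'"]
```

The leading quote always comes out as its own token, so it is never glued to Hey. The trade-off is that contractions are split into three tokens (how, ', s) and adjacent punctuation marks are merged into one token (?').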