Jean-Baptiste-Camps closed 8 months ago
Stay with `nltk.wordpunct_tokenize` or switch to `nltk.word_tokenize`?
cf. https://stackoverflow.com/questions/50240029/nltk-wordpunct-tokenize-vs-word-tokenize
I vote to keep the low-tech one, nltk.wordpunct_tokenize (cf. the error on 'Hey in the linked example).
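For reference, here is a rough sketch of why the low-tech tokenizer is predictable. As far as I know, nltk.wordpunct_tokenize is a plain regular-expression split on the pattern `\w+|[^\w\s]+`, so it can be mimicked with `re` alone. The helper names `wordpunct_like` and `compare_with_nltk` and the sample sentence are made up for illustration. The NLTK comparison assumes NLTK and its tokenizer data are installed and returns None otherwise.

```python
import re

# Pattern used by NLTK's WordPunctTokenizer (to my knowledge):
# runs of word characters, or runs of non-space punctuation.
WORDPUNCT_PATTERN = re.compile(r"\w+|[^\w\s]+")


def wordpunct_like(text):
    """Hypothetical helper mimicking nltk.wordpunct_tokenize with plain re."""
    return WORDPUNCT_PATTERN.findall(text)


def compare_with_nltk(text):
    """Return (wordpunct, word) tokenizations from NLTK.

    Returns None if NLTK or the data needed by word_tokenize is missing.
    """
    try:
        import nltk
        return nltk.wordpunct_tokenize(text), nltk.word_tokenize(text)
    except (ImportError, LookupError):
        return None


if __name__ == "__main__":
    sample = "'Hey there,' she said."  # illustrative sentence, not from the thread
    # The leading quote is always split off from the word that follows it.
    print(wordpunct_like(sample))
    print(compare_with_nltk(sample))
```

The regex version always separates a leading quote from the word, at the cost of gluing adjacent punctuation together (for example `,'` comes out as one token). Whether that trade-off is acceptable depends on how punctuation is used downstream.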