Closed igormis closed 3 years ago
The split comes from wordpunct_tokenize, which gets used if a word tokenizer is not specified.
>>> import nltk
>>> nltk.tokenize.wordpunct_tokenize('S&P')
['S', '&', 'P']
You can either use one of the other tokenizers nltk provides (TweetTokenizer?) or provide a tokenizer of your own, and you should get the results you require.
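As a rough sketch of the "tokenizer of your own" option, the function below mirrors the regular expression behind wordpunct_tokenize but keeps an & that sits directly between word characters. The name tokenize_keep_ampersand is hypothetical, and passing it to Rake through a word_tokenizer argument is an assumption about the installed rake-nltk version, so check it against your release.

```python
import re

# Like wordpunct_tokenize's pattern (\w+|[^\w\s]+), except that an "&"
# sitting directly between word characters stays inside the token.
_TOKEN_RE = re.compile(r"\w+(?:&\w+)*|[^\w\s]+")


def tokenize_keep_ampersand(text):
    """Hypothetical helper: split text into word and punctuation tokens."""
    return _TOKEN_RE.findall(text)


tokens = tokenize_keep_ampersand(
    "S&P stocks are falling, whereas Google is struggling"
)
print(tokens)

# Assumption: the installed rake-nltk accepts a custom word tokenizer, e.g.
#   Rake(word_tokenizer=tokenize_keep_ampersand)
```

A standalone & surrounded by spaces is still returned as its own token, so only tightly joined names such as S&P are affected.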
Hi, I am trying to extract key phrases from a sentence, and it works quite well. However, when I try to decompose the sentence "S&P stocks are falling, whereas Google is struggling", the model splits it into 2 clauses, and in the first clause it adds a space before and after the &, giving "S & P". This causes problems in the following step of my algorithm (entity recognition). The code for initialization of rake is the following: