Open alex-lairan opened 4 years ago
Seems like a problem with the tokenizer. I'll look into it.
Using the pragmatic tokenizer, the token `don't` is recognized, but I think there's still a problem with the negation identification, which I addressed in cadmiumcr/cadmium#27.
```crystal
sentiment.tokenizer = Cadmium.pragmatic_tokenizer.new
```

```
{score: 2,
comparative: 0.4,
tokens: ["i", "realy", "don't", "like", "mosquitoes"],
words: ["like"],
positive: ["like"],
negative: []}
false
```
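For anyone following along, the output above shows `like` still landing in `positive` even though `don't` precedes it. A minimal sketch of what negation identification would do (plain Ruby with a made-up two-word lexicon, not Cadmium's actual code or data):

```ruby
# Hypothetical mini-lexicon and negator list -- not Cadmium's actual data.
SCORES   = {"like" => 2, "hate" => -2}
NEGATORS = ["don't", "doesn't", "not", "never"]

# Flip the polarity of a scored word when the token right before it is a negator.
def score(tokens)
  tokens.each_with_index.sum do |tok, i|
    s = SCORES.fetch(tok, 0)
    (i > 0 && NEGATORS.include?(tokens[i - 1])) ? -s : s
  end
end

p score(["i", "realy", "don't", "like", "mosquitoes"]) # => -2
p score(["i", "like", "mosquitoes"])                   # => 2
```

Note this only works if the tokenizer keeps `don't` as a single token; if it gets split apart, the negator never matches.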
The problem with the Pragmatic Tokenizer is that it's much, much slower than the other ones. I do not recommend using it internally for anything.
@watzon it also works with `aggressive_tokenizer`, but the behavior varies a lot depending on the tokenizer.
Yeah, the `aggressive_tokenizer` would probably be the one to use.
@watzon: Can we move this issue to the cadmiumcr/sentiment repo? It makes more sense :-)
Yes, it should definitely be moved.
Hi,
I use sentiment analysis for testing purposes, and I found an issue with contractions like `don't`.
I have this code:
The result is:
Here, the `don't` is not taken into account. I know it's bad English, but it's something you can find on Twitter. I don't know if I'm using it the wrong way.
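For reference, here's a quick sketch (plain Ruby regexes, not Cadmium's actual tokenizers) of how the choice of tokenizer decides whether `don't` even survives as a single token for the sentiment step to see:

```ruby
text = "I realy don't like mosquitoes"

# A naive tokenizer that splits on anything non-alphabetic
# tears the contraction apart into "don" and "t".
naive = text.downcase.scan(/[a-z]+/)

# An apostrophe-aware pattern keeps the contraction intact,
# similar in spirit to what the pragmatic tokenizer does.
contraction_aware = text.downcase.scan(/[a-z]+(?:'[a-z]+)?/)

p naive             # => ["i", "realy", "don", "t", "like", "mosquitoes"]
p contraction_aware # => ["i", "realy", "don't", "like", "mosquitoes"]
```

Once `don't` is gone from the token stream, no negation rule can fire, which would explain why the score stays positive.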