cjhutto / vaderSentiment

VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.
MIT License
4.43k stars 1k forks

Fixed emoji to text conversion for emoji not surrounded by whitespace #57

Closed ckw017 closed 5 years ago

ckw017 commented 6 years ago

Fixes this issue: Not predicting sentiment of emoticons correctly #56

The current method splits the text into tokens on whitespace, so it does not recognize several emoji in a row with no whitespace between them. For example, "😀😀😀" is given no meaning because the exact string "😀😀😀" isn't in the emoji lexicon, when it should probably have the same meaning as "😀 😀 😀". Checking for emoji on a character-by-character basis should fix this. Example output after the fix:

>>> from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
>>> SIA = SentimentIntensityAnalyzer()
>>> SIA.polarity_scores("💋")
# (Interpreted as "kiss mark")
{'neg': 0.0, 'neu': 0.263, 'pos': 0.737, 'compound': 0.4215}
>>> SIA.polarity_scores("💋💋💋")
# (Interpreted as "kiss mark kiss mark kiss mark")
{'neg': 0.0, 'neu': 0.263, 'pos': 0.737, 'compound': 0.8126}

The compound score rises from 0.4215 to 0.8126, as expected for three emoji in a row.
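
For readers who want to see the idea in isolation, here is a minimal sketch of character-by-character emoji expansion. It is not the code from this PR: the `expand_emoji` helper and the two-entry `EMOJI_LEXICON` are hypothetical stand-ins for illustration, and the real emoji lexicon is much larger.

```python
# Hypothetical miniature lexicon, for illustration only.
EMOJI_LEXICON = {
    "\U0001F48B": "kiss mark",      # 💋
    "\U0001F600": "grinning face",  # 😀
}


def expand_emoji(text, lexicon=EMOJI_LEXICON):
    """Replace each emoji character with its description, one character at a time."""
    pieces = []
    for ch in text:
        if ch in lexicon:
            # Pad with spaces so adjacent emoji and words become separate tokens.
            pieces.append(" " + lexicon[ch] + " ")
        else:
            pieces.append(ch)
    # Collapse the whitespace introduced by the padding. This also normalizes
    # any other whitespace runs in the input to single spaces.
    return " ".join("".join(pieces).split())


print(expand_emoji("\U0001F48B\U0001F48B\U0001F48B"))
print(expand_emoji("so happy\U0001F600"))
```

Because the lookup is per character, this sketch only handles emoji that are a single code point; sequences built from several code points (for example with a zero-width joiner) would need a different matching strategy.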