VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.
MIT License
Fixed emoji to text conversion for emoji not surrounded by whitespace #57
Since the current method splits the text into tokens on whitespace, it doesn't recognize multiple emoji in a row without whitespace. For example, "😀😀😀" is given no meaning because the exact string "😀😀😀" isn't in the emoji lexicon, even though it should probably mean the same as "😀 😀 😀". Checking for emoji character by character should fix this. Example output after the fix:
>>> SIA.polarity_scores("💋")  # interpreted as "kiss mark"
{'neg': 0.0, 'neu': 0.263, 'pos': 0.737, 'compound': 0.4215}
>>> SIA.polarity_scores("💋💋💋")  # interpreted as "kiss mark kiss mark kiss mark"
{'neg': 0.0, 'neu': 0.263, 'pos': 0.737, 'compound': 0.8126}
As expected, the compound score goes up for three emoji in a row (0.4215 → 0.8126), while the neg/neu/pos proportions stay the same.
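To check that the jump from 0.4215 to 0.8126 is what VADER's scoring would predict, here is a simplified re-derivation. It assumes "kiss" has a lexicon valence of 1.8 and "mark" is neutral (0.0), values consistent with the single-emoji output, and that none of VADER's other rules (boosters, negation, punctuation or capitalization emphasis) fire. The normalization x / sqrt(x² + 15) is the one VADER uses for the compound score. `scores_for` is a hypothetical helper, not a library function.

```python
import math

ALPHA = 15  # normalization constant VADER uses for the compound score


def normalize(score: float, alpha: float = ALPHA) -> float:
    """Map an unbounded valence sum into the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


def scores_for(valences: list) -> dict:
    """Simplified neg/neu/pos/compound from per-token valences.

    Hypothetical helper: ignores boosters, negation, and punctuation or
    capitalization emphasis.
    """
    pos_sum = sum(v + 1 for v in valences if v > 0)  # positive tokens get +1
    neg_sum = sum(v - 1 for v in valences if v < 0)  # negative tokens get -1
    neu_count = sum(1 for v in valences if v == 0)
    total = pos_sum + abs(neg_sum) + neu_count
    return {
        "neg": round(abs(neg_sum / total), 3),
        "neu": round(neu_count / total, 3),
        "pos": round(abs(pos_sum / total), 3),
        "compound": round(normalize(sum(valences)), 4),
    }


# Assumed valences: "kiss" = 1.8, "mark" = 0.0 (neutral)
one_kiss = [1.8, 0.0]          # "kiss mark"
three_kisses = one_kiss * 3    # "kiss mark kiss mark kiss mark"

print(scores_for(one_kiss))      # {'neg': 0.0, 'neu': 0.263, 'pos': 0.737, 'compound': 0.4215}
print(scores_for(three_kisses))  # {'neg': 0.0, 'neu': 0.263, 'pos': 0.737, 'compound': 0.8126}
```

Each repeated "kiss mark" adds the same positive and neutral mass, so the pos/neu proportions don't change; only the unbounded valence sum grows (1.8 → 5.4), and the normalization maps that larger sum to a higher compound score.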
Fixes issue #56: "Not predicting sentiment of emoticons correctly".
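A minimal Python sketch of the character-by-character approach described in this PR. It is not the PR's actual code: `replace_emoji` and `EMOJI_LEXICON` are hypothetical names, and the two-entry lexicon stands in for VADER's full emoji lexicon.

```python
from typing import Mapping

# Hypothetical two-entry lexicon for illustration only; VADER ships a much
# larger emoji-to-description lexicon.
EMOJI_LEXICON = {
    "💋": "kiss mark",
    "😀": "grinning face",
}


def replace_emoji(text: str, emoji_lexicon: Mapping[str, str]) -> str:
    """Replace every emoji character with its text description.

    Each character is looked up on its own, so emoji that are not separated
    by whitespace ("💋💋💋") are still recognized.
    """
    pieces = []
    for ch in text:
        description = emoji_lexicon.get(ch)
        if description is None:
            pieces.append(ch)  # ordinary character: keep as-is
        else:
            # Pad with spaces so adjacent emoji become separate words
            pieces.append(f" {description} ")
    # Collapse the padding (and any other whitespace runs) to single spaces
    return " ".join("".join(pieces).split())


if __name__ == "__main__":
    print(replace_emoji("💋💋💋", EMOJI_LEXICON))  # kiss mark kiss mark kiss mark
    print(replace_emoji("😀😀😀", EMOJI_LEXICON) == replace_emoji("😀 😀 😀", EMOJI_LEXICON))  # True
```

One limitation of any per-character lookup: emoji built from several code points (skin-tone modifiers, variation selectors, zero-width-joiner sequences) won't be matched as a whole. The final split/join in this sketch also collapses every whitespace run, including newlines, into a single space.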