cjhutto / vaderSentiment

VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.
MIT License

Nondeterministic behavior: same input, different vader scores #108

Closed: dandelionred closed this issue 4 years ago

dandelionred commented 4 years ago

Test script test.py (reads JSON strings from stdin, one per line, and dumps the VADER scores as JSON objects to stdout):

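#!/usr/bin/env python3
# Read one JSON-encoded string per line from stdin and print its VADER scores as JSON.

import sys
import json

from nltk.sentiment.vader import SentimentIntensityAnalyzer

si = SentimentIntensityAnalyzer()

for line in sys.stdin:
    text = json.loads(line)  # each input line is a JSON string literal
    # sort_keys=True keeps the key order stable, so identical scores print identically
    print(json.dumps(si.polarity_scores(text), sort_keys=True))
    sys.stdout.flush()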

Sample input sample.json (source: Reddit):

"We've got some studs prospects, but thats all they are, prospects. BUT FUCK WE WON THE LOTTERY"

Test runs:

$ for ((i=0;i<100;i++)); do ./test.py < sample.json; done | sort | uniq -c
     59 {"compound": 0.3612, "neg": 0.207, "neu": 0.461, "pos": 0.332}
     41 {"compound": 0.5719, "neg": 0.199, "neu": 0.442, "pos": 0.359}
$ for ((i=0;i<100;i++)); do ./test.py < sample.json; done | sort | uniq -c
     48 {"compound": 0.3612, "neg": 0.207, "neu": 0.461, "pos": 0.332}
     52 {"compound": 0.5719, "neg": 0.199, "neu": 0.442, "pos": 0.359}
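
The shell loop above can also be written as a small Python harness. This is only a sketch: `tally_runs` and its `hash_seed` parameter are hypothetical names introduced here, and the demo command is a stand-in, not the nltk code. The one extra thing it does is optionally pin `PYTHONHASHSEED`, on the assumption (not confirmed anywhere in this thread) that string hash randomization may be involved; it is a common reason why a fresh Python process gives different results for the same input.

```python
#!/usr/bin/env python3
"""Tally the distinct outputs of a command over repeated fresh runs."""
import os
import subprocess
import sys
from collections import Counter


def tally_runs(cmd, stdin_text="", runs=20, hash_seed=None):
    """Run `cmd` `runs` times in fresh processes and count each distinct stdout.

    If `hash_seed` is given, PYTHONHASHSEED is pinned to it for every run;
    otherwise the variable is unset so each run gets a random hash seed.
    """
    env = dict(os.environ)
    if hash_seed is None:
        env.pop("PYTHONHASHSEED", None)
    else:
        env["PYTHONHASHSEED"] = str(hash_seed)

    counts = Counter()
    for _ in range(runs):
        result = subprocess.run(
            cmd,
            input=stdin_text,
            env=env,
            capture_output=True,
            text=True,
            check=True,
        )
        counts[result.stdout.strip()] += 1
    return counts


if __name__ == "__main__":
    # Stand-in command (an assumption for illustration): the first element of
    # a set of strings depends on the per-process string hash seed.
    demo = [sys.executable, "-c", 'print(next(iter({"but", "BUT"})))']
    print("unpinned:", dict(tally_runs(demo)))
    print("pinned:  ", dict(tally_runs(demo, hash_seed=0)))
```

To apply it to the report, one could call `tally_runs(["./test.py"], stdin_text=open("sample.json").read(), runs=100)` once without and once with `hash_seed=0`. If the pinned run yields a single distinct result while the unpinned run yields two, that would point toward hash-order dependence; if both still vary, the cause lies elsewhere.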

I run the tests under virtualenv after this:

Some more sample JSON strings (source: Reddit):

"Not to be debbydowner, but I sort of agree with you. As a child of narcissistic parents I can only imagine the dumpster fire of memories I would have had to read and see. \n\nBUT, I have friends with kids who are absolutely wonderful parents. I've sat with them after their kids have gone to bed and they've recounted memories of the day. So maybe in those situations sending a little message to document a memory in a quick note would be sweet."
"Yeahhh... so the droopy off the shoulder straps were fashionable for a while. BUT they are always made extra large (like, by several inches) so the lady can have them tailored to her body with minimal fuss. (You can always cut away at something, but adding is difficult.) \n\nBy the look of things, this woman just didn't want to pay the $50 to have the straps fitted at all.\n\n(Source: I work in a bridal store and also sew as a hobby. I have paid close attention to out seamstresses.)"
"I am playing one and it's very good, imho. The bonus spell slots are enough to keep with necessary heals in combat. You can go melee (gish) or just caster. You can take warpriest (also doable with champ MC but you don't get access to all the feats a wapriest could take at certain levels).\n\nBesides healings (and domains) , cleric can take feats vs fiends or undead. Or feats that let you heal an ally and harm an enemy with the same spell. You can play a trickster cleric and pass for a cleric of Sarenrae.\n\nFor now there's nothing that I'd want from a cleric that it can't do. BUT if you feel like it's not for you play another class, there are so many options so it will not be a problem."
"Yeah, and you only got 1/3 of his skills... I read something, maybe a rumor not sure, that One of the Skill Pages is just your Ultimate, which is SICK to make the Hulk Buster or whatever wicked awesome.. BUT, If it's just your Ultimate, we get skills wicked fast, and everyone seemingly will probably have every skill without any choice, I don't understand how they mean that Your Iron Man will Play different from My Iron Man?? The only way they could change this is if there is a choice and we can only get one thing or the other, Or if they have the options to add Multiple points into certain things, so like My Iron Man Rockets will be JACKED out, and my Lasers will be very weak, or vice Versa, you know what I mean? I just REALLY hope there is something that ACTUALLY sets us apart, and not everyone has the same skills, and the only variation is how we play, and what Armor/Loot we have!  Maybe I'm an opinion that isn't shared from everyone, but I REALLY want characters to play different! Not just this guy uses his hulk to throw rocks as ranged, and this guy charges in with his Hulk, that would kind of Bum me out.. but I do LOVE this game"
"&gt; I never even did that to mine. I think she's in her 40s from commissions. If I ever need to limit break her, it shouldn't be that hard. And if I never see her again, oh well.\n\nHeh, fair point there Comrade.\n\n&gt; \nOh yeah. Even that stupid German took me a month to get. I think German girls just don't like me.\n\n[](#toradorasalute)\n\n&gt; I have no idea when that'll be. Monarch is still working on her first combat stage. And then I may do Roon next with the EXP packs.\n\nA good plan, given those free EXP Packs really do help a lot, that and Roon is not just a German Ship but one of the best heavy cruisers to boot (and of course a totally fictional fabrication, as in at least SOME of the PR ships have some sort of historical precedent, e.g. like how Saint Louis was originally indeed planned out by the French, and Ibuki went so far as to be not just laid down BUT was in the process of being converted into... a light aircraft carrier... cuz reasons... ROON on the other hand is literally just a pipe dream thought up from someone over at Wargaming noticing a paper design to a hypothetical set of Heavy Cruiser guns that never came to be... and then they built a whole ship around these gun designs... cuz wibble)\n\nAnyway many thanks for the kind reply my friend, have a great day and see you later Comrade!"
"Try fighting a cetrion using earthquake from full screen for the whole match. I mean NOTHING BUT EARTHQUAKE. Now i understand that fujin, noob saibot and other characters have an easy way to avoid it but even Shang tsung's soul swap leaves him open because he doesn't leave the ground and avoid it, he stands there whilst it's starting up. For many characters who don't have a quick teleport or no teleport like: sub zero, his slide doesn't make him safe; frost; shape Kahn; terminator and a lot more then an earthquake spam cetrion is very hard to close in on. There may be a gap but it is hard to see and use"

dandelionred commented 4 years ago

It is an issue in the nltk 3.5 VADER implementation. I've narrowed it down to a single line in their code.

cjhutto commented 4 years ago

> It is an issue in the nltk 3.5 VADER implementation. I've narrowed it down to a single line in their code.

Good find. Another win for open source. THANKS!

dandelionred commented 4 years ago

nltk issue: https://github.com/nltk/nltk/issues/2581