VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.
Special case huge performance regression in 3.3.1+ #110

dandelionred commented 4 years ago

I've been processing random comments from social media and noticed some strange spikes in the processing-time logs. A chunk of data generally takes less than a second to process, but the plot here shows one data point approaching 10 minutes!

[Image: Screenshot from 2020-08-18 23:53:48, plot of processing time per chunk]

I traced the slowdown to vader 3.3.1+, which runs at 100% CPU usage on texts with a huge number of emoticons.

Test script vader.py:

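#!/usr/bin/env python3
# Reads one JSON-encoded string per line from stdin and prints its VADER scores.

import sys
import json

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

si = SentimentIntensityAnalyzer()

for line in sys.stdin:
    text = json.loads(line)  # each input line is expected to be a JSON string
    print(json.dumps(si.polarity_scores(text), sort_keys=True))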

Sample input slow.json: https://pastebin.com/nxjLSTMQ

vader 3.2.1:

$ time ./vader.py < slow.json 
{"compound": 0.8955, "neg": 0.027, "neu": 0.913, "pos": 0.06}

real    0m0.182s
user    0m0.168s
sys     0m0.008s

vader 3.3.1+:

$ time ./vader.py < slow.json 
{"compound": 1.0, "neg": 0.218, "neu": 0.345, "pos": 0.437}

real    0m50.914s
user    0m48.588s
sys     0m2.328s
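
To see whether the run time grows with the number of emoticons, rather than depending on this one sample, a small timing harness along these lines could help. This is only a sketch: `make_emoticon_text` and `time_scaling` are hypothetical helper names, the synthetic text is an assumption about what triggers the slowdown (the real sample may differ), and no timings are claimed here.

```python
import time


def make_emoticon_text(n, token="😂"):
    """Build a synthetic comment: a short phrase followed by n copies of token.

    Assumption: repeated emoji/emoticon tokens resemble the slow input.
    """
    return "this is great " + " ".join([token] * n)


def time_scaling(score_fn, sizes, token="😂"):
    """Call score_fn on texts with a growing token count.

    Returns a list of (n, elapsed_seconds) pairs, one per entry in sizes.
    """
    results = []
    for n in sizes:
        text = make_emoticon_text(n, token)
        start = time.perf_counter()
        score_fn(text)
        results.append((n, time.perf_counter() - start))
    return results


# Example use (assumes vaderSentiment is installed):
#   from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
#   si = SentimentIntensityAnalyzer()
#   for n, seconds in time_scaling(si.polarity_scores, [100, 200, 400]):
#       print(n, round(seconds, 3))
```

Running the same sizes under 3.2.1 and 3.3.1+ and comparing how the elapsed time grows as `n` doubles should indicate whether the regression is worse than linear in the number of emoticons.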

The input sample is not an artificial joke, by the way. Here are samples of what supposedly real people post on Reddit, and this is the kind of text I feed to vader: