Closed rxng closed 7 years ago
I am not involved in any of the research that led to this analyzer, but I've played with it quite a bit.
VADER returns the sentiment as three floating-point values (`pos`, `neu`, `neg`), each in the range 0 <= score <= 1, plus one additional `compound` value that mixes all three into a single value in the range -1 <= score <= 1. From the README:
The compound score is computed by summing the valence scores of each word in the lexicon, adjusted according to the rules, and then normalized to be between -1 (most extreme negative) and +1 (most extreme positive). This is the most useful metric if you want a single unidimensional measure of sentiment for a given sentence. Calling it a 'normalized, weighted composite score' is accurate.
It is also useful for researchers who would like to set standardized thresholds for classifying sentences as either positive, neutral, or negative. Typical threshold values (used in the literature cited on this page) are:
- positive sentiment: compound score >= 0.5
- neutral sentiment: (compound score > -0.5) and (compound score < 0.5)
- negative sentiment: compound score <= -0.5
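Those thresholds can be applied with a tiny helper. This is a sketch, not part of VADER itself: the `classify` function is my own name, and in practice the `compound` value would come from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` in the `vaderSentiment` package.

```python
def classify(compound: float) -> str:
    """Map a VADER compound score to a label using the literature thresholds."""
    if compound >= 0.5:
        return "positive"
    if compound <= -0.5:
        return "negative"
    return "neutral"

# In practice `compound` would come from something like:
#   from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
#   compound = SentimentIntensityAnalyzer().polarity_scores(text)["compound"]
print(classify(0.62))   # positive
print(classify(-0.71))  # negative
print(classify(0.05))   # neutral
```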
The `pos`, `neu`, and `neg` scores are ratios for proportions of text that fall in each category (so these should all add up to 1, or close to it given floating-point arithmetic). These are the most useful metrics if you want multidimensional measures of sentiment for a given sentence.
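For reference, the "normalized" part of the compound score works roughly like this. This is a sketch based on my reading of the `vaderSentiment` source; `alpha = 15` is the constant the library uses, approximating the maximum expected summed valence:

```python
import math

def normalize(score: float, alpha: float = 15) -> float:
    """Squash an unbounded summed valence score into (-1, 1).

    Mirrors the normalization step in vaderSentiment's source: the result
    approaches +/-1 as the raw summed valence grows in magnitude.
    """
    return score / math.sqrt(score * score + alpha)

print(normalize(0.0))  # 0.0
print(normalize(4.0))  # ~0.72
```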
Based on this, I can tell that a strongly objective sentence will have a `compound` score close to 0 and a `neu` score close to 1. From my reading of the code, the `neu` value increases as the proportion of words matching a subjective token from their lexicon decreases.
TL;DR: use the inverse of the `neu` field of the returned sentiment value (`1 - neu`). When `neu == 1`, the subjectivity score is null (0); when `neu ~= 0`, you have the highest subjectivity score. Alternatively, you can use the inverse of the absolute value of the `compound` score (`1 - abs(compound)`).
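As a concrete sketch of that TL;DR: both helper names below are my own, not part of VADER, and the `neu`/`compound` inputs stand in for the fields of the dict that `polarity_scores` returns.

```python
def subjectivity_from_neu(neu: float) -> float:
    # Subjectivity as the inverse of the neutral ratio:
    # neu == 1 -> fully objective (0), neu ~= 0 -> maximally subjective.
    return 1.0 - neu

def subjectivity_from_compound(compound: float) -> float:
    # Alternative proxy: inverse of the absolute compound score.
    return 1.0 - abs(compound)

print(subjectivity_from_neu(1.0))        # 0.0 (fully objective)
print(subjectivity_from_compound(-0.5))  # 0.5
```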
@Hiestaa how do you even figure out something like that?
@rxng I've been working on the same idea as these guys (rule-based sentiment analysis on short texts), with much less time, effort, and resources. When I found this, it was obviously working much better than whatever I had managed to come up with.
I've spent some time replacing my stuff with theirs, and in doing so studied the code and understood the inner workings of this project (as required to maintain this code in my own project). I've also extended it with debug information to get a better sense of the effect of each rule described here.
Really, just look at their code, try to run it locally, and fire up a step-by-step debugger. Add logging statements. It's not nearly as complicated as the deep-learning approaches to sentiment analysis that seem to be more hyped by researchers these days.
Cool! Thanks @Hiestaa, it took me a long time to understand the code and figure it out (I am still a Python newbie). I did try to tinker a bit but it didn't quite work out, because I found it interesting that none of the sentiment analysis libraries could detect "limited staff support" as a negative sentence.
Most sentiment analyzers have subjectivity scores, e.g. TextBlob. Is there a way to incorporate this into VADER sentiment? What about training on new data?