cjhutto / vaderSentiment

VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.
MIT License

TypeError after Updating Lexicon #109

Closed yuliarielng closed 4 years ago

yuliarielng commented 4 years ago
  309                     if start_i == 2 and s != 0:
    310                         s = s * 0.9
--> 311                     valence = valence + s
    312                     valence = self._negation_check(valence, words_and_emoticons, start_i, i)
    313                     if start_i == 2:

TypeError: can only concatenate str (not "float") to str

Hi, I ran into the above error after updating the lexicon with additional words from senticnet. Could I check if there's any hint on how to resolve this? I'm trying to check the senticnet codes line by line. Thank you so so much.

cjhutto commented 4 years ago

So a few things to investigate:

  1. when you update the lexicon, be sure the updated vader_lexicon.txt file follows the proper file format (it is actually a .tsv [tab-separated values] file saved with a generic .txt extension... NOTE: the current algorithm makes immediate use of the first two elements, the token and its mean valence, which are separated by a tab character).

  2. tokens within the vader_lexicon.txt file are currently only single-item tokens (e.g., one word or emoticon without any white space) -- so if you are attempting to update the file by adding phrases (multi-word tokens), it won't work in the lexicon file (but you could potentially put those sentiment-laden phrases in either the SENTIMENT_LADEN_IDIOMS dict or the SPECIAL_CASES dict; see vaderSentiment.py lines 70-80).
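
If it helps, here's a quick stdlib-only way to sanity-check edited lexicon rows against the format described in point 1 (tab-separated, single-item token, float-parseable mean valence). The sample rows and the helper name `check_lexicon_line` are illustrative, not part of the library:

```python
# Validate that a VADER-style lexicon line is tab-separated, that its token
# contains no whitespace, and that its second field parses as a float.
def check_lexicon_line(line):
    parts = line.split("\t")
    if len(parts) < 2:
        return None                    # not tab-separated
    token, mean_valence = parts[0], parts[1]
    if " " in token:
        return None                    # tokens must be single items, no spaces
    try:
        return token, float(mean_valence)
    except ValueError:
        return None                    # valence field isn't numeric

sample_lines = [
    "shiok\t3.1\t0.7\t[3, 4, 3, 2, 4, 3, 3, 3, 4, 2]",  # well-formed row
    "loss_of_face -1.8",                                # bad: space, not a tab
]
for line in sample_lines:
    print(repr(line), "->", check_lexicon_line(line))
```

Running this over a whole edited file before loading it would flag malformed rows early, instead of surfacing later as a TypeError deep inside the scoring code.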

yuliarielng commented 4 years ago

Hi there, thanks for the very quick response.

I didn't update the vader_lexicon by editing the txt file directly, though; I did it by calling the update function on the lexicon dictionary. I ran a (quite long, sorry, novice coder!!) loop to check the senticnet dictionary, and realised that only some words caused the problem. The problematic words seemed quite random, such as able, acne, and accomplish. They also change depending on the dataset I want to run sentiment analysis on. It would be great if you have some rough ideas in this direction, but I appreciate the help you have already given.

I'll re-examine the code/problematic bits again and will update if I find a fix. Thanks so much!

%run SenticSG.py

error_words = []
prev_word = ""
for k in senticnet.keys():
    try:
        vader_intensity = {}                 # fresh dict each iteration
        vader_intensity[k] = senticnet[k][7]
        analyser.lexicon.update(vader_intensity)

        # training
        train['compound'] = [analyser.polarity_scores(v)['compound'] for v in train['text']]
        print(k, 'is okay!')

        prev_word = k                        # remember the last word that worked
    except Exception:
        analyser.lexicon.pop(k)              # pop the current word
        try:
            analyser.lexicon.pop(prev_word)  # pop the previous word in case it's the troublemaker
        except KeyError:
            pass
        error_words.append(k)                # keep a list of error words
        print(k, 'is not ok!')

An example chunk of the senticnet code in the 'SenticSG.py' file is as follows:

senticnet['able'] = ['0.856', '0', '0', '0.946', '#shiok', '#suka', 'positive', '0.601', 'praise', 'relican', 'capcan', 'smart', 'can']
senticnet['abashed'] = ['-0.941', '0', '0', '-0.925', '#sim_tia', '#sian', 'negative', '-0.622', 'embarrassed', 'humiliated', 'shamed', 'loss_of_face', 'no_face']
senticnet['abbb'] = ['0', '0', '0.987', '-0.746', '#pek_chek', '#sian', 'negative', '-0.578', 'act_big_buay_big', 'act_big', 'pejorative', 'repulsive', 'pretentious']
senticnet['abide'] = ['0.997', '0', '0', '0.761', '#shiok', '#suka', 'positive', '0.586', 'understand', 'accept', 'acknowledge', 'comprehend', 'understooded']
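
One thing that stands out in the chunk above: every value in SenticSG.py is a quoted string (e.g. '0.586'), so `analyser.lexicon.update()` ends up storing str values where VADER does float arithmetic, which would explain the `can only concatenate str (not "float") to str` traceback. A minimal sketch of the failure and a one-line `float()` fix, using plain values (no vaderSentiment import; `s` here is a made-up booster increment standing in for what VADER adds internally):

```python
# Values in SenticSG.py are quoted strings, e.g. '0.586', not numbers.
senticnet_entry = ['0.997', '0', '0', '0.761', '#shiok', '#suka',
                   'positive', '0.586', 'understand']

valence = senticnet_entry[7]    # '0.586' -- a str
s = 0.293                       # hypothetical booster increment (a float)

try:
    valence = valence + s       # same shape of operation as vaderSentiment.py line 311
except TypeError as e:
    print(e)                    # can only concatenate str (not "float") to str

# One-line fix: cast to float before putting the value into the lexicon.
valence = float(senticnet_entry[7])
print(valence + s)
```

In the loop above, that would mean `vader_intensity[k] = float(senticnet[k][7])`; which words blow up then depends on which lexicon entries a given dataset's texts actually hit, matching the "random" and dataset-dependent behaviour described.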
yuliarielng commented 4 years ago

Hi Hutto,

Just figured out that adding to the dictionary via code is not a great idea. I added the additional words via the vader_lexicon.txt file by copying them into the document (paying attention to your tip), and it works perfectly! Thanks for your help.