aboSamoor / polyglot

Multilingual text (NLP) processing toolkit
http://polyglot-nlp.com

IndexError when accessing sentiment on entities #82

Open · ghost opened this issue 7 years ago

ghost commented 7 years ago

Another bug in Sentiment: accessing positive_sentiment on an entity raises an IndexError:

<ipython-input-6-7e54515ff7d3> in per_word_sentiment(T)
      3    for w in T.entities:
      4         wds = ' '.join(chunk_words(w))
      5         if not wds: continue
      6         try:
----> 7             pos = w.positive_sentiment
      8             neg = w.negative_sentiment
      9         except IndexError as IE:

/usr/local/lib/python3.5/dist-packages/polyglot/decorators.py in __get__(self, obj, cls)
     18     if obj is None:
     19         return self
---> 20     value = obj.__dict__[self.func.__name__] = self.func(obj)
     21     return value
     22 

/usr/local/lib/python3.5/dist-packages/polyglot/text.py in positive_sentiment(self)
    427   def positive_sentiment(self):
    428     """Positive sentiment of the entity."""
--> 429     pos, neg = self._sentiment()
    430     return pos
    431 

/usr/local/lib/python3.5/dist-packages/polyglot/text.py in _sentiment(self, distance)
    449     else:
    450       polarities = np.array([w.polarity for w in text.words])
--> 451       polarized_positions = np.argwhere(polarities != 0)[0]
    452       polarized_non_entity_positions = non_entity_positions.intersection(polarized_positions)
    453       sentence_len = len(text.words)

IndexError: index 0 is out of bounds for axis 0 with size 0
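A minimal sketch that reproduces the failing line in isolation. It assumes a text in which every word's polarity is 0; the array values are made up for illustration and do not come from polyglot.

```python
import numpy as np

# Hypothetical polarity values for a short text in which no word carries sentiment.
polarities = np.array([0, 0, 0])

# np.argwhere returns one row per match and one column per input dimension,
# so with no nonzero entries the result has shape (0, 1).
matches = np.argwhere(polarities != 0)
print(matches.shape)  # (0, 1)

raised = False
try:
    # Same indexing as the failing line in polyglot/text.py: take row 0.
    first_row = matches[0]
except IndexError as exc:
    raised = True
    print("IndexError:", exc)
```

Because the result has zero rows, asking for row 0 fails the same way the traceback does.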
ghost commented 7 years ago

The bug above appears to arise on short texts. _sentiment() assumes that np.argwhere(polarities != 0) always returns at least one row. When no word in the text has a nonzero polarity, the result is empty, so indexing it with [0] raises the IndexError.
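A possible guard, sketched under assumptions. polarized_positions and safe_sentiment are hypothetical names, not polyglot functions. The first replaces the np.argwhere(...)[0] expression with np.flatnonzero, which returns an empty array instead of raising when no word is polarized. Note that it returns *all* polarized positions, whereas [0] on the (N, 1) argwhere result selects only the first match; the traceback does not show which behavior was intended. Code that consumes the result would still need to handle the empty case. The second is a caller-side workaround in the spirit of the try/except in the reporter's loop, with (0.0, 0.0) as an assumed fallback.

```python
import numpy as np


def polarized_positions(polarities):
    """Indices of all words whose polarity is nonzero.

    Hypothetical helper, not part of polyglot. np.flatnonzero returns a
    1-D array of indices and yields an empty array when nothing matches,
    instead of raising.
    """
    return np.flatnonzero(np.asarray(polarities) != 0)


def safe_sentiment(entity, default=(0.0, 0.0)):
    """Return (positive, negative) sentiment for a polyglot entity.

    Hypothetical caller-side workaround: if the sentiment lookup raises
    the IndexError from this issue, fall back to `default` (an assumed
    neutral value).
    """
    try:
        return entity.positive_sentiment, entity.negative_sentiment
    except IndexError:
        return default


print(polarized_positions([0, 0, 0]))      # []
print(polarized_positions([0, 1, 0, -1]))  # [1 3]
```

Falling back to a neutral value rather than skipping the entity is a judgment call. Returning None instead would let callers distinguish "no polarized words nearby" from genuinely neutral sentiment.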