fhamborg / NewsMTSC

Target-dependent sentiment classification in news articles reporting on political events. Includes a high-quality data set of over 11k sentences and a state-of-the-art classification model.

Strip target text of leading and trailing whitespace #40

Open vestedinterests opened 4 weeks ago

vestedinterests commented 4 weeks ago

Great package, and I think it offers a very easy-to-use implementation of targeted sentiment classification! I have a brief question: it took me a while to track this down, but the package seems to fail for me whenever the target has leading or trailing whitespace. I know it is recommended that the whitespace go in the left and right contexts, but maybe a simple .strip() could prevent this?

from NewsSentiment import TargetSentimentClassifier
tsc = TargetSentimentClassifier()

This will work fine:

tsc.infer(
    "The context ",
    "around a target",
    "matters."
)

but either of the following will produce an error message:

tsc.infer(
    "The context ",
    "around a target ",
    "matters."
)
tsc.infer(
    "The context ",
    " around a target",
    "matters."
)

It then fails with

UnboundLocalError: local variable 'text' referenced before assignment

which took me a while to debug.

I would suggest either adding a simple check that raises an error informing the user that the target must not contain leading or trailing whitespace, or, if you think that is not an issue, performing the whitespace stripping inside the package. I could propose a simple commit, but I didn't know where best to place it, or which of the two approaches you would prefer.
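To make the two approaches concrete, here is a minimal sketch in plain Python. Both helper names (`require_clean_target` and `move_whitespace_to_context`) are hypothetical and not part of NewsSentiment. The sketch assumes only that `infer` takes the left context, the target, and the right context as three strings, as in the calls above.

```python
from typing import Tuple


def require_clean_target(target: str) -> str:
    """Approach 1 (hypothetical helper): fail early with a readable message."""
    if target != target.strip():
        raise ValueError(
            f"target {target!r} must not start or end with whitespace; "
            "put that whitespace in the left or right context instead"
        )
    return target


def move_whitespace_to_context(
    left: str, target: str, right: str
) -> Tuple[str, str, str]:
    """Approach 2 (hypothetical helper): strip the target and keep the
    removed whitespace by moving it into the neighbouring context."""
    stripped = target.strip()
    if not stripped:
        raise ValueError("target must contain at least one non-whitespace character")
    # Whitespace found before and after the actual target text.
    leading = target[: len(target) - len(target.lstrip())]
    trailing = target[len(target.rstrip()):]
    return left + leading, stripped, trailing + right


print(move_whitespace_to_context("The context ", "around a target ", "matters."))
```

With the second helper, a call such as `tsc.infer(*move_whitespace_to_context(left, target, right))` would pass a stripped target while the concatenated sentence stays unchanged.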

(tested on Python 3.10, NewsSentiment 1.2.28)

fhamborg commented 4 weeks ago

Hi, thanks for the issue report. Do you happen to have the full stack trace from running one of the failing tsc.infer variants?

vestedinterests commented 3 weeks ago

Does this help already? Or do you need more detail?

Traceback (most recent call last):
  File "/Users/marvinstecker/Development/paper1/bug_report_news_sentiment.py", line 8, in <module>
    tsc.infer("The context ", "around a target ", "matters.")
  File "/Users/marvinstecker/Development/paper1/venv310/lib/python3.10/site-packages/NewsSentiment/infer.py", line 217, in infer
    out.extend(self.batch_infer(targets[batch_start:batch_end]))
  File "/Users/marvinstecker/Development/paper1/venv310/lib/python3.10/site-packages/NewsSentiment/infer.py", line 268, in batch_infer
    ), f"target_mention={target_mention}; text={text}"
UnboundLocalError: local variable 'text' referenced before assignment
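
Reading the traceback, the error is raised on the line that builds an assertion message, which suggests that `text` is only assigned on some code path and that the message is formatted on a path where it was never set. I have not checked the library source, so the snippet below is only a hypothetical stand-in that reproduces the same kind of failure. The function `check_target` and its logic are assumptions, not NewsSentiment code.

```python
def check_target(target_mention: str) -> str:
    """Hypothetical stand-in for a validation step; NOT NewsSentiment's code."""
    if target_mention == target_mention.strip():
        # 'text' is bound only on this branch.
        text = target_mention
    # If the branch above was skipped, the assertion condition is False,
    # Python evaluates the message, and the f-string reads the unbound 'text'.
    assert (
        target_mention == target_mention.strip()
    ), f"target_mention={target_mention}; text={text}"
    return text


try:
    check_target("around a target ")
except UnboundLocalError as err:
    # The intended AssertionError never appears; this error masks it.
    print(type(err).__name__)
```

If the real cause is similar, assigning `text` before the check (or leaving it out of the message) would let the intended assertion message reach the user.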