code-kern-ai / refinery

The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.
https://www.kern.ai
Apache License 2.0
1.39k stars, 66 forks

[BUG] - spacy Doc sentiment is always 0.0 #103

Closed: DerEchteFeuerpfeil closed this issue 1 year ago

DerEchteFeuerpfeil commented 1 year ago

Describe the bug
The spaCy Doc sentiment is always 0.0 for the attributes of my records.

To Reproduce
Steps to reproduce the behavior:

  1. Go to a project of yours
  2. Write a labeling function that prints the sentiment of one of your attributes, e.g. print(record["headline"].sentiment) in AG_News
  3. Observe the sentiment output, which is always 0.0

Expected behavior
The sentiment should reflect the true sentiment of the attribute, which is NOT always 0.0.

jhoetter commented 1 year ago

I need to look up the source for this, but if I remember correctly, the md (medium) tokenizers don't contain a sentiment pipeline. We could also offer to integrate the lg models, which, as far as I recall, have higher precision in both entity recognition and sentiment.

Again, this is off the top of my head. I need to research it again to be 100% sure.

JWittmeyer commented 1 year ago

Sentiment isn't a field that spaCy fills automatically, so Doc.sentiment keeps its default value of 0.0 unless something in the pipeline sets it.
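
Since the attribute is not populated, one workaround is to compute a score inside the labeling function itself. The following is only a minimal sketch, not refinery's or spaCy's own approach: it assumes the record attribute exposes its raw string as .text (as a spaCy Doc does), and the word lists, the function names and the "positive"/"negative" label names are all hypothetical. A SimpleNamespace stands in for the Doc so the snippet runs without spaCy installed.

```python
from types import SimpleNamespace

# Hypothetical mini-lexicon, for illustration only.
POSITIVE = {"good", "great", "win", "wins", "gain", "gains"}
NEGATIVE = {"bad", "loss", "losses", "crash", "fall", "fear"}


def lexicon_sentiment(text):
    """Return (pos - neg) / (pos + neg) in [-1, 1], or 0.0 if no lexicon word occurs."""
    # Naive whitespace tokenization with surrounding punctuation stripped.
    tokens = [t.strip(".,!?;:\"'()").lower() for t in text.split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def lf_headline_sentiment(record):
    """Labeling-function sketch: return a (hypothetical) label name, or None to abstain."""
    score = lexicon_sentiment(record["headline"].text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None


# Stand-in for a record whose "headline" attribute behaves like a spaCy Doc.
record = {"headline": SimpleNamespace(text="Stocks crash as investors fear losses")}
print(lexicon_sentiment(record["headline"].text))
print(lf_headline_sentiment(record))
```

A lexicon count like this is crude; it is meant only to show where a custom score would replace the unset Doc.sentiment value.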