explosion / healthsea

Healthsea is a spaCy pipeline for analyzing user reviews of supplementary products for their effects on health.
MIT License

Case with 2 conditions mix reviews #1

Closed onlyrohits closed 2 years ago

onlyrohits commented 2 years ago

Sample input: "Yam is great for joint pain but bad for liver disease"

Code below:

```python
import spacy

nlp = spacy.load("en_healthsea")
doc = nlp("Yam is great for joint pain but bad for liver disease")

print(doc._.clauses)
print(doc._.health_effects)
```

Output below:

[{'split_indices': (0, 11), 'has_ent': True, 'ent_indices': (4, 6), 'blinder': '_CONDITION_', 'ent_name': 'joint pain', 'cats': {'POSITIVE': 0.9715917110443115, 'NEUTRAL': 0.004003751091659069, 'NEGATIVE': 0.01367472019046545, 'ANAMNESIS': 0.010729866102337837}, 'prediction_text': ['Yam', 'is', 'great', 'for', '_CONDITION_', 'but', 'bad', 'for', 'liver', 'disease']}, {'split_indices': (0, 11), 'has_ent': True, 'ent_indices': (9, 11), 'blinder': '_CONDITION_', 'ent_name': 'liver disease', 'cats': {'POSITIVE': 0.9650901556015015, 'NEUTRAL': 0.004530671052634716, 'NEGATIVE': 0.023644359782338142, 'ANAMNESIS': 0.0067347269505262375}, 'prediction_text': ['Yam', 'is', 'great', 'for', 'joint', 'pain', 'but', 'bad', 'for', '_CONDITION_']}]

Expected: one positive effect (joint pain) and one negative effect (liver disease). Actual: both conditions are classified as POSITIVE.
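To make the mismatch easier to see, here is a small sketch that picks the highest-scoring category for each clause. The `clauses` list is a trimmed, rounded copy of the output above, and `top_label` is a hypothetical helper written for this illustration, not part of Healthsea.

```python
# Trimmed copy of the `doc._.clauses` output reported above (scores rounded).
clauses = [
    {
        "ent_name": "joint pain",
        "cats": {"POSITIVE": 0.9716, "NEUTRAL": 0.0040,
                 "NEGATIVE": 0.0137, "ANAMNESIS": 0.0107},
    },
    {
        "ent_name": "liver disease",
        "cats": {"POSITIVE": 0.9651, "NEUTRAL": 0.0045,
                 "NEGATIVE": 0.0236, "ANAMNESIS": 0.0067},
    },
]


def top_label(cats):
    """Return the (label, score) pair with the highest score."""
    return max(cats.items(), key=lambda item: item[1])


# Map each condition to its predicted label.
summary = {c["ent_name"]: top_label(c["cats"])[0] for c in clauses}
print(summary)
# → {'joint pain': 'POSITIVE', 'liver disease': 'POSITIVE'}
```

Both conditions come out as POSITIVE, although "liver disease" should be NEGATIVE.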

thomashacker commented 2 years ago

That's a great example! I agree, atm, it's a tricky sentence for the pipeline to analyze.

The `prediction_text` attribute shows what the Text Classifier receives as input, and we can see that it includes the whole sentence. This means the Classifier can only rely on the position of the blinded `_CONDITION_` token to figure out which effect belongs to which health aspect.
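As a rough illustration of the blinding step, the sketch below reproduces the two `prediction_text` lists from the reported output. The `blind` function is a hypothetical stand-in and not Healthsea's actual implementation; it assumes `ent_indices` is a `(start, end)` token span with an exclusive end, which is consistent with the output in this issue.

```python
def blind(tokens, ent_indices, blinder="_CONDITION_"):
    """Replace the entity span with a single blinder token (hypothetical helper)."""
    start, end = ent_indices
    return tokens[:start] + [blinder] + tokens[end:]


tokens = ["Yam", "is", "great", "for", "joint", "pain",
          "but", "bad", "for", "liver", "disease"]

joint_pain_input = blind(tokens, (4, 6))      # blinds "joint pain"
liver_disease_input = blind(tokens, (9, 11))  # blinds "liver disease"

print(joint_pain_input)
print(liver_disease_input)
```

Both inputs still contain "great" and "bad", so the only difference the classifier sees is where `_CONDITION_` sits in the sentence.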

If you change the sentence to something like this: "Yam is great for joint pain but it's bad for liver disease." it would correctly classify the two health effects, because the pipeline can now split the sentence into two clauses.

It now splits because we added a subject to the second clause, making the sentence splittable. The splitting relies on the Benepar parser, and you can try out the demo here: https://parser.kitaev.io/
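To sketch why a successful split helps, the example below restricts each classifier input to its own clause before blinding. Everything here is an assumption made for illustration: the tokens are hand-written, the clause boundaries are hand-picked rather than taken from Benepar, `clause_input` is a hypothetical helper, and entity indices are assumed to be document-level.

```python
def clause_input(tokens, split_indices, ent_indices, blinder="_CONDITION_"):
    """Cut the tokens down to one clause and blind its entity (hypothetical helper)."""
    clause_start, clause_end = split_indices
    ent_start, ent_end = ent_indices
    return tokens[clause_start:ent_start] + [blinder] + tokens[ent_end:clause_end]


# Hand-tokenized version of the reworded sentence.
tokens = ["Yam", "is", "great", "for", "joint", "pain",
          "but", "it", "'s", "bad", "for", "liver", "disease", "."]

# Hypothetical clause boundaries, NOT real parser output.
first = clause_input(tokens, split_indices=(0, 6), ent_indices=(4, 6))
second = clause_input(tokens, split_indices=(6, 14), ent_indices=(11, 13))

print(first)   # contains "great" but not "bad"
print(second)  # contains "bad" but not "great"
```

With one clause per condition, each input carries only the sentiment word that belongs to its own condition.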

You can also try example reviews directly in the Healthsea demo.

thomashacker commented 2 years ago

I think it'd be great to have this issue converted into a discussion. 🎉