Liebeck / spacy-sentiws

German sentiment scores with SentiWS as extension for spaCy
MIT License

Why is the German word "stark" always recognized as an ADV without a sentiment value? #8

Open Diapolo opened 6 months ago

Diapolo commented 6 months ago

Hey @Liebeck,

Thanks for this extension. I'm currently digging into spaCy for a university project on sentiment analysis. I have just taken my first steps and have a working spaCy installation using 'de_core_news_lg' as the model and 'spacy_sentiws' as an extension.

See this sample: "Ich bin stark. Die Digitalisierung begegnet uns überall – und hat die Art, wie wir arbeiten und leben, stark verändert."

Ich, None, PRON
bin, None, AUX
stark, None, ADV
., None, PUNCT
Die, None, DET
Digitalisierung, None, NOUN
begegnet, None, VERB
uns, None, PRON
überall, None, ADV
–, None, PUNCT
und, None, CCONJ
hat, None, VERB
die, None, DET
Art, None, NOUN
,, None, PUNCT
wie, None, SCONJ
wir, None, PRON
arbeiten, None, VERB
und, None, CCONJ
leben, None, VERB
,, None, PUNCT
stark, None, ADV
verändert, None, VERB
., None, PUNCT

Why is "stark" always recognized as an adverb, and why doesn't it get a SentiWS value at all? If I look into SentiWS, it should get a value of 0.0040 (stark|ADJX 0.0040).

Thanks, Dia

Liebeck commented 6 months ago

@Diapolo POS tagging is done by spaCy, not by my SentiWS wrapper. The score lookup depends on the POS tag, so "stark" with the POS tag ADV does not have any entry in SentiWS. Have a look at the implementation: https://github.com/Liebeck/spacy-sentiws/blob/master/spacy_sentiws/senti_ws_wrapper.py#L25
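
To make the point above concrete, here is a minimal, self-contained sketch of a POS-dependent lexicon lookup. It is not the wrapper's actual code: the names `POS_TO_SENTIWS`, `LEXICON`, and `lookup` are hypothetical, the tag mapping is an assumption for illustration, and the toy lexicon contains only the one entry quoted in this thread (stark|ADJX 0.0040).

```python
from typing import Optional

# Hypothetical mapping from spaCy's coarse POS tags to SentiWS tags.
# This is an assumption for illustration, not copied from the wrapper.
POS_TO_SENTIWS = {"ADJ": "ADJX", "ADV": "ADV", "NOUN": "NN", "VERB": "VVINF"}

# Toy lexicon holding only the entry quoted in this thread.
LEXICON = {("stark", "ADJX"): 0.0040}


def lookup(word: str, spacy_pos: str) -> Optional[float]:
    """Return a score only if the (word, POS) pair exists in the lexicon."""
    sentiws_pos = POS_TO_SENTIWS.get(spacy_pos)
    if sentiws_pos is None:
        # Tags such as PUNCT or PRON have no counterpart in this sketch.
        return None
    return LEXICON.get((word.lower(), sentiws_pos))


print(lookup("stark", "ADV"))  # None: no (stark, ADV) entry exists
print(lookup("stark", "ADJ"))  # 0.004: matches stark|ADJX
```

Under these assumptions, the score depends entirely on the tag the tagger assigns: the same word yields None when tagged ADV and 0.004 when tagged ADJ.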

Diapolo commented 6 months ago

Thanks, I also had a look at that file. So it seems spaCy or its German training data isn't accurate here? Would you call this a "bug"?