bnosac / udpipe

R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
https://bnosac.github.io/udpipe/en
Mozilla Public License 2.0

does not detect polarity negators words #73

Closed PeteTat closed 4 years ago

PeteTat commented 4 years ago

The polarity negator is detected if it is directly next to a positive or negative term; however, it is not detected if it is further away.

`library("udpipe")

dmodel_english <- udpipe_load_model(file = "./english-gum-ud-2.3-181115.udpipe")

x <- c("i cannot understand when people say that the noise-cancellation on these is far superior")

anno <- udpipe(x, dmodel_english)

anno <- data.frame(anno)

scores <- txt_sentiment(x = anno,
                        term = "lemma",
                        polarity_terms = data.frame(term = c("annoy", "like", "painful", "superior"),
                                                    polarity = c(-1, 1, -1, 1)),
                        polarity_negators = c("not", "neither", "cannot"),
                        polarity_amplifiers = c("pretty", "many", "really", "whatsoever"),
                        polarity_deamplifiers = c("slightly", "somewhat"),
                        constrain = TRUE, n_before = 15,
                        n_after = 15, amplifier_weight = .8)

scores$overall`

jwijffels commented 4 years ago

That's funny. I was a bit puzzled at first, but it does detect it further away. The reason is that you are working on the lemma, and "cannot" has the lemma "can", which was not part of your negator list.

library(udpipe)
x <- "i cannot understand when people say that the noise-cancellation on these is far superior"
anno <- udpipe(x, "english-gum", udpipe_model_repo = "jwijffels/udpipe.models.ud.2.3")

scores <- txt_sentiment(x = anno,
                        term = "lemma",
                        polarity_terms = data.frame(term = c("annoy", "like", "painful", "superior"), 
                                                    polarity = c(-1, 1, -1, 1)),
                        polarity_negators = c("can"),
                        polarity_amplifiers = c("pretty", "many", "really", "whatsoever"),
                        polarity_deamplifiers = c("slightly", "somewhat"),
                        constrain = TRUE, n_before = 20,
                        n_after = 20, amplifier_weight = .8)
View(scores$data)

                token              lemma  upos xpos head_token_id dep_rel deps polarity sentiment_polarity
1                   i                 us  NOUN  NNS             3   nsubj <NA>       NA                 NA
2              cannot                can   AUX   MD             3     aux <NA>       NA                 NA
3          understand         understand  VERB   VB             0    root <NA>       NA                 NA
4                when               when SCONJ  WRB             6  advmod <NA>       NA                 NA
5              people             people  NOUN  NNS             6   nsubj <NA>       NA                 NA
6                 say                say  VERB  VBP             3   advcl <NA>       NA                 NA
7                that               that SCONJ   IN            14    mark <NA>       NA                 NA
8                 the                the   DET   DT             9     det <NA>       NA                 NA
9  noise-cancellation noise-cancellation  NOUN   NN            14   nsubj <NA>       NA                 NA
10                 on                 on   ADP   IN            11    case <NA>       NA                 NA
11              these              these  PRON   DT             9    nmod <NA>       NA                 NA
12                 is                 be   AUX  VBZ            14     cop <NA>       NA                 NA
13                far                far   ADV   RB            14  advmod <NA>       NA                 NA
14           superior           superior   ADJ   JJ             6   ccomp <NA>        1                 -1
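
The mismatch can also be checked without running the sentiment scoring at all. The following is a minimal sketch in base R: the small `anno` data frame is copied by hand from the first three rows of the output above (it is not produced by a model here), and `in_lemma` / `in_token` are illustrative names. It simply asks which entries of the original negator list occur in each candidate `term` column.

```r
# Tokens and lemmas copied by hand from rows 1-3 of the annotation output above
anno <- data.frame(
  token = c("i", "cannot", "understand"),
  lemma = c("us", "can", "understand"),
  stringsAsFactors = FALSE
)

# Negator list from the original report
negators <- c("not", "neither", "cannot")

# Which negators actually occur in each column?
in_lemma <- negators[negators %in% anno$lemma]
in_token <- negators[negators %in% anno$token]

print(in_lemma)  # character(0): no negator matches on the lemma column
print(in_token)  # "cannot": the negator only matches on the token column
```

A check like this on the real annotation is a quick way to see whether the dictionaries line up with the column passed as `term`.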
PeteTat commented 4 years ago

face palm!!! You are absolutely correct ("User Error"). Confirmed that I need to switch to "token". Thank you for the quick response.
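
For readers landing here later, the switch could look roughly like the sketch below. It reuses the sentence, model name, model repository and dictionaries already posted in this thread; the only change is `term = "token"`. It assumes the model can be downloaded from that repository, and no output is shown because it was not re-run for this write-up.

```r
library(udpipe)

x <- "i cannot understand when people say that the noise-cancellation on these is far superior"

# Model name and repo as used earlier in this thread (downloaded if not yet available)
anno <- udpipe(x, "english-gum", udpipe_model_repo = "jwijffels/udpipe.models.ud.2.3")

# Match dictionaries against the surface form ("cannot") instead of the lemma ("can")
scores <- txt_sentiment(x = anno,
                        term = "token",
                        polarity_terms = data.frame(term = c("annoy", "like", "painful", "superior"),
                                                    polarity = c(-1, 1, -1, 1)),
                        polarity_negators = c("not", "neither", "cannot"),
                        polarity_amplifiers = c("pretty", "many", "really", "whatsoever"),
                        polarity_deamplifiers = c("slightly", "somewhat"),
                        constrain = TRUE, n_before = 15,
                        n_after = 15, amplifier_weight = .8)

# Per-document summary and per-token details
scores$overall
scores$data
```

One trade-off to keep in mind: when matching on `token`, every dictionary entry has to be written as it appears in the text, so an entry such as "annoy" would not be expected to match "annoyed".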

Note: I've been using this tool for a few years now. It's one of the best out there. The Java-based lib used to take hours to complete a run. This tool does it in a fraction of the time. udpipe + shiny = godmode. Thank you for providing this tool.

jwijffels commented 4 years ago

Good to hear that!