bnosac / udpipe

R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
https://bnosac.github.io/udpipe/en
Mozilla Public License 2.0

Error in keywords_rake: length(relevant) == nrow(x) is not TRUE #42

Closed by athammad 5 years ago

athammad commented 5 years ago

Hello,

I am using the function keywords_rake but it keeps throwing the following error. What does it mean?

UDtext <-as.data.table(udpipe_annotate(tagger, sometext$text))
kws <- keywords_rake(UDtext, term = "lemma", group = "doc_id", 
                          relevant = x$xpos %in% c("NN", "JJ"))

Here is the error:

Error in keywords_rake(UDtext, term = "lemma", group = "doc_id", relevant = x$xpos %in%  : 
  length(relevant) == nrow(x) is not TRUE

Is there any way to fix it? Thank you :)

jwijffels commented 5 years ago

Look at the documentation of keywords_rake: length(relevant) should be the same as nrow(x).

athammad commented 5 years ago

Sorry, I have checked the documentation, but the two objects have the same value. Could it be related to the small number of documents?

jwijffels commented 5 years ago

No, they don't have the same value; that's exactly what is checked in the function. Please read the documentation of the function.

athammad commented 5 years ago

I have read the documentation, and below you can find an example. I must be checking the value in the wrong way, but this is what I get.

library(udpipe)
library(data.table)  # needed for as.data.table() below

tagger <- udpipe_download_model(language = "english", overwrite = FALSE)
tagger <- udpipe_load_model(tagger)

ex<-c("Layin n bed with a headache  ughhhh...waitin on your call...",
"Funeral ceremony...gloomy friday...",
"wants to hang out with friends SOON!   ",
"@dannycastillo We want to trade with someone who has Houston tickets, but no one will.",
"Re-pinging @ghostridah14: why didn't you go to prom? BC my bf didn't like my friends   ",
"I should be sleep, but im not! thinking about an old friend who I want. but he's married now. damn, &amp; he wants me 2! scandalous!",
"Hmmm. http://www.djhero.com/ is down",
"Charlene my love. I miss you",
"I'm sorry  at least it's Friday?",
"cant fall asleep",
"Choked on her retainers",
"Ugh! I have to beat this stupid song to get to the next  rude!",
"if u watch the hills in london u will realise what tourture it is because were weeks and weeks late  i just watch itonlinelol",
"Got the news",
"The storm is here and the electricity is gone",
"annarosekerr agreed",
"So sleepy again and it's not even that late. I fail once again.",
"@PerezHilton lady gaga tweeted about not being impressed by her video leaking just so you know",
"How are YOU convinced that I have always wanted you? What signals did I give off...damn I think I just lost another friend",
"oh too bad! I hope it gets better. I've been having sleep issues lately too")

UDtext <-as.data.table(udpipe_annotate(tagger, ex))

kws <- keywords_rake(UDtext, term = "lemma", group = "doc_id", 
                     relevant = x$xpos %in% c("NN", "JJ"))

nrow(UDtext)
length(UDtext$xpos %in% c("NN", "JJ"))

Thank you

jwijffels commented 5 years ago

Where is x?
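The hint can be illustrated with a small base R sketch. It assumes, as one possible explanation, that some unrelated object named x without an xpos element was present in the workspace (if no x existed at all, R would instead report that the object was not found). The tokens data frame and the empty x list are hypothetical stand-ins, not objects from the original session.

```r
# Toy stand-in for an annotated token table (hypothetical data).
tokens <- data.frame(
  doc_id = c("doc1", "doc1", "doc2"),
  lemma  = c("old", "friend", "storm"),
  xpos   = c("JJ", "NN", "NN"),
  stringsAsFactors = FALSE
)

# Assumption: an unrelated object that happens to be called x,
# with no xpos element in it.
x <- list()

# x$xpos is NULL, so the comparison yields a zero-length logical vector.
wrong <- x$xpos %in% c("NN", "JJ")

# Built from the table that is actually passed: one value per row.
right <- tokens$xpos %in% c("NN", "JJ")

length(wrong) == nrow(tokens)  # FALSE: this is the condition that fails
length(right) == nrow(tokens)  # TRUE
```

The relevant argument is evaluated where the function is called, so x there refers to whatever x means in the caller's workspace, not to the data passed as the first argument.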

athammad commented 5 years ago

Thank you for your patience! Now I see my mistake!
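For readers who land here with the same error, a corrected call might look like the sketch below. It builds relevant from the same object that is passed as the first argument (UDtext) instead of x. The shortened ex vector and the use of as.data.frame rather than as.data.table are choices made for this sketch, and the model download needs network access.

```r
library(udpipe)

# Download (if not already present) and load the English model.
model  <- udpipe_download_model(language = "english", overwrite = FALSE)
tagger <- udpipe_load_model(file = model$file_model)

# A few of the example texts from this thread.
ex <- c("The storm is here and the electricity is gone",
        "Funeral ceremony...gloomy friday...",
        "thinking about an old friend who I want")

# Annotate and convert to a plain data frame, one row per token.
UDtext <- as.data.frame(udpipe_annotate(tagger, x = ex))

# Build the filter from UDtext itself, so it has one value per row.
relevant <- UDtext$xpos %in% c("NN", "JJ")
stopifnot(length(relevant) == nrow(UDtext))

kws <- keywords_rake(UDtext, term = "lemma", group = "doc_id",
                     relevant = relevant)
head(kws)
```

With so few short documents the result may contain few or no keywords, but the length check itself passes.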