bnosac / textrank

Summarise text by finding relevant sentences and keywords using the Textrank algorithm

textrank_sentences #4

Closed vmasias closed 6 years ago

vmasias commented 6 years ago

Dear team,

I'm using textrank_sentences to rank tweets (n = 5,000), but I get the following error.

xx <- rtllFILTRADO3[!duplicated(rtllFILTRADO3$text), ]

sentences <- unique(xx[, c("sentence_id", "sentence")])
cat(sentences$sentence)

terminology <- subset(xx, upos %in% c("NOUN", "ADJ"))

terminology <- terminology[, c("sentence_id", "lemma")]
head(terminology)

Textrank for finding the most relevant sentences

tr <- textrank_sentences(data = sentences, terminology = terminology)
names(tr)

Error in textrank_sentences(data = sentences, terminology = terminology) : sum(duplicated(data[, 1])) == 0 is not TRUE

What am I doing wrong?

Best, VHM

jwijffels commented 6 years ago

This is the same question as https://github.com/bnosac/textrank/issues/1, and the solution is given there. Make sure the first column of the data passed to textrank_sentences(data = sentences, ...) contains unique sentence identifiers. Did you get the sentence_id from udpipe? If so, the sentence_id is unique only within a document, so the same value reappears in every document. You need an identifier that is unique across all documents rather than within each one.
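For illustration, here is a minimal base R sketch of one way to build such an identifier. It is not necessarily the exact fix from issue #1. The toy data frame `x` and the column name `textrank_id` are made up for this example; it only assumes the annotation has `doc_id`, `sentence_id`, `sentence`, `lemma` and `upos` columns, as udpipe output does.

```r
# Toy annotation in the shape udpipe returns (hypothetical data):
# sentence_id restarts at 1 in every document.
x <- data.frame(
  doc_id      = c("doc1", "doc1", "doc1", "doc2", "doc2"),
  sentence_id = c(1L, 1L, 2L, 1L, 1L),
  sentence    = c("Cats sleep.", "Cats sleep.", "Dogs bark loudly.",
                  "Birds sing.", "Birds sing."),
  lemma       = c("cat", "sleep", "dog", "bird", "sing"),
  upos        = c("NOUN", "VERB", "NOUN", "NOUN", "VERB"),
  stringsAsFactors = FALSE
)

# Combine doc_id and sentence_id, then number the distinct combinations
# so every sentence gets one identifier across all documents.
key <- paste(x$doc_id, x$sentence_id, sep = "_")
x$textrank_id <- match(key, unique(key))

# Same steps as in the question, but keyed on the new identifier.
sentences   <- unique(x[, c("textrank_id", "sentence")])
terminology <- subset(x, upos %in% c("NOUN", "ADJ"))
terminology <- terminology[, c("textrank_id", "lemma")]

# The condition from the error message now holds.
stopifnot(sum(duplicated(sentences[, 1])) == 0)

# With the textrank package installed, the call would then be:
# tr <- textrank_sentences(data = sentences, terminology = terminology)
```

If udpipe is loaded, its unique_identifier() helper should do the same job, along the lines of unique_identifier(x, c("doc_id", "sentence_id")).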

vmasias commented 6 years ago

Thanks!!!!!!!!!!