bnosac / textrank

Summarise text by finding relevant sentences and keywords using the Textrank algorithm

[Clarification] textrank_sentences() #5

Closed fahadshery closed 5 years ago

fahadshery commented 5 years ago

Hi,

I need some clarification on the check `stopifnot(sum(duplicated(data[, 1])) == 0)` when using `textrank_sentences()`. I deduplicate both my sentences and my terminology:

`sentences <- unique(verbatim_tokens[, c("sentence_id", "sentence")])`
`terminology <- subset(verbatim_tokens, upos %in% c("NOUN", "ADJ"))`
`terminology <- terminology[, c("sentence_id", "lemma")]`
`terminology <- unique(terminology[, c("sentence_id", "lemma")])`

But I still get this error:

`Error in textrank_sentences(data = sentences, terminology = terminology) : sum(duplicated(data[, 1])) == 0 is not TRUE`

I ran `debugonce(textrank_sentences)`, and `data[, 1]` seems to be my `sentence_id`, which is bound to contain duplicates?

jwijffels commented 5 years ago

Same issue as https://github.com/bnosac/textrank/issues/1. The solution was given there.
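For readers landing here: the root cause in issue #1 was that `sentence_id` from a udpipe annotation restarts at 1 for every document, so it is only unique *within* a document. A minimal sketch of the workaround, assuming `verbatim_tokens` is a udpipe-style data.frame with `doc_id`, `sentence_id`, `sentence`, `lemma`, and `upos` columns (column names are assumptions, not taken verbatim from issue #1):

```r
library(textrank)

## Build a key that is unique across documents, since udpipe's
## sentence_id restarts per document.
verbatim_tokens$textrank_id <- paste(verbatim_tokens$doc_id,
                                     verbatim_tokens$sentence_id, sep = "-")

## Deduplicate on the new key instead of the per-document sentence_id.
sentences   <- unique(verbatim_tokens[, c("textrank_id", "sentence")])
terminology <- subset(verbatim_tokens, upos %in% c("NOUN", "ADJ"))
terminology <- unique(terminology[, c("textrank_id", "lemma")])

## sum(duplicated(sentences[, 1])) == 0 now holds, so this no longer errors.
tr <- textrank_sentences(data = sentences, terminology = terminology)
```

With a single document the original `sentence_id` works as-is; the composite key only matters once several documents are tokenised together.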

fahadshery commented 5 years ago

Perfect, thanks. It worked out well.

Could we also bring in other algorithms, such as the ones in this Python library?

Cheers,

jwijffels commented 5 years ago

LexRank and latent semantic analysis are covered by two other packages that have been on CRAN for a long time already. I don't know about Luhn.

jwijffels commented 5 years ago

Any other algorithms you had in mind?