RMHogervorst / NLP_SN

Code for Natural language processing of Security Now! podcast
MIT License

try these methods #1

Open RMHogervorst opened 6 years ago

RMHogervorst commented 6 years ago

https://www.r-bloggers.com/an-overview-of-keyword-extraction-techniques/amp/

RMHogervorst commented 6 years ago

using udpipe:

library(udpipe)
## x is assumed to be the annotated data frame, e.g. as.data.frame(udpipe_annotate(...)),
## with columns doc_id, paragraph_id, sentence_id, token, lemma and upos
## Collocation (words following one another)
stats <- keywords_collocation(x = x, 
                             term = "token", group = c("doc_id", "paragraph_id", "sentence_id"),
                             ngram_max = 4)
## Co-occurrences: how frequently words occur in the same sentence, in this case only nouns or adjectives
stats <- cooccurrence(x = subset(x, upos %in% c("NOUN", "ADJ")), 
                     term = "lemma", group = c("doc_id", "paragraph_id", "sentence_id"))
## Co-occurrences: how frequently words follow one another
stats <- cooccurrence(x = x$lemma, 
                     relevant = x$upos %in% c("NOUN", "ADJ"))
## Co-occurrences: how frequently words follow one another, even if we skip up to 2 words in between
stats <- cooccurrence(x = x$lemma, 
                     relevant = x$upos %in% c("NOUN", "ADJ"), skipgram = 2)
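
To see what skipgram changes without needing an annotated corpus, here is a small sketch on a toy vector. The `lemma` vector is made up for illustration and only stands in for `x$lemma`; it is not real podcast data.

```r
library(udpipe)

# Toy stand-in for x$lemma (hypothetical tokens, not from the transcripts)
lemma <- c("secure", "password", "manager", "store", "secure", "password")

# Count pairs of words that directly follow one another
adjacent <- cooccurrence(x = lemma)

# Also count pairs that have at most one word in between
skipped <- cooccurrence(x = lemma, skipgram = 1)

# Both results are data frames with columns term1, term2 and cooc
print(adjacent)
print(skipped)
```

The skipgram version should return at least as many pairs as the adjacent one, since every directly adjacent pair is still counted.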

with textrank: PageRank on words (for example, only nouns and adjectives)
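
A rough sketch of the textrank idea, assuming the `textrank` package and its `textrank_keywords()` function. The `lemma` and `upos` vectors are made up for illustration and stand in for `x$lemma` and `x$upos` from the udpipe annotation; the `ngram_max` and `sep` values are arbitrary choices.

```r
library(textrank)

# Toy stand-ins for x$lemma and x$upos (hypothetical, not from the transcripts)
lemma <- c("the", "password", "manager", "store", "a", "strong", "password",
           "in", "the", "password", "manager", "with", "strong", "encryption")
upos  <- c("DET", "NOUN", "NOUN", "VERB", "DET", "ADJ", "NOUN",
           "ADP", "DET", "NOUN", "NOUN", "ADP", "ADJ", "NOUN")

# Build the word network from nouns and adjectives only, rank words with
# PageRank, and join top-ranked words that follow one another into keywords
kw <- textrank_keywords(lemma,
                        relevant = upos %in% c("NOUN", "ADJ"),
                        ngram_max = 4, sep = " ")

# Data frame of keywords with their ngram length and frequency
print(kw$keywords)
```

On real data it probably makes sense to filter the result afterwards, e.g. `subset(kw$keywords, ngram > 1 & freq >= 5)`, with thresholds tuned to the corpus.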