The confusion probably comes from the pipe operator `%>%`, which is provided by the magrittr package (and re-exported by dplyr).
When you pipe one function into another with `%>%`, the result of the previous function is passed in as the first argument of the next function: `x %>% f(y)` is the same as `f(x, y)`.
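A minimal illustration of that rule, assuming only that the magrittr package is installed (the vectors are made up for the demo):

```r
library(magrittr)  # provides %>% (dplyr re-exports the same operator)

x <- c(1, 4, 9)

# x %>% f() is evaluated as f(x)
piped  <- x %>% sqrt()
nested <- sqrt(x)

# Extra arguments follow the piped value: x %>% f(y) becomes f(x, y)
piped_round  <- c(1.234, 5.678) %>% round(digits = 1)
nested_round <- round(c(1.234, 5.678), digits = 1)

print(identical(piped, nested))              # TRUE
print(identical(piped_round, nested_round))  # TRUE
```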
In this specific example, `unnest_tokens()` (from the tidytext package) has this signature:
```r
unnest_tokens(tbl, output, input, token = "words",
              format = c("text", "man", "latex", "html", "xml"),
              to_lower = TRUE, drop = TRUE, collapse = NULL, ...)
```
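Because `unnest_tokens()` is called inside a `%>%` pipe, its first argument, `tbl` (the input data frame), is filled in with the result of the previous step, which here is the `mutate()` call that removes stop words. The arguments you see written out therefore map to the next parameters: `paired_words` is `output`, the name of the new column to create (you can choose any name), and `text` is `input`, the existing column to tokenize. With `token = "ngrams", n = 2`, each row of `paired_words` holds one two-word sequence (a bigram).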
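To see what `paired_words` turns into, here is a small sketch on a made-up two-tweet tibble. `tweets` is a hypothetical stand-in for `winterTweetsGeo`, and the example assumes dplyr and tidytext are installed:

```r
library(dplyr)     # tibble(), %>%
library(tidytext)  # unnest_tokens()

# Hypothetical stand-in for winterTweetsGeo
tweets <- tibble(text = c("heavy rain in boulder", "boulder creek flooding"))

# "paired_words" is just the name we give the new output column;
# "text" is the existing input column that gets split into bigrams
bigrams <- tweets %>%
  unnest_tokens(paired_words, text, token = "ngrams", n = 2)

print(bigrams)
```

The result should have a single `paired_words` column with one bigram per row ("heavy rain", "rain in", ...). The original `text` column is dropped because `drop = TRUE` by default.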
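You could also write that block without pipes, by nesting each call inside the next so that each result becomes the first argument of the enclosing function:
```r
winterWordPairs <- unnest_tokens(
  mutate(select(winterTweetsGeo, text), text = removeWords(text, stop_words$word)),
  paired_words, text, token = "ngrams", n = 2
)
```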
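A quick way to check that the nested and piped forms are equivalent is to run both on toy data and compare them. This sketch assumes dplyr and tidytext; `tweetsToy` is hypothetical, and `tolower()` stands in for `removeWords()` so the example does not need the tm package:

```r
library(dplyr)     # tibble(), select(), mutate(), %>%
library(tidytext)  # unnest_tokens()

# Hypothetical data; the extra `id` column gives select() something to drop
tweetsToy <- tibble(id = 1:2,
                    text = c("Heavy rain in Boulder", "Boulder Creek flooding"))

# Piped form (tolower() used in place of removeWords())
piped <- tweetsToy %>%
  select(text) %>%
  mutate(text = tolower(text)) %>%
  unnest_tokens(paired_words, text, token = "ngrams", n = 2)

# Nested form: each inner result is the first argument of the outer call
nested <- unnest_tokens(
  mutate(select(tweetsToy, text), text = tolower(text)),
  paired_words, text, token = "ngrams", n = 2
)

print(identical(piped, nested))  # TRUE
```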
I am going over the R code and I cannot figure out where `paired_words` comes from in the `unnest_tokens()` call, or what it does.
```r
library(dplyr)     # select(), mutate(), count(), filter(), %>%
library(tidyr)     # separate()
library(tidytext)  # unnest_tokens(), stop_words
library(tm)        # removeWords()
library(igraph)    # graph_from_data_frame()
library(ggraph)    # ggraph(), geom_edge_link(), geom_node_point(), geom_node_text()
library(ggplot2)   # aes(), labs(), theme_void()

winterWordPairs <- winterTweetsGeo %>%
  select(text) %>%
  mutate(text = removeWords(text, stop_words$word)) %>%
  unnest_tokens(paired_words, text, token = "ngrams", n = 2)

winterWordPairs <- separate(winterWordPairs, paired_words, c("word1", "word2"), sep = " ")
winterWordPairs <- winterWordPairs %>% count(word1, word2, sort = TRUE)

# Graph a word network, with spacing indicating association.
# You may change the filter to keep pairs with more or fewer than 10 instances.
winterWordPairs %>%
  filter(n >= 5) %>% # we changed this to 2, rather than 15
  graph_from_data_frame() %>%
  ggraph(layout = "fr") +
  geom_edge_link(aes(edge_alpha = n, edge_width = n)) +
  geom_node_point(color = "darkslategray4", size = 3) +
  geom_node_text(aes(label = name), vjust = 1.8, size = 3) +
  labs(title = "Word Network: Tweets during the 2013 Colorado Flood Event",
       subtitle = "September 2013 - Text mining twitter data", x = "", y = "") +
  theme_void()
```
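For the later steps, a small sketch on hypothetical bigrams shows where `word1`, `word2`, `n` and `name` come from. It assumes dplyr, tidyr and igraph are installed; `pairs` is made-up data in the shape `unnest_tokens()` returns. `graph_from_data_frame()` treats the first two columns as the edge list and any remaining columns (here `n`) as edge attributes, and the words themselves become the vertex attribute `name`:

```r
library(dplyr)   # tibble(), count(), %>%
library(tidyr)   # separate()
library(igraph)  # graph_from_data_frame(), V(), E()

# Hypothetical bigrams, in the shape unnest_tokens() returns
pairs <- tibble(paired_words = c("boulder creek", "heavy rain", "boulder creek"))

# Split each bigram into two columns, then count how often each pair occurs
pairCounts <- pairs %>%
  separate(paired_words, c("word1", "word2"), sep = " ") %>%
  count(word1, word2, sort = TRUE)

print(pairCounts)

# word1/word2 become the edges; the count column n becomes an edge attribute
g <- graph_from_data_frame(pairCounts)

print(V(g)$name)  # vertex names, used by geom_node_text(aes(label = name))
print(E(g)$n)     # edge counts, used by geom_edge_link(aes(edge_alpha = n, ...))
```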