dgrtwo / tidy-text-mining

Manuscript of the book "Tidy Text Mining with R" by Julia Silge and David Robinson
http://tidytextmining.com

comparison.cloud #18

Closed eijoac closed 7 years ago

eijoac commented 7 years ago

The use of comparison.cloud at the end of chapter 3 is a bit misleading. The size of each word is proportional to its relative frequency within its own group, positive or negative. The graph does show the most common positive and negative words in Jane Austen’s works, but viewers can easily assume that word size is relative to the combined positive and negative word count, and therefore that the visualization can be used to infer the average sentiment of Austen's works as a whole.
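To make the difference concrete, here is a minimal sketch with made-up counts (the `word_counts` table and its numbers are hypothetical, not taken from the Austen data). It compares each word's share of all sentiment words with its share within its own sentiment group. This is a simplified illustration of the within-group scaling described above, not necessarily the exact computation that comparison.cloud performs internally.

```r
library(dplyr)

# Hypothetical toy counts for illustration only (not the Austen data)
word_counts <- tibble(
  word      = c("good", "happy", "miss", "poor"),
  sentiment = c("positive", "positive", "negative", "negative"),
  n         = c(90, 60, 30, 10)
)

shares <- word_counts %>%
  # Share of each word among all sentiment words combined
  mutate(share_overall = n / sum(n)) %>%
  # Share of each word within its own sentiment group
  group_by(sentiment) %>%
  mutate(share_within = n / sum(n)) %>%
  ungroup()

print(shares)
```

In this toy data "miss" occurs half as often as "happy", yet it has the larger within-group share because the negative group is much smaller overall. A plot scaled by within-group share would therefore draw "miss" larger than "happy".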

In my opinion, a single word cloud over all of Austen's works, with different colors for positive and negative words, would be better. (In Austen's case the difference in sizes may not be obvious, but in other cases it could be significant.)

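# tidy_books is the tidy Austen data frame built earlier in the chapter
library(dplyr)
library(tidytext)
library(wordcloud)

tidy_books %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(word, sentiment, sort = TRUE) %>%
  mutate(color = (sentiment == "positive") + 1) %>%  # palette index: 1 = negative, 2 = positive
  with(wordcloud(word, n, max.words = 100, colors = color, ordered.colors = TRUE))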

[Image: rplot, the resulting word cloud]

juliasilge commented 7 years ago

I definitely see your point, but here we want to show code for the comparison cloud function, since we have already shown how to make a word cloud. I'll add some clarifying text to emphasize that the size is relative to the count within that sentiment only.