comparison.cloud

The use of comparison.cloud at the end of chapter 3 is a bit misleading. The size of each word is proportional to its relative frequency within its own group, positive or negative. The graph does show the most common positive and negative words in Jane Austen's works, but it can easily mislead viewers into thinking that word size is relative to the combined positive and negative word count, and therefore that the visualization can be used to infer the average sentiment of Austen's full works.
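To illustrate the difference, here is a minimal sketch in base R with made-up counts (not the Austen data). The column names `within_group` and `overall` are just my own labels. It compares each word's share of its own sentiment group with its share of all sentiment words combined.

```r
# Toy counts, invented for illustration only (not taken from Austen's novels)
counts <- data.frame(
  word      = c("good", "happy", "miss", "poor"),
  sentiment = c("positive", "positive", "negative", "negative"),
  n         = c(300, 100, 80, 20)
)

# Share of each word within its own sentiment group
counts$within_group <- counts$n / ave(counts$n, counts$sentiment, FUN = sum)

# Share of each word among all sentiment words combined
counts$overall <- counts$n / sum(counts$n)

print(counts)
```

With these numbers, "miss" has a larger within-group share (0.80) than "good" (0.75), even though "good" is far more frequent overall (0.60 versus 0.16). Sizing words by the within-group share would therefore hide how much more common the positive words are.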

In my opinion, it would be better to draw a single word cloud over all of Austen's works and use different colors for positive and negative words. (In Austen's case the change in size may not be obvious, but in other cases it could be significant.)

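library(dplyr)
library(tidytext)
library(wordcloud)
# tidy_books is the one-word-per-row Austen data frame used in the book
tidy_books %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(word, sentiment, sort = TRUE) %>%
  # color index 2 for positive words, 1 for negative words
  mutate(color = (sentiment == "positive") + 1) %>%
  with(wordcloud(word, n, max.words = 100, colors = color, ordered.colors = TRUE))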


dgrtwo / tidy-text-mining

comparison.cloud #18