dgrtwo / tidy-text-mining

Manuscript of the book "Tidy Text Mining with R" by Julia Silge and David Robinson
http://tidytextmining.com

chapter 3 tf-idf #49

Closed Ralf4data closed 6 years ago

Ralf4data commented 6 years ago

Dear Julia,

Nice to meet you! I am new to the R world. I am Wu Yusheng. First, thank you and David very much for the excellent book Tidy Text Mining with R. Amazing! I have one question. When I typed in your code from chapter 3 (tf-idf) myself, I got an error in RStudio:

freq_by_rank %>%
  ggplot(aes(rank, 'term frequency', color = book)) +
  geom_line(size = 1.1, alpha = 0.8, show.legend = FALSE) +
  scale_x_log10() +
  scale_y_log10()

Error in log(x, base) : non-numeric argument to mathematical function

I did not know why. When I searched GitHub for your manuscript, I tried copying your code from there and pasting it into RStudio, and then it worked! But your code looks exactly the same as mine. Finally I found the reason: your `term frequency` is different from my 'term frequency'.


Could you please explain the deeper reason? I think some R packages can't read ' ' but can read ` `. By ' ' I mean the single quotation mark.

Again thank you very much for the excellent book!

Best regards,
Wu Yusheng

Felipe1990 commented 6 years ago

Hi Wu, (I'm not Julia, sorry, but I think I can help here :) ) When you use '' R interprets the input as a character string, just as if you were using "". Backticks (``), on the other hand, let you refer to a variable (or object) whose name contains spaces or begins with a digit, which is what ggplot2 expects here (the name itself, not a character string). You are right that this is what causes your problem: the error message is basically saying "I can't calculate the log of a word". Hopefully this helps. Felipe
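
To make the difference concrete, here is a minimal base R sketch. The data frame `df` and its values are made up for illustration and only mimic the column name used in the book; they are not the real `freq_by_rank` data.

```r
# Hypothetical toy data; only the column name mirrors the book's example.
df <- data.frame(rank = 1:3, check.names = FALSE)
df$`term frequency` <- c(0.05, 0.02, 0.01)

# Single (or double) quotes create a character string, not a column reference.
quoted <- with(df, 'term frequency')
class(quoted)       # "character"

# Backticks create a name, which is looked up among the columns of df.
backticked <- with(df, `term frequency`)
class(backticked)   # "numeric"

# Taking the log of a character string fails; capture the error message
# instead of stopping the script.
err <- tryCatch(log(quoted, 10), error = function(e) conditionMessage(e))
err
```

Here `err` holds the message R produces when `log()` is handed a string, which is the situation `scale_y_log10()` runs into when the y aesthetic is the constant 'term frequency' rather than the column.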

juliasilge commented 6 years ago

Thanks so much for the help here @Felipe1990! Check out this Stack Overflow answer that goes into more detail about how R handles backticks vs. single quotes. These are different characters.
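
For anyone landing here with the same error, the following sketch shows the plotting call with backticks around the column name. `freq_by_rank_toy` and its values are invented stand-ins for the book's `freq_by_rank`, and `linewidth` assumes ggplot2 3.4.0 or later (older versions use `size`, as in the original code).

```r
library(ggplot2)

# Hypothetical stand-in for the book's freq_by_rank data frame.
freq_by_rank_toy <- data.frame(
  book = rep(c("A", "B"), each = 4),
  rank = rep(1:4, times = 2),
  check.names = FALSE
)
freq_by_rank_toy$`term frequency` <-
  c(0.04, 0.02, 0.013, 0.01, 0.05, 0.025, 0.017, 0.0125)

# Backticks refer to the column named "term frequency";
# quotes would pass a constant string to the y aesthetic instead.
p <- ggplot(freq_by_rank_toy, aes(rank, `term frequency`, color = book)) +
  geom_line(linewidth = 1.1, alpha = 0.8, show.legend = FALSE) +
  scale_x_log10() +
  scale_y_log10()

p
```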