Closed deyanyosifov closed 8 months ago
Wow @deyanyosifov this is a typo that has made it through 5 years of corrections from users, multiple rounds of copyediting, etc. Congratulations! 😆
I'll get this corrected and submitted to the errata.
Ah, this was already reported to the errata and I approved it! 🙈
https://www.oreilly.com/catalog/errata.csp?isbn=9781491981658
I fixed this in #112 and the new version is now deployed at https://www.tidytextmining.com/tfidf
Thanks again @deyanyosifov!
"There are very long tails to the right for these novels (those extremely rare words!) that we have not shown in these plots." In fact, the extremely rare words have low n/total and sit at the leftmost side of the histogram. Many unique words are used only once or twice in a book, which is why the first column of the histogram is so high. The common words are far fewer; they have high n/total and fall to the right. The most common words ("a", "the", prepositions) do not even appear on the histograms because the x-axis has been limited on the right. For "the" in Mansfield Park, n/total = 0.0386751, which is larger than 0.0009, the upper limit of the x-axis.
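
To make the point concrete, here is a minimal sketch in Python (not the book's R code) that computes n/total on a made-up toy sentence. The text, the variable names, and the `x_limit` cutoff are all assumptions for illustration; none of them come from the book or from Jane Austen's novels.

```python
from collections import Counter

# Toy text standing in for a novel (an assumption, not the book's data).
text = (
    "the cat sat on the mat and the dog sat on the rug "
    "while the bird watched the cat and the dog"
)

words = text.split()
counts = Counter(words)   # n: number of occurrences of each word
total = len(words)        # total: number of word tokens in the text

# Term frequency n/total for every distinct word.
tf = {word: n / total for word, n in counts.items()}

# Words used exactly once have the smallest possible n/total,
# so they would land in the leftmost bin of a histogram.
rare = sorted(word for word, n in counts.items() if n == 1)

# The most frequent word has the largest n/total (far right).
most_common_word, most_common_n = counts.most_common(1)[0]

# Mimic a limited x-axis: only words at or below the cutoff are "shown".
x_limit = 0.1  # hypothetical cutoff, chosen only for this toy text
shown = {word: freq for word, freq in tf.items() if freq <= x_limit}

print("rare words:", rare)
print("most common:", most_common_word, most_common_n / total)
print("hidden by the cutoff:", sorted(set(tf) - set(shown)))
```

Even in this tiny example the pattern is the same: the words that occur once share the lowest n/total, while "the" has the highest value and is the only word cut off by the axis limit.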