dgrtwo / tidy-text-mining

Manuscript of the book "Tidy Text Mining with R" by Julia Silge and David Robinson
http://tidytextmining.com

line / linenumber column name consistency in examples #43

Closed charliejhadley closed 6 years ago

charliejhadley commented 6 years ago

In the README for tidytext, the column name for line numbers is linenumber:

https://github.com/juliasilge/tidytext

library(janeaustenr)  # provides austen_books()
library(dplyr)

original_books <- austen_books() %>%
  group_by(book) %>%
  mutate(linenumber = row_number()) %>%
  ungroup()

In a number of examples in the book, the column is instead called line:

https://www.tidytextmining.com/tidytext.html

library(dplyr)
# `text` is the character vector of Emily Dickinson lines defined earlier in the chapter
text_df <- data_frame(line = 1:4, text = text)

text_df

This adds a tiny bit of friction when copying and pasting examples between the two resources. If there's interest, I'll happily do the grunt work of opening pull requests against both repositories to make them consistent.

Hopefully you don't think this is a nitpick! tidytext is awesome and I really enjoy the book! I've seen some new R users get stuck on this when working through the book, is all 🙂
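Until the names match, a reader copying code between the two resources can bridge the gap with a single rename. The sketch below assumes the dplyr package is installed; it rebuilds the book's small example with tibble() (the current replacement for the deprecated data_frame()) and then renames the book's line column to the README's linenumber with dplyr::rename(). Choosing linenumber as the target name is only an assumption for illustration; renaming in the other direction works the same way.

```r
library(dplyr)

# The four Emily Dickinson lines used in the book's first example
text <- c("Because I could not stop for Death -",
          "He kindly stopped for me -",
          "The Carriage held but just Ourselves -",
          "and Immortality")

# Build the book-style data frame, whose line-number column is called `line`
text_df <- tibble(line = 1:4, text = text)

# rename() uses new_name = old_name, so this turns `line` into `linenumber`,
# matching the column name used in the tidytext README
text_df <- text_df %>%
  rename(linenumber = line)

text_df
```

Because rename() only changes the column name, the row order and values are untouched, so any later code that groups or arranges by linenumber behaves as it would on the README's data.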

juliasilge commented 6 years ago

Thanks for the comment! If you have observed people having trouble, I bet others have as well. I decided to change the variable name in the README and vignette over on the tidytext repo in commit 85da3e67e834d1652a31ff5e4e27184d27e6ab4b.

juliasilge commented 6 years ago

I don't understand why linking isn't working. 😩 GitHub didn't link my commit message over from the tidytext repo, and it didn't link the commit SHA I posted here either.

charliejhadley commented 6 years ago

@juliasilge How odd, that's the first time I've seen GitHub fail on that! Ohh, I see you've parcelled in a bunch of additional changes like get_stopwords(), noice 😄