cis-ds / Discussion

What is row_number() - 1 doing in class exercise 2? #119

Closed rkcatipon closed 4 years ago

rkcatipon commented 4 years ago

I had a follow-up question from today's class exercise regarding these lines of code:

  filter(type == "Private, nonprofit") %>%
  arrange(cost) %>%
  # use row_number() but subtract 1 since UChicago is not cheaper than itself
  mutate(school_cheaper = row_number() - 1) %>%
  filter(name == "University of Chicago") %>%
  glimpse()

Specifically, I do not understand how row_number() ranks vectors. The example from the book has:

y <- c(1, 2, 2, NA, 3, 4)
row_number(y)
#> [1]  1  2  3 NA  4  5

But it's not clear to me what is happening here.

rkcatipon commented 4 years ago

Response from instructors:

Deblina

Hey Regina! This is a great question, and yes, issues on GitHub are the best place for questions like this. As a first pass, I'd say that row_number() gives something akin to the index of each element in a vector. In the example code, we've arranged the data frame by cost and then used row_number() to assign an index to each row, so the cheapest school gets 1, the next cheapest gets 2, and so on. Subtracting one turns that index into a count of the rows that come before a given school, since UChicago cannot be cheaper than itself. The new column, school_cheaper, therefore tells us how many schools in the filtered data sit below each school in the cost ordering. This is just a first-glance explanation, so any mistakes are totally my fault. I'd be happy to get more into this on GitHub, where the rest of the class can also weigh in and benefit. Best, Deb
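To make the indexing idea concrete, here is a minimal sketch on a toy data frame. The `schools` tibble, its school names, and its costs are made up for illustration and are not the class data; only the arrange() and mutate() steps mirror the exercise.

```r
library(dplyr)

# Hypothetical toy data: names and costs are invented for illustration
schools <- tibble(
  name = c("School A", "School B", "School C", "School D"),
  cost = c(30000, 12000, 45000, 21000)
)

ranked <- schools %>%
  # sort from cheapest to most expensive
  arrange(cost) %>%
  # row_number() with no argument returns each row's position (1, 2, 3, ...);
  # subtracting 1 counts the rows that come before it
  mutate(school_cheaper = row_number() - 1)

ranked
```

In this toy data, School A ends up in the third row after sorting, so its school_cheaper value is 2: two schools cost less than it does.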

Dr. Soltoff

Correct. row_number() is a function that ranks rows based on their values for the specified variables. It is not the only method. For comparison, check the documentation for examples of other functions with similar goals but different approaches.
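As one possible comparison, the sketch below runs the vector from the question through row_number() and two other dplyr ranking functions, min_rank() and dense_rank(). The three differ only in how they handle ties; all of them leave NA as NA.

```r
library(dplyr)

y <- c(1, 2, 2, NA, 3, 4)

# Ties are broken by position: the first 2 gets rank 2, the second gets rank 3
row_number(y)
#> [1]  1  2  3 NA  4  5

# Tied values share the lowest rank, and the next rank is skipped
min_rank(y)
#> [1]  1  2  2 NA  4  5

# Tied values share a rank, and no ranks are skipped
dense_rank(y)
#> [1]  1  2  2 NA  3  4
```

This matters for the exercise if two schools have exactly the same cost: row_number() - 1 counts the rows sorted ahead of a school, while min_rank(cost) - 1 would count only the schools that are strictly cheaper.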

Thanks, Benjamin