DS4PS / cpp-526-fall-2020

Course shell for CPP 526 Foundations of Data Science I for Fall 2020
http://ds4ps.org/cpp-526-fall-2020/

dplyr error? #21

Closed jacobtnyoung closed 2 years ago

jacobtnyoung commented 4 years ago

Hi all,

I was working through dplyr on my office Mac and executed the following code (for example) without a problem:


dat %>% 
  select(var) %>% 
  filter(var == "value", na.rm = TRUE) %>% 
  summarize(number_of_var= sum(var == "value"))

But when I switched to my laptop Mac, I got the following error running the same code:

Error: `na.rm` (`na.rm = TRUE`) must not be named, do you need `==`?

Any thoughts?

jacobtnyoung commented 4 years ago

Actually, I think the third line isn't needed anyway.

lecy commented 4 years ago

dplyr functions sometimes conflict with functions in other packages. In some cases I have to change filter to dplyr::filter for the code to work correctly (it isn't always filter; it happens with several functions because they have common names).

Could another package on your laptop be masking it?

Otherwise that's strange. The computer should not matter if the code is working.

lecy commented 4 years ago

Agreed - the third line is redundant if you are just counting TRUEs.
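A minimal sketch of that point, assuming a small made-up data frame (this `dat` and its values are hypothetical, not the lab data). It shows that filter() takes no na.rm argument and drops rows where the condition is NA on its own, while na.rm belongs to sum():

```r
library(dplyr)

# Hypothetical stand-in for the lab data: one character column with a missing value.
dat <- data.frame(var = c("value", "other", NA, "value"))

# filter() has no na.rm argument; rows where the condition is NA are dropped anyway.
kept <- dat %>% filter(var == "value")
nrow(kept)

# Counting directly, without the filter step: na.rm is an argument of sum().
counted <- dat %>% summarize(number_of_var = sum(var == "value", na.rm = TRUE))
counted$number_of_var
```

If the filter step is removed and na.rm = TRUE is left out of sum(), the count comes back as NA as soon as var contains a missing value.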

jamisoncrawford commented 4 years ago

Reopening this for visibility - absolutely, what @lecy said. Function names are occasionally shared across packages, so qualifying the function name with the package name (dplyr::filter()) disambiguates the call!
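As a rough illustration of the namespacing idea, here is a sketch using the built-in mtcars data (the object name `six_cyl` is made up for the example). It first asks R which package a bare filter() currently resolves to, then uses the qualified call:

```r
library(dplyr)

# Which package does a bare filter() resolve to right now?
# This depends on which packages are attached and in what order.
environmentName(environment(filter))

# The qualified call does not depend on what else is attached.
six_cyl <- mtcars %>%
  dplyr::filter(cyl == 6)

nrow(six_cyl)
```

Base R also ships a stats::filter(), which is one reason filter is a common source of this kind of clash.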

jacobtnyoung commented 4 years ago

Got it!

AprilPeck commented 4 years ago

I am using group_by() and it was working fine, but then I added ungroup() to do some manipulation. I've deleted the ungroup() but now it won't group. ??

jamisoncrawford commented 4 years ago

@AprilPeck can you be more specific? Perhaps share some of your code and which question you're on (assuming this is for Lab 05)?

Also, make sure to run library(dplyr) every time you close and reopen RStudio or start a new session to use dplyr functions.

AprilPeck commented 4 years ago

This is on section 3 question 4.

I didn't close it... it was working fine and then, after playing around with it, it just stopped grouping. I did try closing and reopening it, and it's still not working. Here is my current code:

dat %>% group_by(hour)

And this is the result: [image]

jamisoncrawford commented 4 years ago

@AprilPeck thanks for providing a bit more context!

Based on your code, we can't really tell if it's working or not and, in fact, it appears to be working just fine. Why?

Well, you've grouped your data by variable hour successfully, most likely - and it ends there. Nothing else happens. You've simply told R that you want each unique value in variable hour to be treated as its own group!

In order to see the effects of group_by(), we need to pipe it, once grouped, into a transformation function (like mutate()) or aggregation function (like summarize()). Only then will these "groups" have any real influence on the manipulation of your data.

To demonstrate:

library(dplyr)

mtcars %>%
  group_by(cyl)

# A tibble: 32 x 11
# Groups:   cyl [3]
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
# ... with 22 more rows

...compare this to:

library(dplyr)

mtcars %>%
  group_by(cyl) %>%
  summarize(avg = mean(mpg))

# A tibble: 3 x 2
    cyl   avg
  <dbl> <dbl>
1     4  26.7
2     6  19.7
3     8  15.1
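The same idea applies to mutate() and to ungroup(). A short sketch, again on mtcars (the names `cars_grouped`, `cars_ungrouped`, and `mpg_vs_group` are made up for illustration), that also checks whether a data frame is currently grouped:

```r
library(dplyr)

# mutate() on grouped data: mean(mpg) is computed within each cyl group,
# so every row is compared with the average of its own group.
cars_grouped <- mtcars %>%
  group_by(cyl) %>%
  mutate(mpg_vs_group = mpg - mean(mpg))

# group_vars() reports the grouping variables currently attached to the data.
group_vars(cars_grouped)

# ungroup() removes the grouping; the rows and columns stay as they are.
cars_ungrouped <- ungroup(cars_grouped)
group_vars(cars_ungrouped)
```

Note that group_by() and ungroup() only affect the object they return, so the grouping persists only if the result is piped onward or assigned to a name.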

Does this make sense?

AprilPeck commented 4 years ago

Oy, yes it does. Thank you.

jamisoncrawford commented 4 years ago

@AprilPeck you bet!