DS4PS / cpp-526-spr-2021

Course shell for Foundations of Data Science I
https://ds4ps.org/cpp-526-spr-2021/
MIT License

Lab 5 Part 3 Question 4 #14

Open · ellihammons21 opened this issue 3 years ago

ellihammons21 commented 3 years ago

Hello,

I am on the last question of the lab and am running into some trouble.

Here is my code:

dat %>%
  filter( Totalfatalities > 0 | Totalinjuries > 0) %>%
  group_by( hour ) %>%
  mutate (total.casualties = Totalinjuries + Totalfatalities) %>%
  summarize ( total.casualties / n)

  plot(type = "b",
       bty = "n",
       pch = 19,
       cex = 2,
       xlab = "Hour of the Day",
       ylab = "Ave. Number Passengers Hurt",
       main = "Average Injuries or Fatalities Per Harmful Crash")

And here is the error that R is throwing:

Error: Problem with `summarise()` input `..1`.
x non-numeric argument to binary operator
i Input `..1` is `total.casualties/n`.
i The error occurred in group 1: hour = "00".
Run `rlang::last_error()` to see where the error occurred.

It seems like R is rejecting the summarize() call because the variable total.casualties is a binary operator, but both Totalinjuries and Totalfatalities are numeric, so I'm not quite sure what the issue is here.

Thanks!
lecy commented 3 years ago

This is a good example of why reproducible examples matter.

Adding a head() call at the end of the pipe would show what's happening with the variables and make the problem easier to diagnose. What do you get from this?

dat %>%
  filter( Totalfatalities > 0 | Totalinjuries > 0) %>%
  group_by( hour ) %>%
  mutate (total.casualties = Totalinjuries + Totalfatalities) %>%
  # summarize ( total.casualties / n) %>% 
  select( hour, Totalinjuries, Totalfatalities, total.casualties ) %>%  
  head( 10 )

I suspect, though, that you have not yet defined n, or you might mean n() instead?
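For reference, here is a minimal sketch of the same pipeline with n() in place of n, assuming dplyr is installed. The small data frame is hypothetical toy data standing in for the lab's dat (only the three columns the pipe uses), and avg.by.hour and avg.casualties are illustrative names, not names from the lab. Wrapping total.casualties in sum() collapses each hour to a single row. Because plot() is not part of the pipe, it is given x and y explicitly.

```r
library(dplyr)

# Hypothetical toy data standing in for the lab's `dat`;
# only the three columns used below are included.
dat <- data.frame(
  hour            = c("00", "00", "01", "01", "01"),
  Totalinjuries   = c(1, 0, 2, 0, 3),
  Totalfatalities = c(0, 1, 0, 0, 1)
)

avg.by.hour <-
  dat %>%
  # keep only crashes with at least one injury or fatality
  filter(Totalfatalities > 0 | Totalinjuries > 0) %>%
  group_by(hour) %>%
  mutate(total.casualties = Totalinjuries + Totalfatalities) %>%
  # n() (with parentheses) returns the number of rows in each hour group;
  # sum() collapses each group to one value, so this is the group mean
  summarize(avg.casualties = sum(total.casualties) / n())

avg.by.hour

# plot() is outside the pipe, so pass the summarized columns explicitly
plot(x = as.numeric(avg.by.hour$hour),
     y = avg.by.hour$avg.casualties,
     type = "b",
     bty = "n",
     pch = 19,
     cex = 2,
     xlab = "Hour of the Day",
     ylab = "Ave. Number Passengers Hurt",
     main = "Average Injuries or Fatalities Per Harmful Crash")
```

With this toy data, avg.casualties is 1 for hour "00" and 3 for hour "01". mean(total.casualties) would give the same result as sum(total.casualties) / n().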

lecy commented 3 years ago

Every now and then I get an error from summarize() because several packages (plyr, for example) export a function with that name.

Using the explicit package::function( ) designation resolves that issue:

dplyr::summarize( ... )

But I doubt that's the issue here.
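If you want to check whether a name conflict is involved, base R's find() (from the utils package) lists every attached package that contains an object with a given name. The sketch below assumes only dplyr is installed and uses the built-in mtcars data purely for illustration. Calling dplyr::summarize() explicitly always uses dplyr's version, whatever else is attached.

```r
library(dplyr)

# List every attached package that exports an object named "summarize".
# With only dplyr attached this is "package:dplyr"; a package attached
# later (e.g. plyr) would be listed first because it masks dplyr's version.
find("summarize")

# The package::function() form calls dplyr's summarize() regardless of
# what else is on the search path.
counts <- dplyr::summarize(mtcars, n.cars = dplyr::n())
counts
```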

ellihammons21 commented 3 years ago

Oops, sorry about that. I tried to make it reproducible but apparently adding the ```{r} wasn't enough. Do I need to post the packages as well to make it reproducible?

With the code you recommended (adding head(10)), R still throws the same error message:

Error: Problem with `summarise()` input `..1`.
x non-numeric argument to binary operator
i Input `..1` is `total.casualties/n`.
i The error occurred in group 1: hour = "00".
Run `rlang::last_error()` to see where the error occurred.

jamisoncrawford commented 3 years ago

Hi @ellihammons21 - the steps and example @lecy laid out would show the first 10 rows of what your data look like (using head(10)) if you remove or "comment out" the suspected problematic operation. Because he suspects the use of n and not n()...

I suspect, though, that you have not yet defined n, or you might mean n() instead?

... he places a # before that line in the reproducible example. This keeps the suspected problem in your piped chain from being evaluated, a technique often called "commenting out". Here's the chain of operations again:

dat %>%
  filter( Totalfatalities > 0 | Totalinjuries > 0) %>%
  group_by( hour ) %>%
  mutate (total.casualties = Totalinjuries + Totalfatalities) %>%
  # summarize ( total.casualties / n) %>%  
  select( hour, Totalinjuries, Totalfatalities, total.casualties ) %>%  
  head( 10 )

...and note that the problematic expression is "commented out":

# summarize ( total.casualties / n) %>%  

Now, the question is: what is the result of that, or at least of its first 10 rows using head()? With that line commented out, summarize() never runs, so the same summarise() error shouldn't appear.


The dplyr::summarize() tip is more of an FYI and just good to know! I also use it on occasion in longer scripts with several packages, just to remind myself or colleagues where a function comes from, which is especially important when you use, like, eight different packages just for Shiny.


All that said, as @lecy suggests, try replacing n with n(). Written without parentheses, R treats n as an object holding some value, while n() calls dplyr's function that counts the observations in the current group. See if you still get the error message and let us know!

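P.S. "non-numeric argument to binary operator" means that an operator taking two operands, such as / or +, was given something that isn't a number. It doesn't mean your data involve binary (0/1) values! Here, since no variable called n exists, R most likely finds dplyr's n function itself, and dividing a number by a function produces exactly this error.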
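The error can be reproduced outside the pipe. This is a minimal sketch, assuming dplyr is attached and no object named n has been created in the session; 5 is an arbitrary stand-in for total.casualties.

```r
library(dplyr)

# With no variable called n defined, the bare name n resolves to
# dplyr's exported function n(), not to a number.
is.function(n)

# Dividing a number by a function reproduces the message from the lab.
msg <- tryCatch(5 / n, error = function(e) conditionMessage(e))
msg
```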