cssearcy / AYS-R-Coding-SPR-2020

Coding in R for Policy Analytics
https://cssearcy.github.io/AYS-R-Coding-SPR-2020/

Lab 5 - Problem with 1.2 & 3.3 #20

Open SalvadorW2 opened 3 years ago

SalvadorW2 commented 3 years ago

Hello, I'm having difficulty with two of the problems: one easy one that I really shouldn't be struggling with and one difficult one:

First, the easy one - 1.2:

Here is my code:

dat %>% group_by(day) %>% summarize(day == "Mon") %>% count(TRUE)

I know I'm supposed to calculate the mean of a logical statement, but aside from manually calculating it using the different day counts, I'm not sure how to get the answer - at least in the manner requested.

This is what I have for 3.3:

casualties <- (dat$Totalfatalities > 0 | dat$Totalinjuries > 0)
all.accidents <- (dat$Totalfatalities >= 0 | dat$Totalinjuries >= 0)

dat %>%
  group_by(hour) %>%
  filter(casualties / all.accidents) %>%
  summarize(n = n()) %>%
  plot(type = "b", bty = "n", pch = 19, cex = 2,
       xlab = "Hour of Day",
       ylab = "Proportion of Accidents Resulting in Harm",
       main = "Proportion of Crashes that Result in Injuries or Fatalities")

I think I'm on the right track, but I'm not sure if the "filter" command is right. It keeps throwing an error.

Any help would be much appreciated,

Sean

jamisoncrawford commented 3 years ago

Hi @SalvadorW2, maybe this will help!

First, the easy one - 1.2

You're mostly there - since each observation is an accident, you only need to calculate the proportion of them that are Mondays (== "Mon"). You can do this without dplyr, using base R's mean()! The mean of a logical vector is the share of its values that are TRUE.
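As a minimal sketch of the idea above, here is the mean of a logical comparison on a toy data frame. The five rows are invented for illustration and stand in for the lab's dat; only the column name day comes from the thread.

```r
# Toy stand-in for the lab data: one row per accident (values are made up).
dat <- data.frame(day = c("Mon", "Tue", "Mon", "Wed", "Fri"))

# day == "Mon" gives TRUE or FALSE for each row. mean() counts TRUE as 1 and
# FALSE as 0, so the result is the proportion of rows that are Mondays.
prop_monday <- mean(dat$day == "Mon")
prop_monday
```

With the real data the call has the same shape; only dat changes.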

This is what I have for 3.3

Your filter() expression, filter(casualties / all.accidents), won't work because casualties / all.accidents is not a logical condition. filter() expects an expression that evaluates to TRUE or FALSE for each row, but here you are dividing one vector of TRUE/FALSE values by another, which yields numbers. R treats TRUE as 1 and FALSE as 0, so TRUE / TRUE (1/1) is 1, FALSE / TRUE (0/1) is 0, TRUE / FALSE (1/0) is Inf, and FALSE / FALSE (0/0) is NaN (not a number).
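A quick base-R check of what dividing one logical vector by another returns. The two short vectors are invented so that every TRUE/FALSE combination appears once; all_accidents is a hypothetical name standing in for all.accidents.

```r
# One element per TRUE/FALSE combination (illustrative values only).
casualties    <- c(TRUE, FALSE, TRUE, FALSE)
all_accidents <- c(TRUE, TRUE,  FALSE, FALSE)

# Division coerces the logicals to 1 and 0, so the result is numeric.
ratio <- casualties / all_accidents
ratio              # 1/1, 0/1, 1/0, 0/0
is.logical(ratio)  # FALSE: this is not a condition filter() can use
```

The result is a numeric vector, not a set of TRUE/FALSE conditions.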

Try removing the filter() expression entirely. It's correct to group_by() on hour, but after that, you'll need to use summarize(). The summary value can just be the total number of injuries plus the total number of fatalities for each hour - this will recreate the plot.
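A rough sketch of that group_by() and summarize() step, assuming dplyr is installed. The toy rows are invented; the column names hour, Totalinjuries and Totalfatalities follow the thread, and harm is a hypothetical name for the summary column.

```r
library(dplyr)

# Invented toy data: one row per accident.
dat <- data.frame(
  hour            = c(0, 0, 1, 1, 1),
  Totalinjuries   = c(1, 0, 2, 0, 0),
  Totalfatalities = c(0, 0, 1, 0, 0)
)

# One row per hour: injuries plus fatalities summed within each hour.
harm_by_hour <- dat %>%
  group_by(hour) %>%
  summarize(harm = sum(Totalinjuries) + sum(Totalfatalities))

harm_by_hour
```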

Let me know if this helps!

SalvadorW2 commented 3 years ago

I'm sorry. I think I've gotten the right answer for 1.2, but I'm still not getting the correct plot for 3.3.

Here is what I have now:

dat %>%
  group_by(hour) %>%
  summarize(casualties.per.hour = (sum(Totalinjuries > 0) + sum(Totalfatalities > 0) / hour12)) %>%
  arrange(casualties.per.hour) %>%
  summarize(n = n()) %>%
  plot(type = "b", bty = "n", pch = 19, cex = 2,
       xlab = "Hour of Day",
       ylab = "Proportion of Accidents Resulting in Harm",
       main = "Proportion of Crashes that Result in Injuries or Fatalities")

When I enter this, it pulls up the same plot as 3.1 and throws an error. I think the problem is that Totalinjuries and Totalfatalities are integers, while hour12 is a factor.

I think I know how to get the right answer, but due to technical complications that I don't think I fully understand, I'm unable to get it.

jamisoncrawford commented 3 years ago

No worries! Before group_by(hour), try a mutate function:

mutate(casualties = Totalinjuries + Totalfatalities,
       harmful = ifelse(casualties > 0, TRUE, FALSE))

Then use group_by() and finish up with summarize():

  summarize(proportion = percent(mean(harmful)))
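Put together, the mutate(), group_by() and summarize() steps might look like the sketch below, run on invented toy rows. Two assumptions are worth flagging: percent() in the snippet above appears to be scales::percent(), which formats numbers as text, so this sketch leaves the proportion numeric so that it can still be plotted; and casualties > 0 is used directly, which gives the same TRUE/FALSE values as the ifelse() form.

```r
library(dplyr)

# Invented toy data: one row per accident, column names as in the thread.
dat <- data.frame(
  hour            = c(0, 0, 1, 1, 1, 1),
  Totalinjuries   = c(1, 0, 0, 0, 0, 0),
  Totalfatalities = c(0, 0, 1, 0, 0, 0)
)

harm_by_hour <- dat %>%
  # Row-level flags are built first, while every accident is still its own row.
  mutate(casualties = Totalinjuries + Totalfatalities,
         harmful    = casualties > 0) %>%
  group_by(hour) %>%
  # Mean of a logical column = share of accidents in that hour causing harm.
  summarize(proportion = mean(harmful))

harm_by_hour
```

From here, harm_by_hour$hour and harm_by_hour$proportion can be handed to plot() with the same type = "b" styling used earlier in the thread.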

Sorry if this is a bit too much to the point! Does that help?

BriannaSmithR commented 3 years ago

I'm also having trouble with this one (and to be honest most of them!!) Here's the code I'm trying to use based on what you said above.

dat %>% 
  group_by(hour12, age) %>% 
  summarize(n = n()) %>%      
  group_by(age) %>%           
mutate(casualties = dat$Totalinjuries + dat$Totalfatalities,
       harm = ifelse(casualties > 0, TRUE, FALSE)) 

When I try to run this I get "Error: Column casualties must be length 24 (the group size) or one, not 28470"

SalvadorW2 commented 3 years ago

Thank you.

I've gotten the data table to work, but now the plot isn't working.

SalvadorW2 commented 3 years ago

Brianna,

I apologize for writing this so late, but try moving the mutate function above group_by(hour).

Sean

jamisoncrawford commented 3 years ago

@BriannaSmithR, your mutate() call should create two new variables: first the sum of n, then n divided by that new variable.
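One way that suggestion might look in code, assuming dplyr and invented toy rows. The columns hour12 and age come from the thread; n.age and p.age are hypothetical names for the two new variables.

```r
library(dplyr)

# Invented toy data: one row per accident.
dat <- data.frame(
  hour12 = c("1 AM", "1 AM", "1 AM", "2 AM", "2 AM", "2 AM"),
  age    = c("Teen", "Teen", "Adult", "Teen", "Adult", "Adult")
)

age_by_hour <- dat %>%
  group_by(hour12, age) %>%
  summarize(n = n()) %>%        # accidents per hour-and-age combination
  group_by(age) %>%             # regroup so totals are computed per age group
  mutate(n.age = sum(n),        # first new variable: total accidents for that age
         p.age = n / n.age)     # second: share of that age's accidents in each hour

age_by_hour
```

Because mutate() only refers to columns that exist in the summarized table (n), it avoids the length mismatch that comes from reaching back into dat$ columns.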