Open SalvadorW2 opened 3 years ago
Hi @SalvadorW2, maybe this will help!
First, the easy one - 1.2
You're mostly there. Since each observation is an accident, you only need to calculate the proportion of them that fall on a Monday (day == "Mon"). You can do this without dplyr, using base R's mean()!
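A minimal sketch of that idea, using a made-up five-row data frame in place of the course data (only the day column name comes from the thread; the values are invented):

```r
# Toy stand-in for the crash data: one row per accident.
dat <- data.frame(day = c("Mon", "Tue", "Mon", "Wed", "Sun"))

# dat$day == "Mon" is a logical vector; mean() treats TRUE as 1 and FALSE as 0,
# so the mean is the share of rows that are Mondays.
prop_monday <- mean(dat$day == "Mon")
prop_monday
```

The same expression should work on the real data as long as day holds the label "Mon" for Monday accidents.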
This is what I have for 3.3
So your filter() expression, filter(casualties / all.accidents), won't work because casualties / all.accidents is not a logical condition. filter() expects a logical vector of TRUE or FALSE values, but you are dividing one logical vector by another, which gives a numeric result. R treats TRUE as 1 and FALSE as 0, so TRUE divided by TRUE (1/1) equals 1 and FALSE divided by TRUE (0/1) equals 0. Dividing by FALSE is dividing by zero: TRUE / FALSE is Inf, and FALSE / FALSE is NaN (not a number, i.e. undefined).
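To see what the division actually produces, here is a small base R sketch. The two vectors are invented three-element examples that reuse the names from the question:

```r
# Toy logical vectors standing in for the ones built in the question.
casualties    <- c(TRUE, FALSE, TRUE)
all.accidents <- c(TRUE, TRUE, TRUE)

# Arithmetic coerces TRUE/FALSE to 1/0, so the result is numeric, not logical.
ratio <- casualties / all.accidents
ratio_is_logical <- is.logical(ratio)

# Division by FALSE is division by zero.
true_over_false  <- TRUE / FALSE    # Inf
false_over_false <- FALSE / FALSE   # NaN
```

Because ratio is numeric rather than logical, it is not something filter() can use as a row condition.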
Try removing the filter() expression entirely. It's correct to group_by() on hour, but after that, you'll need to use summarize(). The summary value can just be the total number of injuries plus the total number of fatalities for each hour - this will recreate the plot.
Let me know if this helps!
I'm sorry. I think I've gotten the right answer for 1.2, but I'm still not getting the correct plot for 3.3.
Here is what I have now:
dat %>%
  group_by(hour) %>%
  summarize(casualties.per.hour = (sum(Totalinjuries > 0) + sum(Totalfatalities > 0) / hour12)) %>%
  arrange(casualties.per.hour) %>%
  summarize(n = n()) %>%
  plot(type = "b", bty = "n", pch = 19, cex = 2, xlab = "Hour of Day", ylab = "Proportion of Accidents Resulting in Harm", main = "Proportion of Crashes that Result in Injuries or Fatalities")
When I enter this, it pulls up the same plot as 3.1 and throws an error. I think the problem is that Totalinjuries and Totalfatalities are integers, while hour12 is a factor.
I think I know how to get the right answer, but due to technical complications that I don't think I fully understand, I'm unable to get it.
No worries! Before group_by(hour), try a mutate function:
mutate(casualties = Totalinjuries + Totalfatalities,
       harmful = ifelse(casualties > 0, TRUE, FALSE))
Then use group_by() and finish up with summarize():
summarize(proportion = percent(mean(harmful)))
Sorry if this is a bit too much to the point! Does that help?
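Put together, the steps in this reply might look like the sketch below. The data frame is a made-up six-row stand-in (only the column names come from the thread), and the result name harm_by_hour is hypothetical. The sketch keeps the proportion numeric instead of wrapping it in percent(), on the assumption that percent() here is the one from the scales package, which returns character strings that plot() cannot draw as y values.

```r
library(dplyr)

# Toy stand-in for the crash data; values are invented for illustration.
dat <- data.frame(
  hour            = c(0, 0, 1, 1, 1, 2),
  Totalinjuries   = c(0, 2, 0, 0, 1, 0),
  Totalfatalities = c(0, 0, 0, 1, 0, 0)
)

harm_by_hour <- dat %>%
  # Flag each accident that produced at least one injury or fatality.
  mutate(casualties = Totalinjuries + Totalfatalities,
         harmful = casualties > 0) %>%
  group_by(hour) %>%
  # The mean of a logical column is the proportion of TRUE rows in each hour.
  summarize(proportion = mean(harmful))

# Plot hour against proportion explicitly, reusing the styling from the question.
plot(harm_by_hour$hour, harm_by_hour$proportion,
     type = "b", bty = "n", pch = 19, cex = 2,
     xlab = "Hour of Day",
     ylab = "Proportion of Accidents Resulting in Harm",
     main = "Proportion of Crashes that Result in Injuries or Fatalities")
```

Writing casualties > 0 directly gives the same logical column as ifelse(casualties > 0, TRUE, FALSE).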
I'm also having trouble with this one (and to be honest most of them!!) Here's the code I'm trying to use based on what you said above.
dat %>%
group_by(hour12, age) %>%
summarize(n = n()) %>%
group_by(age) %>%
mutate(casualties = dat$Totalinjuries + dat$Totalfatalities,
harm = ifelse(casualties > 0, TRUE, FALSE))
When I try to run this I get "Error: Column casualties must be length 24 (the group size) or one, not 28470"
Thank you.
I've gotten the data table to work, but now the plot isn't working.
Brianna,
I apologize for writing this so late, but try moving the mutate function above group_by(hour).
Sean
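A small sketch of why the order matters, using invented toy data (column names from the thread, values made up; counts, size_error and flagged are hypothetical names). After summarize() there is one row per group, so a full-length dat$ column no longer fits; running mutate() on the raw rows first, and referring to columns without the dat$ prefix, avoids the mismatch.

```r
library(dplyr)

# Toy stand-in for the crash data.
dat <- data.frame(
  hour12          = c("1 AM", "1 AM", "2 AM", "2 AM", "2 AM", "3 AM"),
  Totalinjuries   = c(0, 2, 0, 0, 1, 0),
  Totalfatalities = c(0, 0, 0, 1, 0, 0)
)

# summarize() collapses six rows into three, one per hour12 value.
counts <- dat %>% group_by(hour12) %>% summarize(n = n())

# Adding a six-element vector to a three-row table fails with a size error.
size_error <- tryCatch(
  counts %>% mutate(casualties = dat$Totalinjuries + dat$Totalfatalities),
  error = function(e) "error"
)

# Creating the new columns before any grouping works row by row.
flagged <- dat %>%
  mutate(casualties = Totalinjuries + Totalfatalities,
         harm = casualties > 0)
```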
@BriannaSmithR your mutate() variables should be the sum of n as your first new variable, then n divided by that new variable you've created.
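One way to read that suggestion, sketched on invented toy data (hour12 and age are the column names from the post above; the values and the names total, prop and props are made up). Here count(hour12, age) is used as a shorthand for group_by(hour12, age) followed by summarize(n = n()).

```r
library(dplyr)

# Toy stand-in: one row per accident.
dat <- data.frame(
  hour12 = c("1 AM", "1 AM", "2 AM", "2 AM", "2 AM", "3 AM"),
  age    = c("Teen", "Adult", "Teen", "Teen", "Adult", "Adult")
)

props <- dat %>%
  count(hour12, age) %>%          # n = accidents per hour and age group
  group_by(age) %>%
  mutate(total = sum(n),          # first new variable: all accidents for that age group
         prop  = n / total) %>%   # second: share of that age group's accidents in each hour
  ungroup()
```

Within each age group the prop values should add up to 1, which is a quick check that the grouping is doing what you expect.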
Hello, I'm having difficulty with two of the problems: one easy one that I really shouldn't be struggling with and one difficult one:
First, the easy one - 1.2:
Here is my code:
dat %>% group_by(day) %>% summarize(day == "Mon") %>% count(TRUE)
I know I'm supposed to calculate the mean of a logical statement, but aside from manually calculating it using the different day counts, I'm not sure how to get the answer - at least in the manner requested.
This is what I have for 3.3:
casualties <- (dat$Totalfatalities > 0 | dat$Totalinjuries > 0)
all.accidents <- (dat$Totalfatalities >= 0 | dat$Totalinjuries >= 0)
dat %>%
  group_by(hour) %>%
  filter(casualties / all.accidents) %>%
  summarize(n = n()) %>%
  plot(type = "b", bty = "n", pch = 19, cex = 2, xlab = "Hour of Day", ylab = "Proportion of Accidents Resulting in Harm", main = "Proportion of Crashes that Result in Injuries or Fatalities")
I think I'm on the right track, but I'm not sure if the "filter" command is right. It keeps throwing an error.
Any help would be much appreciated,
Sean