Watts-College / paf-513-template

https://watts-college.github.io/paf-513-template/
MIT License

Lab 5 - Q5 - Why am I getting different answers? #16

Open swest235 opened 9 months ago

swest235 commented 9 months ago

@JasonSills I'm answering Q5 and wrote the code two different ways, a short way and a long way, but the results don't match and I'm not sure why. Could I please get some help pinpointing my mistake?

Here is my code:

5. Differences in Accidents

Question: Are there differences in the proportion of accidents that result in harm each day of the week?


#Attempt at Short Way: 
dat %>%
  group_by(day) %>%
  mutate(harm_acc = Totalinjuries > 0 | Totalfatalities > 0) %>%
  summarize(n = n(),
            harm.rate = mean(harm_acc))

##Long Way:
#Monday
print("Monday")
xMon <- dat %>% 
  select(day, Totalinjuries) %>% 
  filter(day == "Mon")

mean(xMon$Totalinjuries>=1)*100

#Tuesday
print("Tuesday")
xTue <- dat %>% 
  select(day, Totalinjuries) %>% 
  filter(day == "Tue")

mean(xTue$Totalinjuries>=1)*100

#Wednesday
print("Wednesday")
xWed <- dat %>% 
  select(day, Totalinjuries) %>% 
  filter(day == "Wed")

mean(xWed$Totalinjuries>=1)*100

#Thursday
print("Thursday")
xThu <- dat %>% 
  select(day, Totalinjuries) %>% 
  filter(day == "Thu")

mean(xThu$Totalinjuries>=1)*100

#Friday
print("Friday")
xFri <- dat %>% 
  select(day, Totalinjuries) %>% 
  filter(day == "Fri")

mean(xFri$Totalinjuries>=1)*100

#Saturday
print("Saturday")
xSat <- dat %>% 
  select(day, Totalinjuries) %>% 
  filter(day == "Sat")

mean(xSat$Totalinjuries>=1)*100

#Sunday
print("Sunday")
xSun <- dat %>% 
  select(day, Totalinjuries) %>% 
  filter(day == "Sun")

mean(xSun$Totalinjuries>=1)*100


Answer: Significantly more accidents occur on Thursdays.

RESULTS: [two images]

JasonSills commented 9 months ago

@swest235

In the "short version" you are analyzing both injuries and fatalities, but in the "long version" you're analyzing only injuries.

In the short version: mutate(harm_acc = Totalinjuries > 0 | Totalfatalities > 0) %>%

In the long version: select(day, Totalinjuries) %>%

If we remove Totalfatalities from the short version, you will get the same rates (the long version multiplies by 100, so it reports them as percentages).
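
To see the difference side by side, here is a minimal sketch that computes both definitions in one pass. The `toy` data frame and the column names `injury_only` and `injury.rate` are made up for illustration; they are not part of the lab data set. The two rates diverge only on days that have at least one crash with a fatality but no injuries.

```r
library(dplyr)

# Hypothetical toy data, NOT the lab's crash data
toy <- data.frame(
  day             = c("Mon", "Mon", "Mon", "Mon", "Tue", "Tue"),
  Totalinjuries   = c(0, 2, 0, 0, 1, 0),
  Totalfatalities = c(0, 0, 1, 0, 0, 0)
)

rates <- toy %>%
  mutate(
    # definition used in the long version: injuries only
    injury_only = Totalinjuries > 0,
    # definition used in the short version: injuries OR fatalities
    harm_acc    = Totalinjuries > 0 | Totalfatalities > 0
  ) %>%
  group_by(day) %>%
  summarize(
    n           = n(),
    injury.rate = mean(injury_only),
    harm.rate   = mean(harm_acc)
  )

print(rates)
```

In this toy example, Monday has one fatal crash with no injuries, so its `harm.rate` (0.5) is higher than its `injury.rate` (0.25), while Tuesday's two rates agree. Since Q5 asks about harm, the alternative fix is to keep the short version as is and add the fatalities condition to the long version.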

swest235 commented 9 months ago

@JasonSills OK, that makes more sense. Does counting total injuries and total fatalities double count fatalities? I was under the impression that total injuries included fatalities. Is that not the case?

JasonSills commented 9 months ago

@swest235

The data dictionary states that total injuries are non-fatal injuries.
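
On the double-counting question, a small sketch may help. It uses made-up vectors (`injuries`, `fatalities`), not the lab data. Each crash is one row, and the `|` condition yields a single TRUE or FALSE per row, so a crash with both an injury and a fatality is still counted once.

```r
# Hypothetical values for four crashes, NOT the lab data
injuries   <- c(1, 0, 2, 0)
fatalities <- c(1, 1, 0, 0)

# One logical flag per crash: TRUE if anyone was injured or killed
harm <- injuries > 0 | fatalities > 0

harm_count <- sum(harm)    # crashes with any harm
harm_rate  <- mean(harm)   # proportion of crashes with any harm

# By contrast, adding the two counts separately would count
# the first crash (injury AND fatality) twice
added_count <- sum(injuries > 0) + sum(fatalities > 0)

print(c(harm_count = harm_count, added_count = added_count))
```

Here `harm_count` is 3 and `added_count` is 4: the OR condition avoids the double count that summing the two conditions would introduce.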