Final Project, Part 2 - Githubissues

DS4PS / course_website


I am working on the Final Project, Part 2 section:

The main window in this tab should include four tables or graphs:

A heat map of traffic accident locations.
Table/graph of accidents by day of week, one column for count, two for injuries and fatalities.
Table/graph of accidents summarized by collision type, one column for count, two for injuries and fatalities.
Table/graph of accidents summarized by driver age (use small number of meaningful categories), one column for count, two for injuries and fatalities.

So my question is: how should the table code be written to count injuries and fatalities? This is the code I have for Chart 3 (collision type). Is this right? Am I counting the right number of injuries and fatalities?

Chart 3

dat$harm <- dat$Totalinjuries > 0 | dat$Totalfatalities > 0
dat %>%
  filter( harm ) %>%
  count( Collisionmanner, Totalinjuries, Totalfatalities ) %>%
  arrange( -n ) %>%
  head( 10 ) %>%
  pander()

It shows up like this... is this correct? [Screenshot, 2018-12-04]

When creating summary statistics you typically distinguish summaries of categorical variables (count()) from summaries of numeric variables (summarize()). You can also count things inside summarize(), either with n() for the number of rows or by applying sum() to a logical condition.
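As a small illustration of that difference, here is a sketch on a made-up five-row data frame. The name toy_dat and all of its values are invented for illustration; only the column names come from the question. count() tallies rows per category, while summarize() collapses numeric columns into totals.

```r
library( dplyr )

# Hypothetical toy data: values are invented, column names follow the question
toy_dat <- data.frame(
  Collisionmanner = c( "Rear End", "Rear End", "Left Turn", "Left Turn", "Rear End" ),
  Totalinjuries   = c( 0, 2, 1, 0, 0 ),
  Totalfatalities = c( 0, 0, 1, 0, 0 )
)

# Categorical summary: number of accidents for each collision type
type_counts <- toy_dat %>%
  count( Collisionmanner )

# Numeric summary: totals over all accidents
totals <- toy_dat %>%
  summarize( accidents     = n(),                      # number of rows
             injuries      = sum( Totalinjuries ),     # total people injured
             fatalities    = sum( Totalfatalities ),   # total people killed
             with_injuries = sum( Totalinjuries > 0 ) ) # accidents with at least one injury
```

Note that count( Collisionmanner, Totalinjuries, Totalfatalities ) would instead tally every distinct combination of the three columns, which is not the same as totalling injuries and fatalities per collision type.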

To create a set of statistics for all of the categories of a variable you would need to group_by() the variable.

dat %>%
  group_by( Collisionmanner ) %>%
  summarize( injuries=sum(Totalinjuries) ) %>%
  pander()

To calculate the harm rate, the formula would be number of accidents that result in injuries or fatalities / number of accidents. You might calculate this like:

dat %>%
  # use bare column names (not dat$...) so the calculation respects group_by()
  summarize( harm_rate = sum( Totalinjuries > 0 | Totalfatalities > 0 ) / n() )

But you can also leverage the power of logical statements and take the mean directly: TRUE counts as 1 and FALSE as 0, so the mean of the logical statement is equivalent to the rate.

dat %>%
  # bare column names again, so this also works after group_by()
  summarize( harm_rate = mean( Totalinjuries > 0 | Totalfatalities > 0 ) )
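To see why the sum-over-count version and the mean version agree, here is a tiny base R check. The logical vector harmed is made up for illustration; it stands in for the result of a condition such as Totalinjuries > 0 | Totalfatalities > 0.

```r
# Hypothetical vector: TRUE marks an accident with an injury or fatality
harmed <- c( TRUE, FALSE, TRUE, FALSE )

# R treats TRUE as 1 and FALSE as 0 in arithmetic
rate_by_sum  <- sum( harmed ) / length( harmed )
rate_by_mean <- mean( harmed )
```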

Integrate either version into the group_by() pipeline above to get statistics for every level of the grouping variable. You might then arrange the table by things like total accidents, harm rate, etc.
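Putting the pieces together, one possible version of the collision-type table might look like the sketch below. It runs on a made-up data frame (toy_dat and its values are invented for illustration; the column names follow the question), and the output column names accidents, injuries, fatalities and harm_rate are arbitrary choices, not requirements of the assignment.

```r
library( dplyr )

# Hypothetical toy data: values are invented, column names follow the question
toy_dat <- data.frame(
  Collisionmanner = c( "Rear End", "Rear End", "Left Turn", "Left Turn", "Rear End" ),
  Totalinjuries   = c( 0, 2, 1, 0, 0 ),
  Totalfatalities = c( 0, 0, 1, 0, 0 )
)

by_type <- toy_dat %>%
  group_by( Collisionmanner ) %>%
  summarize( accidents  = n(),                       # one column for count
             injuries   = sum( Totalinjuries ),      # total injuries per type
             fatalities = sum( Totalfatalities ),    # total fatalities per type
             harm_rate  = mean( Totalinjuries > 0 | Totalfatalities > 0 ) ) %>%
  arrange( desc( accidents ) )                       # most common types first
```

With the real data you would replace toy_dat with dat and pipe the result to pander(), as in the snippets above. The same pattern should carry over to the day-of-week and driver-age tables by changing the variable inside group_by().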


Final Project, Part 2 #34
