DS4PS / course_website

https://ds4ps.github.io/course_website/

Final Project, Part 2 #34

Status: Open. PaisleyMarie opened this issue 5 years ago

PaisleyMarie commented 5 years ago

I am working on the Final Project, Part 2 section:

The main window in this tab should include four tables or graphs:

My question is how the table code should count injuries and fatalities. This is the code I have for Chart 3 (COLLISION TYPE). Is this right? Am I counting the right number of injuries and fatalities?

Chart 3

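library( dplyr )
library( pander )

# flag accidents with at least one injury or fatality
dat$harm <- dat$Totalinjuries > 0 | dat$Totalfatalities > 0
dat %>%
  filter( harm ) %>%
  # one row per combination of Collisionmanner, Totalinjuries, and Totalfatalities
  count( Collisionmanner, Totalinjuries, Totalfatalities ) %>%
  arrange( -n ) %>%
  head( 10 ) %>%
  pander()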

It shows up like this. Is this correct?

[Screenshot of the output table, 2018-12-04 at 2:13:31 PM]

lecy commented 5 years ago

When creating summary statistics, you typically distinguish between summaries of categorical variables (count()) and summaries of numeric variables (summarize()). You can also count things inside summarize() by applying sum() to a logical condition, since each TRUE counts as 1.
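A minimal sketch of the difference, assuming dplyr is installed. The five-row data frame is hypothetical toy data; only the column names (Collisionmanner, Totalinjuries, Totalfatalities) come from this thread.

```r
library(dplyr)

# Hypothetical toy data; column names mirror the project data set
dat <- data.frame(
  Collisionmanner = c("Rear End", "Rear End", "Left Turn", "Left Turn", "Single Vehicle"),
  Totalinjuries   = c(0, 2, 1, 0, 0),
  Totalfatalities = c(0, 0, 0, 1, 0)
)

# count() tallies rows per category: number of accidents per collision type
accidents <- dat %>% count( Collisionmanner )

# summarize() + sum() totals a numeric column: injuries across all accidents
total_injuries <- dat %>% summarize( injuries = sum( Totalinjuries ) )

# sum() over a logical condition counts the rows where it is TRUE:
# here, the number of accidents with at least one injury
n_injury_accidents <- dat %>% summarize( n = sum( Totalinjuries > 0 ) )
```

Note that count( Collisionmanner, Totalinjuries, Totalfatalities ) would instead return one row per distinct combination of those three columns, which is a different table from injury totals per collision type.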

To compute statistics for every category of a variable, you need to group_by() that variable first.

# total injuries for each collision type
dat %>%
  group_by( Collisionmanner ) %>%
  summarize( injuries=sum(Totalinjuries) ) %>%
  pander()
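The same pattern extends to several statistics at once. A sketch, assuming dplyr and using hypothetical toy data with the thread's column names: each collision type gets exactly one row holding its accident count and its injury and fatality totals.

```r
library(dplyr)

# Hypothetical toy data; column names mirror the project data set
dat <- data.frame(
  Collisionmanner = c("Rear End", "Rear End", "Left Turn", "Left Turn", "Single Vehicle"),
  Totalinjuries   = c(0, 2, 1, 0, 0),
  Totalfatalities = c(0, 0, 0, 1, 0)
)

by_type <- dat %>%
  group_by( Collisionmanner ) %>%
  summarize(
    accidents  = n(),                     # rows (accidents) in each group
    injuries   = sum( Totalinjuries ),    # total injuries in each group
    fatalities = sum( Totalfatalities )   # total fatalities in each group
  )
```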

To calculate the harm rate, the formula is the number of accidents that result in at least one injury or fatality divided by the total number of accidents. You might calculate this like:

# refer to columns directly (not dat$...) so the calculation respects group_by()
dat %>%
  summarize( harm_rate = sum( Totalinjuries > 0 | Totalfatalities > 0 ) / n() )

You can also take advantage of how R handles logical values: TRUE counts as 1 and FALSE as 0, so the mean of a logical vector is the proportion of TRUE values, which is exactly the rate.

dat %>%
  summarize( harm_rate = mean( Totalinjuries > 0 | Totalfatalities > 0 ) )
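A quick base-R check that the two versions agree, using hypothetical toy values (three of the five accidents cause harm):

```r
# Hypothetical toy values; names mirror the project columns
Totalinjuries   <- c(0, 2, 1, 0, 0)
Totalfatalities <- c(0, 0, 0, 1, 0)

harm <- Totalinjuries > 0 | Totalfatalities > 0   # FALSE TRUE TRUE TRUE FALSE

rate_sum  <- sum( harm ) / length( harm )   # 3 harmful accidents / 5 accidents
rate_mean <- mean( harm )                   # TRUE is treated as 1, FALSE as 0
```

Both give 0.6.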

Combine this with group_by() in the code above to get statistics for every level of the grouping variable. You can then arrange the results by total accidents, harm rate, and so on.
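Putting the pieces together, one possible version of the collision-type table might look like the sketch below. It assumes dplyr and uses hypothetical toy data with the thread's column names; the choice to sort by accident count and then by harm rate is just one option.

```r
library(dplyr)

# Hypothetical toy data; column names mirror the project data set
dat <- data.frame(
  Collisionmanner = c("Rear End", "Rear End", "Left Turn", "Left Turn", "Single Vehicle"),
  Totalinjuries   = c(0, 2, 1, 0, 0),
  Totalfatalities = c(0, 0, 0, 1, 0)
)

harm_table <- dat %>%
  group_by( Collisionmanner ) %>%
  summarize(
    accidents  = n(),
    injuries   = sum( Totalinjuries ),
    fatalities = sum( Totalfatalities ),
    # share of accidents in the group with any injury or fatality
    harm_rate  = mean( Totalinjuries > 0 | Totalfatalities > 0 )
  ) %>%
  # most frequent collision types first; ties broken by higher harm rate
  arrange( desc( accidents ), desc( harm_rate ) )
```

On the real data you could add head( 10 ) and pipe the result to pander() as in the original Chart 3 code.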