elipousson / crashapi

💥🚙💥 R package to get Fatality Analysis Reporting System (FARS) data with the FARS API
https://elipousson.github.io/crashapi/

Detailed FARS crash data for various states #2

Closed eugenividal closed 1 year ago

eugenividal commented 1 year ago

Hi @elipousson. Thanks for the package. It looks very useful!

I was wondering whether it is possible to get FARS crash data in detail for multiple states at once.

With the general function get_fars(), using the argument 'details = TRUE' I get an object with 104 variables, but I need to specify the state and county. With the get_fars_crash_list() function, I can get multiple states at once, but the output object I get has only 9 variables. Is it possible to get an object for multiple states with the 104 variables?

elipousson commented 1 year ago

When you use details = TRUE, get_fars() actually calls get_fars_crashes(), which uses the Get Crashes By Location API endpoint. Based on the official documentation, I think the API requires both the state and county to work. You can always use purrr::map2_dfr() to pass vectors of states and counties to get_fars(), but the downloads are so slow with this approach that I'd be reluctant to add it to the package:

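# States to download, as two-letter abbreviations
states <- c("CT", "RI")

# Look up every county in those states; the FIPS codes are in STATEFP and COUNTYFP
counties <-
  purrr::map_dfr(
    states,
    ~ tigris::counties(state = .x, cb = TRUE)
  )

# Request detailed crash records one county at a time and row-bind the results
fatal_crashes <-
  purrr::map2_dfr(
    counties$STATEFP,
    counties$COUNTYFP,
    ~ crashapi::get_fars(
      year = 2018,
      state = .x,
      county = .y,
      details = TRUE
    )
  )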

The other option is to use get_fars_zip() to download data for a whole year. It is harder to work with because the data is just the codes – so you'd need to refer back to the FARS documentation to make sense of it. There are also multiple tables for each year and the number and type of tables vary so that is also a little complicated. But it pulls down all 50 states in a single quick download so it is fast at least. The function used to be limited to just downloading the zip file but I just finished a draft function that unzips the download and reads it into a list of data frames. If you try it, let me know if you run into any bugs!
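To give a sense of what "just the codes" means in practice, here is a minimal sketch of decoding one coded column with a lookup vector. The rows are made up for illustration, and the lookup covers only three LGT_COND values as I understand them from recent FARS manuals, so please confirm the labels against the FARS documentation for the year you are using:

```r
# Made-up rows standing in for a coded FARS "accident" table
# (illustrative only, not real crash records).
accident <- data.frame(
  ST_CASE = c(10001, 10002, 10003, 10004),
  LGT_COND = c(1, 2, 3, 2)
)

# Partial lookup for the light condition codes. This is an assumption to
# check against the FARS user's manual for your year.
light_labels <- c(
  "1" = "Daylight",
  "2" = "Dark - Not Lighted",
  "3" = "Dark - Lighted"
)

# Match each code to its label; codes missing from the lookup become NA.
accident$light_condition <-
  unname(light_labels[as.character(accident$LGT_COND)])
```

Leaving unmatched codes as NA makes it easy to spot values the lookup does not cover yet.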

eugenividal commented 1 year ago

Hi @elipousson. Thanks for your response. I think it makes more sense to use get_fars_zip() in this case, as my idea was to look at the whole US across several years.

I tried using get_fars_zip() but I got the following error:

> get_fars_zip(year = 2020, format = "csv", path = NULL, pr = FALSE, read = TRUE, geometry = FALSE)
trying URL 'https://static.nhtsa.gov/nhtsa/downloads/FARS/2020/National/FARS2020NationalCSV.zip'
Content type 'application/x-zip-compressed' length 31674692 bytes (30.2 MB)
==================================================
downloaded 30.2 MB

Error in get_fars_zip(year = 2020, format = "csv", path = NULL, pr = FALSE,  : 
  object 'crash_tables' not found
In addition: Warning messages:
1: One or more parsing issues, see `problems()` for details 
2: One or more parsing issues, see `problems()` for details 
3: One or more parsing issues, see `problems()` for details 

elipousson commented 1 year ago

Whoops. That was just a mistake on my part when renaming the output object. If you reinstall, it should work now. You could probably write a wrapper function for get_fars_zip() that combines multiple years, or you could set read = FALSE and then read the tables for multiple years yourself (especially if you only need the "accident" table).
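A rough sketch of what such a wrapper could look like is below. It is not part of the package: combine_fars_years() and fake_reader() are hypothetical names, and I am assuming the list returned for each year has an element named "accident". The reader is passed in as a function so the example runs offline with stand-in data:

```r
# Hypothetical helper (not in crashapi): stack one table across years.
# `reader` is any function that takes a year and returns a named list of
# data frames.
combine_fars_years <- function(years, reader, table = "accident") {
  tables <- lapply(years, function(yr) {
    tbl <- reader(yr)[[table]]
    if (is.null(tbl)) {
      stop("No table named '", table, "' for year ", yr)
    }
    tbl$fars_year <- yr # record which download each row came from
    tbl
  })
  # Variables change between years, so keep only the shared columns.
  shared <- Reduce(intersect, lapply(tables, names))
  do.call(rbind, lapply(tables, function(tbl) tbl[, shared, drop = FALSE]))
}

# Offline stand-in for the real download, with made-up rows.
fake_reader <- function(yr) {
  tbl <- data.frame(ST_CASE = c(1, 2), LGT_COND = c(1, 2))
  if (yr >= 2020) tbl$NEW_VAR <- 0 # pretend a variable was added in 2020
  list(accident = tbl)
}

combined <- combine_fars_years(2019:2020, reader = fake_reader)

# With the package, the reader might instead be:
# function(yr) crashapi::get_fars_zip(year = yr, format = "csv", read = TRUE)
```

Dropping columns that are not shared is the simplest choice here; if you need the year-specific variables, filling the gaps with NA would be the alternative.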

Looking at the data again (I never actually dug too deeply into the older data available in these zipped files), it looks like the auxiliary datasets may be more useful for year-to-year comparisons. It may be a quick change to support those as well, so I'll take a look right now.

elipousson commented 1 year ago

That was pleasantly easy. Reinstall and set aux = TRUE if you want to access the auxiliary datasets. Please share what you learn! My original goal with this project was to enable more reproducible road safety research in the U.S. but I haven't had the time (or any funding) to use the package for anything more than a basic descriptive analysis.

eugenividal commented 1 year ago

It is working now. Thanks @elipousson. Great goal! Happy to share! I am just trying to compare US and UK road traffic fatalities by light condition for a potential paper. You can see some initial results here: https://github.com/saturnrg/data-overview/blob/main/comparison_US_UK.md. The proportion of pedestrians killed in the dark in the US seems really high!

elipousson commented 1 year ago

I knew darkness was a major factor in pedestrian fatalities but I didn't realize the difference was so stark. Thanks for sharing and for using the package!

eugenividal commented 1 year ago

Thank you for developing the package!