Data4Democracy / usa-dashboard

A dashboard of key metrics for the USA

Scrape/Munge UCR Data #34

Open seanjtaylor opened 7 years ago

seanjtaylor commented 7 years ago

The FBI's UCR data is updated yearly and contains aggregate crime stats at the MSA level. This will be our target variable.

Some URLs:

2015 MSA: https://ucr.fbi.gov/crime-in-the-u.s/2015/crime-in-the-u.s.-2015/tables/table-6
2014 MSA: https://ucr.fbi.gov/crime-in-the-u.s/2014/crime-in-the-u.s.-2014/tables/table-6
2013 MSA: https://ucr.fbi.gov/crime-in-the-u.s/2013/crime-in-the-u.s.-2013/tables/6tabledatadecpdf/table-6

We need at least 5-10 years of this. Scraping it shouldn't be too hard, but the data munging will be. Ideally we end up with a single CSV file in long format with the column headers: year,msa,offense category,count
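As a rough sketch of the target layout, the snippet below reshapes wide rows (one row per year and MSA, one column per offense) into the long `year,msa,offense category,count` format. The input rows, the offense column names, and the helper names `wide_to_long` and `to_csv` are all hypothetical, and the numbers are made up. It only uses the standard library.

```python
import csv
import io

# Hypothetical wide-format rows (one per year and MSA); the counts are invented.
WIDE_ROWS = [
    {"year": 2014, "msa": "Example City, ST M.S.A.", "Violent crime": 90, "Property crime": 240},
    {"year": 2015, "msa": "Example City, ST M.S.A.", "Violent crime": 100, "Property crime": 250},
]

FIELDNAMES = ["year", "msa", "offense category", "count"]


def wide_to_long(rows, id_fields=("year", "msa")):
    """Yield one long-format record per offense column in each wide row."""
    for row in rows:
        for key, value in row.items():
            if key in id_fields:
                continue  # identifier columns are repeated on every record
            yield {
                "year": row["year"],
                "msa": row["msa"],
                "offense category": key,
                "count": value,
            }


def to_csv(records):
    """Serialize long-format records to CSV text with the agreed header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

Calling `to_csv(wide_to_long(WIDE_ROWS))` returns the CSV text. MSA names contain commas, so the `csv` module quotes that field automatically.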

Once we get this we can model how incident-level reports aggregate up to these numbers.

bbrewington commented 7 years ago

A couple of things to consider:

(1) How to handle an MSA or MD with two "total" lines: [Total area actually reporting] and [Estimated Total]. It looks like the estimated total extrapolates the reported value to 100% of the population. In the 2015 data, [Total area actually reporting] covers anywhere from 75% to 100% of an MSA (see Akron, OH in the 2015 data for an example with less than 100% actually reporting).
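One possible way to handle the two "total" lines is to keep a single row per area: the estimated total when it exists, otherwise the reported total. The sketch below assumes each scraped row is a dict with a `label` key holding the line's text. That structure, the `pick_total` name, and the preference order are assumptions for illustration, not a decision made in this thread.

```python
# Label text as quoted above, lowercased; the exact spelling in the
# scraped tables is an assumption and may need adjusting.
REPORTED = "total area actually reporting"
ESTIMATED = "estimated total"


def pick_total(rows, prefer_estimated=True):
    """Return the one 'total' row to keep for an area, or None if there is none.

    rows: list of dicts, each with a 'label' key (hypothetical structure).
    """
    by_label = {row["label"].strip().lower(): row for row in rows}
    reported = by_label.get(REPORTED)
    estimated = by_label.get(ESTIMATED)
    if prefer_estimated and estimated is not None:
        return estimated
    # Fall back to whichever line is present.
    return reported if reported is not None else estimated
```

With `prefer_estimated=True`, areas with 100% reporting (only one total line) and areas with partial reporting (two lines) both end up with one comparable row.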

(2) Do we just want MSA (Metropolitan Statistical Area) data, or MD (Metropolitan Division) data as well? MDs are subsets of MSAs as far as I can tell. Here's an example of an MSA with MDs: Chicago-Naperville-Elgin, IL-IN-WI M.S.A.
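Whichever way this is decided, the scraper needs to tell the two kinds of rows apart. A minimal sketch, assuming area names end in "M.S.A." (as in the Chicago example) or "M.D." (the "M.D." suffix and the `area_type` helper are assumptions to verify against the actual tables):

```python
def area_type(name):
    """Classify an area name as 'MSA', 'MD', or None based on its suffix."""
    name = name.strip()
    if name.endswith("M.S.A."):
        return "MSA"
    if name.endswith("M.D."):
        return "MD"
    return None


def keep_msa_only(names):
    """Filter a list of area names down to MSAs, dropping MDs and other rows."""
    return [n for n in names if area_type(n) == "MSA"]
```

Keeping the area type as its own field, rather than filtering early, would leave the MSA-versus-MD question open until the modeling step.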

bbrewington commented 7 years ago

https://github.com/Data4Democracy/usa-dashboard/pull/35