Closed LarryBafundo closed 6 years ago
Started to pull all West Virginia NIBRS downloads together from 2008 through 2016 here: https://github.com/18F/crime-data-prototypes/tree/master/demos/multi-year-nibrs
Yeah, it gets large. Since 2006, there have been 743,637 incidents in NIBRS, with 823,195 offenders, for instance. I feel like for a state like California or Texas it would be 2-3x as large.
We could also continue segmenting NIBRS by offense family if individual files/datasets get too large. Here are the NIBRS offense families:
Offense Family |
---|
Arson |
Assault Offenses |
Burglary/Breaking & Entering |
Counterfeiting/Forgery |
Destruction/Damage/Vandalism of Property |
Drug/Narcotic Offenses |
Embezzlement |
Extortion/Blackmail |
Fraud Offenses |
Gambling Offenses |
Homicide Offenses |
Kidnapping/Abduction |
Larceny/Theft Offenses |
Motor Vehicle Theft |
Pornography/Obscene Material |
Prostitution Offenses |
Robbery |
Sex Offenses |
Stolen Property Offenses |
Weapon Law Violations |
But that would mean having many smaller files rather than a few bigger ones.
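A rough sketch of what splitting by offense family could look like, assuming each row carries an `offense_family` column (a hypothetical column name, not necessarily what the downloads use). The sample rows are made up; the point is only to show the grouping and a quick way to compare the size of each resulting piece.

```python
import csv
import io
from collections import defaultdict


def partition_by_family(rows, key="offense_family"):
    """Group row dicts by the value in `key` (a hypothetical column name)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


def to_csv_text(rows):
    """Serialize a list of row dicts to CSV text so partition sizes can be compared."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample rows, not real NIBRS records.
sample = [
    {"incident_id": "1", "year": "2015", "offense_family": "Robbery"},
    {"incident_id": "2", "year": "2015", "offense_family": "Arson"},
    {"incident_id": "3", "year": "2016", "offense_family": "Robbery"},
]

parts = partition_by_family(sample)
# Character count of each partition's CSV text, as a crude size estimate.
sizes = {family: len(to_csv_text(rows)) for family, rows in parts.items()}
```

With real data, the same grouping would show which families dominate file size, which is probably uneven across the list above.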
thanks; let's see what we can learn from the new format and testing this week and then we can explore other ways of making it available in the future. i think you're right that trying to do everything in one file without some kind of partitioning isn't going to be sustainable, so maybe we do what you're suggesting instead.
some additional questions to explore if we still want to move in this direction.
--how big is too big when it comes to file size? are there clear limits to what our users can download and work with? how might we test this?
--if we want to reduce file size by breaking one large file into smaller, more manageable pieces, what partitioning strategy makes the most sense (e.g. crimes against persons vs. offense type)?
--how might our partitioning strategy affect the generation and maintainability of these files?
--would partitioned files be harder to work with or increase the likelihood of miscounts and user error? how might we test this?
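On the miscount question, one concrete risk worth testing: a single NIBRS incident can include more than one offense, so an incident may land in more than one offense-family file. A small sketch with made-up rows and hypothetical column names (`incident_id`, `offense_family`) shows how summing per-file incident counts could overstate the total.

```python
# Made-up offense-level rows: incident "A" involves two offense families.
offense_rows = [
    {"incident_id": "A", "offense_family": "Robbery"},
    {"incident_id": "A", "offense_family": "Assault Offenses"},
    {"incident_id": "B", "offense_family": "Robbery"},
]


def incidents_per_family(rows):
    """Count distinct incidents within each offense family."""
    families = {}
    for row in rows:
        families.setdefault(row["offense_family"], set()).add(row["incident_id"])
    return {family: len(ids) for family, ids in families.items()}


per_family = incidents_per_family(offense_rows)

# Adding up the per-family counts treats incident "A" as two incidents.
naive_total = sum(per_family.values())

# Counting distinct incident IDs across all rows gives the actual number.
true_total = len({row["incident_id"] for row in offense_rows})
```

Here `naive_total` is 3 while `true_total` is 2. A check like this against a known single-file count might be one way to test whether a partitioning scheme invites user error.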
Will move this issue to the backlog for now, as we first need to get the content right before we consider how to package it.
cc: @harrisj, @jeremiak
this is an ongoing question that is somewhat dependent on the following:
https://waffle.io/18F/crime-data-explorer/cards/5a1ca85f7fc9aa0121da7a6b
Before we decide on how we want to package this information (single year or time series) we need to figure out how we should be working with/counting this data in the first place. Then we need to weigh the costs/benefits of providing the data in a way that promotes flexibility and the value of NIBRS with potentially passing on complexity to our consumers.
we still need to explore this, both in terms of the temporary and long-term solutions
in the interest of getting a short-term fix ready ASAP, we're going to go with a fully normalized, single-year approach for now. we should consider the feasibility of a time series & denormalized approach in the future. closing for now.
We heard that users find a single year's view of the data to be limiting and value a historical view to facilitate analysis. Having a file that aggregates all of the available incident data for a given state would also make this data easier to work with.
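As a minimal sketch of the aggregated file described here, assuming the single-year downloads are CSVs that share the same columns: read each year's file, tag every row with its year, and append it to one combined list. The `data_year` column and the sample contents are hypothetical, and in-memory strings stand in for real files.

```python
import csv
import io


def combine_years(files_by_year):
    """Merge per-year CSV files into one list of rows, tagging each row with its year.

    `files_by_year` maps a year to an open file-like object containing CSV text.
    """
    combined = []
    for year in sorted(files_by_year):
        for row in csv.DictReader(files_by_year[year]):
            row["data_year"] = year  # hypothetical column name for the source year
            combined.append(row)
    return combined


# In-memory stand-ins for single-year downloads (made-up contents).
yearly = {
    2016: io.StringIO("incident_id,offense_family\n3,Arson\n"),
    2015: io.StringIO("incident_id,offense_family\n1,Robbery\n2,Arson\n"),
}

rows = combine_years(yearly)
```

Keeping the year as an explicit column means the combined file can still be filtered back down to a single year, so users who prefer the current view lose nothing.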