Merge crime data across cities

seanjtaylor commented 7 years ago

Update from @mkedataguy:

I've been spending some time looking at normalizing offenses, and it's not pretty. I looked at the FBI's UCR and also the UN's ICCS. In a perfect world, I'd vote for the ICCS. It's a very detailed classification scheme with four different levels of classification. However, the data that we've got is nowhere near detailed enough. So, I'm leaning toward using the FBI's UCR classification scheme, although even that's not great. Chicago actually has the UCR code in their dataset, but the other cities specifically say their data isn't comparable to UCR. So, because there are bound to be some subjective decisions, there will be issues in our classification. For example, NYC lists homicides as "Murder or Non-negligent Manslaughter". But that's really only half of the homicides. I couldn't find any sort of "Negligent Manslaughter" classification. Other cities lump both classifications into a single "Homicide" category. So, is there another classification that should be included with homicide for NYC? Are negligent manslaughters not reported in the data set? Or are they included with murders? Don't know. So, that casts suspicion on the homicide numbers. Other classifications have similar problems. I'm going to continue looking at this, but I don't have a good solution yet.

seanjtaylor commented 7 years ago

Another update from @scottcame:

Well, the FBI actually has several different classification schemes. There is UCR (in use, with some modifications, since the 1930s), NIBRS (an extension of UCR), NCIC... And then some agencies define their own. Nearly all agencies report UCR (summary) data to the FBI via their state UCR program. Only about 35% of agencies submit NIBRS (incident-level data). Some of this reporting involves manual categorization of offenses and arrests. Most state UCR programs send data monthly or so, but it varies and some are very infrequent. For larger jurisdictions it should be possible to map to the NIBRS codes, which is probably the right level for what you're doing. Much of it is CAD (computer-aided dispatch) data that does not contain incident or offense/arrest info.

seanjtaylor commented 7 years ago

People being referenced as experts on this:

Nick Selby: https://twitter.com/nselby
@scottcame
Ian Mance at Southern Coalition for Social Justice
Michael F Schnuerle: https://twitter.com/LouDataOfficer
Jeff Asher: https://twitter.com/Crimealytics
Jeff Benzing: https://twitter.com/jabenzing
Sarah Brayne: https://twitter.com/Sarah_Brayne

bbrewington commented 7 years ago

Some helpful info from https://ucr.fbi.gov/crime-in-the-u.s/2016/preliminary-semiannual-uniform-crime-report-januaryjune-2016 (regarding the rape category and a caution against ranking)

PLEASE NOTE: In 2013, the FBI’s UCR Program initiated the collection of rape data under a revised definition within the Summary Based Reporting System. The term “forcible” was removed from the offense name, and the definition was changed to “penetration, no matter how slight, of the vagina or anus with any body part or object, or oral penetration by a sex organ of another person, without the consent of the victim.”

The number of rape incidents reported using the revised definition, as well as the number of rapes submitted using the legacy definition, are included in this report in separate columns in each table. The rape figures for those agencies that changed from reporting rape under the legacy definition in 2015 to the revised definition in 2016 are not included in trend calculations in Tables 1-3, but they are reported in Table 4 for agencies 100,000 and more in population. Please note: Rape data reported for 2015 and 2016 cannot be aggregated by all agencies. Instead, two distinct groups of agencies (those reporting using the legacy definition and those reporting using the revised definition) are used for calculating trends. Therefore, the percent changes from one year to the next within each group are calculated with fewer agencies than in recent years. Offenses with fewer counts are often sensitive to minor differences when calculating trends. More information about this subject is presented in footnotes and data declarations for each table.

Caution against ranking: Figures used in this Report were submitted voluntarily by law enforcement agencies throughout the country. Individuals using these tabulations are cautioned against drawing conclusions by making direct comparisons between cities. Comparisons lead to simplistic and/or incomplete analyses that often create misleading perceptions adversely affecting communities and their residents. Valid assessments are possible only with careful study and analysis of the range of unique conditions affecting each local law enforcement jurisdiction. It is important to remember that crime is a social problem and, therefore, a concern of the entire community. In addition, the efforts of law enforcement are limited to factors within its control. The data user is, therefore, cautioned against comparing statistical data of individual agencies. Further information on this topic can be obtained in Uniform Crime Reporting Statistics: Their Proper Use.
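
To make the rape-data note above concrete, here is a minimal sketch of computing year-over-year percent change separately for agencies on the legacy definition and agencies on the revised definition, rather than pooling them. The agency names and counts are made up purely for illustration (they are not FBI figures), and the sketch assumes agencies that switched definitions between years have already been dropped.

```python
from collections import defaultdict

# Made-up counts for illustration only; these are NOT FBI figures.
# Each record: (agency, definition used in both years, 2015 count, 2016 count).
# Assumption: agencies that switched definitions between years are already excluded.
records = [
    ("Agency A", "legacy", 100, 110),
    ("Agency B", "legacy", 50, 40),
    ("Agency C", "revised", 200, 230),
    ("Agency D", "revised", 300, 320),
]

def percent_change_by_group(rows):
    """Sum counts within each definition group, then compute percent change per group."""
    totals = defaultdict(lambda: [0, 0])  # definition -> [previous total, current total]
    for _agency, definition, prev, curr in rows:
        totals[definition][0] += prev
        totals[definition][1] += curr
    return {d: 100.0 * (curr - prev) / prev for d, (prev, curr) in totals.items()}

changes = percent_change_by_group(records)
for definition, pct in changes.items():
    print(f"{definition}: {pct:+.1f}%")
```

Because each group contains fewer agencies than a pooled total would, a small change in counts moves the percentage more, which is the sensitivity the FBI note warns about.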

srcole commented 7 years ago

Haven't been keeping up with the Slack, but is there a plan for this issue? This is definitely a difficult problem to solve. I am wondering if a reasonable approach would be:

1. Define keywords that relate to each crime category in UCR.
2. For each city whose data does not use UCR codes, map each of its crime categories to a UCR category by checking for the keywords defined in (1).

Would this be a reasonable game plan, or do you guys think something else would be needed? If this is reasonable, I'm happy to work on it.
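
The two steps proposed above could be sketched roughly as follows. This is only an illustration: the keyword lists in `UCR_KEYWORDS` are hypothetical and cover just a few UCR Part I categories, the function names are made up for this sketch, and nothing here is an official mapping.

```python
import re

# Step 1: hypothetical keyword lists per UCR category (illustrative, not official).
UCR_KEYWORDS = {
    "Criminal Homicide": ["murder", "manslaughter", "homicide"],
    "Robbery": ["robbery"],
    "Burglary": ["burglary", "breaking and entering"],
    "Motor Vehicle Theft": ["motor vehicle theft", "auto theft", "stolen vehicle"],
    "Larceny-Theft": ["larceny", "theft", "shoplifting"],
}

def match_ucr_categories(city_label):
    """Return every UCR category with at least one keyword present in the label."""
    text = city_label.lower()
    matches = []
    for category, keywords in UCR_KEYWORDS.items():
        # Whole-word match so that short keywords do not fire inside longer words.
        if any(re.search(r"\b" + re.escape(kw) + r"\b", text) for kw in keywords):
            matches.append(category)
    return matches

def classify(city_label):
    """Step 2: map a city's label to one UCR category, or None if unclear."""
    matches = match_ucr_categories(city_label)
    if len(matches) == 1:
        return matches[0]
    return None  # zero or several matches: leave for manual review

for label in ["Murder or Non-negligent Manslaughter", "MOTOR VEHICLE THEFT", "NARCOTICS"]:
    print(label, "->", classify(label), match_ucr_categories(label))
```

Note that "MOTOR VEHICLE THEFT" matches both the motor vehicle theft keywords and the generic "theft" keyword, so plain keyword presence cannot settle it. Labels with zero or multiple matches are returned as `None` here so that the subjective calls discussed earlier in the thread stay visible instead of being silently resolved.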

ghost commented 7 years ago

Hey guys, just wanted to stop by to see what work is already being done on this task and how I could help. I'm adantonison on Slack.

Data4Democracy / usa-dashboard

Merge crime data across cities #32