Closed harrisj closed 6 years ago
So this is how agencies reported NIBRS months in 2015
nibrs_months_reported | count |
---|---|
1 | 1,017 |
2 | 537 |
3 | 378 |
4 | 253 |
5 | 218 |
6 | 206 |
7 | 149 |
8 | 125 |
9 | 107 |
10 | 112 |
11 | 109 |
12 | 5 |
And this is 2016
nibrs_months_reported | count |
---|---|
1 | 45 |
2 | 42 |
3 | 42 |
4 | 31 |
5 | 48 |
6 | 36 |
7 | 39 |
8 | 48 |
9 | 50 |
10 | 76 |
11 | 131 |
12 | 6,331 |
And zooming in further, here is the nibrs_month
table for Hoover, AL in 2015
agency_id | data_year | month_num | reported_status |
---|---|---|---|
61 | 2,015 | 1 | I |
61 | 2,015 | 2 | I |
61 | 2,015 | 3 | I |
61 | 2,015 | 4 | I |
61 | 2,015 | 5 | I |
61 | 2,015 | 6 | I |
61 | 2,015 | 7 | I |
61 | 2,015 | 8 | U |
61 | 2,015 | 9 | U |
61 | 2,015 | 10 | I |
61 | 2,015 | 11 | U |
61 | 2,015 | 12 | U |
I = incident report, U = unreported
Hoover seems to have a complete set of nibrs_month records, but the larger issue seems to be an overall drop in the number of records in the nibrs_month
table for that year
data_year | count |
---|---|
2013 | 267,984 |
2014 | 269,412 |
2015 | 112,860 |
2016 | 271,728 |
@harrisj this seems like a separate issue from what we were discussing earlier - that the business line officially determines which states reports NIBRS based on the state of their respective programs. this should be reflected on the "how states report" map on the about page: https://crime-data-explorer.fr.cloud.gov/about.
But do you have a sense what is behind these dips in participation? Is it something we need to explore further
cc: @jpwentz
PER UCR:
select count(distinct RA.agency_id) from NIBRS_MONTH NM JOIN REF_AGENCY RA ON RA.AGENCY_ID = NM.AGENCY_ID WHERE NM.DATA_YEAR = 2015 and RA.AGENCY_STATUS = 'A' AND NM.REPORTED_STATUS IN ( 'I', 'Z') 6673 U 16576 I 6444 Z 1098
@jpwentz can you rerun that so it just gives me a count of total nibrs_month records in 2015? The issue is not that agencies aren't in there at all, but many are missing months of representation in nibrs_month in 2015
I will request that from Justin -- I think CDE concept of participation is not accurate. It looks like that should be defined by the status in the REFA.csv.
@jpwentz can you elaborate on what you mean by that?
Currently, we define participation by an agency in a given year as "reported for 12 months or was covered by an agency that reported for 12 months". We have participation metrics for RETA (using reta_month) and NIBRS (using nibrs_month). Any feedback would be welcome if there is a different way we should be doing that.
5954 12 months for I or Z coming from nibrs_month table in UCR
select COUNT(*) FROM ( select distinct RA.ORI from NIBRS_MONTH NM JOIN REF_AGENCY RA ON RA.AGENCY_ID = NM.AGENCY_ID WHERE NM.DATA_YEAR = 2015 and RA.AGENCY_STATUS = 'A' AND NM.REPORTED_STATUS IN ( 'I', 'Z') HAVING SUM(NM.MONTH_NUM) = 78 GROUP BY RA.ORI )
select distinct ra.ori from NIBRS_MONTH NM JOIN REF_AGENCY RA ON RA.AGENCY_ID = NM.AGENCY_ID WHERE NM.DATA_YEAR = XXXX and RA.AGENCY_STATUS = 'A' AND NM.REPORTED_STATUS IN ( 'I', 'Z')
2013 - 6423 2014 - 6628 2015 - 3216 2016 - 6919
We need to get an answer on how to determine this information -- We can use the above script and have it update the agency_participation table.
@harrisj your thoughts?
Running the same script in CDE: select COUNT(*) FROM ( select distinct RA.ORI from NIBRS_MONTH NM JOIN REF_AGENCY RA ON RA.AGENCY_ID = NM.AGENCY_ID WHERE NM.DATA_YEAR = 2015 and RA.AGENCY_STATUS = 'A' AND NM.REPORTED_STATUS IN ( 'I', 'Z') GROUP BY RA.ORI HAVING SUM(NM.MONTH_NUM) = 78 ) as foo
I get 5 !
2015 only has report status of U and I --- 2016 has report status of U I and Z
Excellent. So we've confirmed the 2015 data is truncated in some fashion, although it is still to be determined if it's because the CSV was incomplete or there was an issue loading it
2015 NIBRS_M csv deliver to 18F contained NIBRS_M rows with report_status of Z -- but were not ingested.
GIT Repo history does not show this file changing but may not have existed in the repo until after the 2015 data was ingested. It will be difficult to determine the deltas from 2015 script to 2016 script.
I have a copy of the data and it can be re-ingested at any point.
Also note that 2015 Data has dates defined for rows of report_status Z and 2016 does not have those date columns populated -- I would assume the failure occurred here.
@jpwentz got it; if our immediate priorities are to ensure that the data displayed by the CDE is accurate should we talk to Cindy about prioritizing the reimport of the 2015 data relatively soon? do you need assistance from us to do that?
@LarryBafundo It will not be all of the 2015 data -- it will only be the tables that utilize NIBRS_MONTH. It would be helpful to have some insight on which tables those are -- and then assistance with creating the subset script for just ingesting that.
AFAIK, this would mean that we reload those fields for 2015 in nibrs_month
but that we then have to rebuild a few tables that depend on nibrs_participation:
agency_participation
participation_rates
cde_agencies
I think the other tables are probably fine since some of them use the "current year" and I think that is 2016 which would be unchanged.
So I kinda dropped the ball on following up on this one. Do we have a new file to try loading?
@jpwentz is this something you can take on? its a pretty visible issue on production.
@jpwentz want to catch-up on this?
@LarryBafundo As mentioned previously we need to re-ingested the 2015 NIBRSM data. It looks like it failed during ingestion based upon my above comments. I will work with @harrisj on creating the scripts to just handle importing agencies.
thanks @jpwentz
With the latest data from CJIS, this has been fixed
Year | Agencies | NIBRS Agencies |
---|---|---|
2016 | 19296 | 7420 |
2015 | 19319 | 7262 |
2014 | 18660 | 6960 |
2013 | 18327 | 6649 |
yay!
To follow up, I have rebuilt the following downloads agencies.csv
and ucr_participation.csv
and uploaded to S3. We could potentially rebuild all the NIBRS bulk downloads and reupload, but not as urgent, since those don't cover 2015 and 2016 yet.
So, for some reason, it seems like almost all agencies do not have a full 12 months of NIBRS reporting in 2015. This might be a bug or a data integrity issue or something else, but the effect is quite dramatic