Police-Data-Accessibility-Project / data-source-identification

Scripts for labeling relevant URLs as Data Sources.
MIT License

test: crawling for misconduct data #69

Open josh-chamberlain opened 5 months ago

josh-chamberlain commented 5 months ago

Context

Essentially, we want to collect data on complaints, agency size, city size, and insurance coverage separated by year.

We got a request from an academic researcher looking to collect data from as many jurisdictions as possible. They are trying to connect the dots between misconduct, insurance, and the taxpayer burden.

Notes

we don't need to collect the data just yet; finding it is its own task.

most of these tools are informal/in development—feel free to ask questions or suggest better ways to do stuff.

Task

so, I recommend:

More details from requestor

Complaints

The more information the better here, preferably from the year 2000 onwards. We are most interested in total citizen complaints, total sustained citizen complaints, total use of force complaints, and total sustained use of force complaints. It’s important to note that some departments group together citizen (external) and departmental (internal) complaints in their reports. We are interested in collecting as much data as we can upfront, so total complaints and total sustained complaints may also be helpful, especially when we can discern which are external and which are internal. Total settlement cost is also helpful information, but may be more difficult to find. https://home.chicagopolice.org/statistics-data/data-dashboards/accountability-dashboard-2/ OR https://www.icpsr.umich.edu/web/NACJD/studies/38651 (LEMAS)
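As a rough sketch of how one agency-year of these complaint figures might be recorded once found, the structure below keeps external (citizen) counts separate from combined totals. All field names and the example record are hypothetical, chosen for illustration; they are not an existing PDAP schema, and the numbers are placeholders rather than real data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ComplaintCounts:
    """One agency-year of complaint totals. None means 'not reported'."""

    agency: str
    year: int
    citizen_total: Optional[int] = None
    citizen_sustained: Optional[int] = None
    use_of_force_total: Optional[int] = None
    use_of_force_sustained: Optional[int] = None
    # Fallback fields for departments that report citizen (external) and
    # departmental (internal) complaints together.
    combined_total: Optional[int] = None
    combined_sustained: Optional[int] = None
    settlement_cost_usd: Optional[float] = None

    def separates_external(self) -> bool:
        """True when citizen complaints are reported on their own."""
        return self.citizen_total is not None


# Placeholder record: a department that only publishes combined totals.
record = ComplaintCounts(
    agency="Example PD", year=2005, combined_total=120, combined_sustained=14
)
print(record.separates_external())
```

Using None rather than 0 for missing values keeps "not reported" distinct from "zero complaints", which matters when comparing jurisdictions.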

Size of Agency

Measured by total number of sworn officers. https://www.icpsr.umich.edu/web/NACJD/studies/38651 (LEMAS)

Size of City (out of PDAP scope, easy to find elsewhere)

Measured by city population. https://www.chicago.gov/city/en/about/facts.html

Insurance coverage

The source of this information will likely vary across agencies, but annual budget reports are a good place to start. Otherwise, searching "liability coverage of xx city" or "liability insurance of xx city" may lead you to some PDFs with municipal insurance information. Insurance coverage likely takes one of three forms: self-insured, contributing to a municipal risk pool, or externally insured (by some commercial provider). Some municipalities use a hybrid model; this is fine as long as we make note of it. Because law enforcement agencies are government-funded, we are looking at the insurance information of the municipality to which they belong (e.g., insurance coverage of Chicago, not insurance coverage of Chicago PD). Bottom of pg. 104: https://www.chicago.gov/content/dam/city/depts/fin/supp_info/CAFR/2022CAFR/ACFR_2022.pdf OR https://ipi.cityofchicago.org/Reporting/?MQy3cM%2BPHK8prsUtdjFlFgdV2AS9%2FN8As3rOpBWJp13UkPLfM%2FQA7U6UOBzfblTmI53gvYfsAHJxMvjGtIO6Ad%2BL5u8IPskvUdFdtKN%2BUfH9x3D7fN5I6CqjMQcS5iosW%2BQyTD87qiEE7n17ZI%2FeuBmPl%2BFQrvvrHiUO6%2FWh2wg%3D (not sure if this is exactly what we need?)
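A minimal sketch of the two search phrasings and the coverage forms described above. The enum and function names are assumptions made for illustration, not part of any existing PDAP tool.

```python
from enum import Enum


class CoverageForm(Enum):
    """Coverage forms named by the requestor, plus the hybrid case."""

    SELF_INSURED = "self-insured"
    RISK_POOL = "municipal risk pool"
    COMMERCIAL = "externally insured"
    HYBRID = "hybrid"  # record which forms are combined in a note


def insurance_queries(municipality: str) -> list[str]:
    """Build the two suggested search phrases for a municipality.

    Takes the municipality (e.g. "Chicago"), not the police department,
    because coverage is held at the municipal level.
    """
    return [
        f"liability coverage of {municipality}",
        f"liability insurance of {municipality}",
    ]


for query in insurance_queries("Chicago"):
    print(query)
```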

More background

I’m also including links to 4 articles that will help orient you and your team to this topic. The “How Governments Pay” article by Schwartz is most relevant to a framework for data collection/analysis, and the others (especially Rappaport’s “Typology” and “How Private Insurers…”) are more conceptual in nature. We wouldn’t expect you to read them all in full, but skimming them will be super helpful in getting a feel for the role of liability insurance in police misconduct.

Rappaport, J. (2016). An Insurance-Based Typology of Police Misconduct. University of Chicago Public Law & Legal Theory Working Paper, 585.

Rappaport, J. (2017). How private insurers regulate public police. Harvard Law Review, 130(6).

Schwartz, J.C. (2016a). How Governments Pay: Lawsuits, Budgets, and Police Reform. UCLA Law Review, 63, pp. 1144-1298.

Schwartz, J.C. (2016b). Who Can Police the Police? University of Chicago Legal Forum, 2016(11), pp. 437-444.

maxachis commented 5 months ago

@josh-chamberlain What's the absolute latest they would need this information by? I'm happy to begin working on this, but if they need this by the end of next Tuesday or something that might be somewhat tight.

Additionally, I mentioned this elsewhere, but I was able to obtain census data reporting on the size of cities/counties/etc. So that could be implemented.

Setting up a labeling task for some of the URLs might also be worthwhile, as even a few hundred of those could enable us to train an ML model.

maxachis commented 5 months ago

@josh-chamberlain Additionally, would it be useful for some of us to obtain (perhaps privately) the contact information of these researchers so that we could correspond with them more directly and further refine our efforts with their input?

josh-chamberlain commented 5 months ago

yeah, we could label for complaint/misconduct, budget/insurance (maybe something cleaner), or none.

They don't need it on a particular schedule—certainly not that quickly.

I'll ask them about correspondence! @maxachis

sowdm commented 5 months ago

Not sure how many of these are already in PDAP's database but the following code can be used to find related datasets from OpenPoliceData's datasets:

import openpolicedata as opd

df = opd.datasets.query()  # Get all datasets
# Keep only datasets for complaints, lawsuits, and disciplinary actions
# (str.contains treats the pattern as a regex, so "|" means "or")
df = df[df['TableType'].str.contains("COMPLAINTS|LAWSUIT|DISCIPLINARY")]
# Get data for only unique agencies
df_unique = df.drop_duplicates(subset=['State', 'SourceName'])

print(df_unique[['SourceName','TableType']])
      SourceName         TableType
119   Richmond           COMPLAINTS
154   San Diego          COMPLAINTS
201   Santa Rosa         COMPLAINTS
263   Washington D.C.    LAWSUITS
289   Chicago            COMPLAINTS - BACKGROUND
315   Bloomington        COMPLAINTS
322   Indianapolis       COMPLAINTS
325   South Bend         COMPLAINTS
452   New Orleans        COMPLAINTS
479   Montgomery County  COMPLAINTS
499   Massachusetts      DISCIPLINARY RECORDS
533   Detroit            COMPLAINTS
536   Lansing            COMPLAINTS
541   Minneapolis        LAWSUITS
638   Albany             COMPLAINTS
651   New York City      COMPLAINTS - ALLEGATIONS
705   Asheville          COMPLAINTS
754   Cincinnati         COMPLAINTS
779   Norman             COMPLAINTS - ALLEGATIONS
877   Philadelphia       COMPLAINTS - BACKGROUND
897   Chattanooga        COMPLAINTS
1032  Seattle            COMPLAINTS
1040  Tacoma             COMPLAINTS
1049  Milwaukee          COMPLAINTS

josh-chamberlain commented 5 months ago

Thanks @sowdm ! We have grabbed those before, but I made an issue to remind us to update periodically. I'll share OPD with the requestor specifically.

maxachis commented 4 months ago

I've included the below information previously in #53, and include it again here, with some modified commentary:

First step would be finding data detailing all the different police agencies and their population. I found a good start with the results of the 2018 Census of State and Local Law Enforcement Agencies (CSLLEA). I've also included the results of this census (for parsing in R) as a zip file:

ICPSR_38771-V1.zip

Additionally, here's the primary data, as an excel file:

2018 Police Agencies Census Data.xlsx

It has 17,721 rows and quite a bit of detail: not just the location of each agency, but also:

  1. Their 2018 estimated budget
  2. Their population
  3. The number of full time and part-time officers
  4. What type of agency they are (Sheriff's Department, tribal, college, etc.)
  5. What functionalities they served (for example, warrants, whether they investigated homicide, arson, cybercrime, whether they were used for crowd control)
  6. And more.

The full set of fields is described in the codebook contained within the zip.

maxachis commented 4 months ago

Right now here's the current plan:

  1. Reconfigure the in-development Google Searcher to conduct searches for misconduct reports, using a query such as "{city} {state} police misconduct annual report"
  2. Iterate through these in order of either total population of city, total number of officers, or both.
  3. Upload the results to Hugging Face, where they can be inspected and/or processed.
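The first two steps of the plan could be sketched roughly as follows. This is not the actual Google Searcher code: the Agency type, the build_queries helper, and the sample rows are hypothetical placeholders, and the sort covers the "both" option (population first, officer count as a tie-breaker).

```python
from dataclasses import dataclass

QUERY_TEMPLATE = "{city} {state} police misconduct annual report"


@dataclass
class Agency:
    city: str
    state: str
    population: int  # population served, e.g. from the census data
    officers: int    # number of officers


def build_queries(agencies, template=QUERY_TEMPLATE):
    """Return one search query per agency, largest jurisdictions first."""
    ordered = sorted(
        agencies, key=lambda a: (a.population, a.officers), reverse=True
    )
    return [template.format(city=a.city, state=a.state) for a in ordered]


# Placeholder rows for illustration only; not real census figures.
agencies = [
    Agency("City A", "ST", population=50_000, officers=80),
    Agency("City B", "ST", population=900_000, officers=2_000),
    Agency("City C", "ST", population=50_000, officers=120),
]

for query in build_queries(agencies):
    print(query)
```

Passing a different template to build_queries makes it easy to compare alternative phrasings against the same ordered list of agencies.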

We can also try different sorts of queries. A few possible things to search for include:

josh-chamberlain commented 4 months ago

@maxachis I'm on vacation and not looking in a detailed way, but this sounds good overall. My gut says to try "{city} {state} police misconduct records" or "...reports".