2020PB / police-brutality

Repository containing evidence of police brutality during the 2020 George Floyd protests
MIT License
2.62k stars, 209 forks

Adapting incident reports for quantitative analysis #677

Open lexeree opened 4 years ago

lexeree commented 4 years ago

Hey, this is an amazing project and I think there are a lot of possibilities for how to use this data. However, I've noticed that incident reports don't seem to be recorded in a way that facilitates a quantitative analysis of the data.

Combining data from different data sets, or providing methods to generate analytics and visualizations, could be really helpful, and I'd be happy to code them myself, but more data would be needed.

Specifically, I think that whenever possible, it would be great if the following data points could be parsed out from the incident reports:

I realize that some of this data can be extracted from the tags, but they may not always be applied consistently or may be too granular/not granular enough for certain analyses.

Thanks!

ContributingThrowaway commented 4 years ago

The vast vast vast majority of these occur at, around, or shortly after protests.

Why do you need to know whether the incident was reported on by media? (I ask because often when it was, the relevant coverage is linked; it might be possible to extract relevant coverage from the links.)

I agree that information like numbers of people injured, killed, arrested, etc. would be very helpful, but I suspect it will be extremely hard to extract from many (most?) incidents. For some, it's not clear what would even count as being injured, killed, or arrested in the course of them. (There is a tag for the rare cases where an individual dies in a given incident.) Perhaps we could add a tag for serious injuries? (I'd also like to see a "wrongful-arrest" tag -- the "arrest" tag is used both when a wrongful arrest is conducted and when a reasonable arrest is conducted using excessive force.)

I'd be very interested down the road to see how many of these are actually dealt with by police. I'd also like some way of marking severity of incidents -- they range from slightly-careless tear-gassing to five officers pinning a man down and beating him senseless for no good reason.

lexeree commented 4 years ago

I personally would like to know about media coverage because an issue I have dealt with when speaking to people about police brutality in the protests is "well I haven't seen that much about it!" I think it would be helpful to know how much is covered in national news. The idea about the links is a good one: I could definitely parse out the site name and compare that against a list, but there's also the possibility that incidents reported in local news could be picked up by larger networks at a later date.
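A minimal sketch of that link check, assuming each incident's links are already available as a list of URL strings. The outlet list and the function names `site_name` and `has_national_coverage` are hypothetical, chosen only for illustration; nothing here reflects how the repository actually stores its data.

```python
from urllib.parse import urlparse

# Hypothetical, illustrative list of national outlets (not from the repository).
NATIONAL_OUTLETS = {"nytimes.com", "washingtonpost.com", "cnn.com"}


def site_name(url):
    """Return the lowercase host of a URL without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host


def has_national_coverage(links, outlets=NATIONAL_OUTLETS):
    """True if any link points at a listed outlet or one of its subdomains."""
    for link in links:
        host = site_name(link)
        if any(host == d or host.endswith("." + d) for d in outlets):
            return True
    return False
```

As the comment above notes, a check like this only reflects the links recorded at the time; it would miss coverage that a larger network picks up later.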

I agree that the number of people injured is hard to pin down, but maybe we could build a web scraper to look for new information published about incidents, or volunteers could check for updates? Either way, I think this project is really great but limited in use unless we can efficiently extract analytics. Being able to compare social media posts about police abuse against media coverage could be a very useful tool for giving solid numbers on the reality of abuse of power among authorities in the US.

ed42311 commented 4 years ago

@happy-lambdas a few points of clarification here: are you talking about a standardized format, or just adding more fields for specific sets of information? For instance, each incident already has a description, a set of video links, and a title, although maybe a scenario description is something more specific.

This is a possibility, with some research (unless the linked video is from a news outlet).

These are more difficult, with the possible exception of people killed. Injuries and arrests would be based on estimates; I guess we could record them with an explicit margin of error, fact-checked against recordings.
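One way to record such an estimate is as a low/high range instead of a single number. The sketch below is an assumption about how that could look, not an existing schema in the project; `CountEstimate`, its fields, and `total` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CountEstimate:
    """An estimated head count stored as an inclusive low/high range."""
    low: int
    high: int
    source: Optional[str] = None  # e.g. a link to the recording used for fact-checking

    def __post_init__(self):
        if self.low < 0 or self.high < self.low:
            raise ValueError("expected 0 <= low <= high")

    @property
    def midpoint(self):
        return (self.low + self.high) / 2

    @property
    def margin(self):
        """Half-width of the range, i.e. the explicit margin of error."""
        return (self.high - self.low) / 2


def total(estimates):
    """Sum several estimates by adding their lower and upper bounds."""
    estimates = list(estimates)
    return CountEstimate(sum(e.low for e in estimates),
                         sum(e.high for e in estimates))
```

Summing the bounds gives a deliberately conservative total range; a known exact count, such as a confirmed death, is simply a range where `low == high`.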

This would be great, and I have to imagine that many of these incidents are in the public record. With a little bit of legwork we might be able to link the police reports with the incidents. That sounds like a great idea (and a lot of work :D).

Also, I'm on board with some sort of rating system, @ContributingThrowaway, or maybe a binary: was there brutality or not? I'm not sure what to base the metric on if we go with a scale. The Bethel, Ohio protests seem to me like a decent talking point. There was brutality (although I did not see recorded instances of police brutality), and the police did not engage at times, but for the most part they tried to act as intermediaries while keeping the peace (this is an observation as a third party, with minimal context).

ContributingThrowaway commented 4 years ago

My first thought is to tag with many severity-relevant characteristics, such that researchers and frontend developers could come up with their own indices for severity -- +5 for improper use of rubber bullets, +6 for a wrongful arrest, etc. Possible tags include:

Alternatively, we could just introduce three tags -- high-severity, medium-severity, and low-severity -- and gradually build consensus as to which should be used when. That wouldn't allow the same granular analysis, but it would at least help with filtering out the least severe incidents or focusing on the most severe.
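Both options can be sketched together, assuming each incident carries a list of tag strings. The +5 and +6 weights are the examples from the comment above; the tag names, the tier thresholds, and the function names are hypothetical and would need the consensus the comment describes.

```python
# Illustrative weights only: the values echo the +5 / +6 examples above,
# and the tag names are hypothetical, not tags that exist in the repository.
EXAMPLE_WEIGHTS = {
    "improper-rubber-bullets": 5,
    "wrongful-arrest": 6,
}


def severity_index(tags, weights=EXAMPLE_WEIGHTS):
    """Sum the weights of an incident's tags; unknown tags count as 0."""
    return sum(weights.get(tag, 0) for tag in set(tags))


def severity_tier(score, medium_at=5, high_at=10):
    """Map a numeric score onto the three coarse tags (thresholds are assumptions)."""
    if score >= high_at:
        return "high-severity"
    if score >= medium_at:
        return "medium-severity"
    return "low-severity"
```

Keeping the weights outside the data means each researcher or frontend can pass in a different `weights` mapping without the incident reports themselves changing.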

What's the process for adding new tags?