Closed davidpomerenke closed 6 months ago
@davidpomerenke I looked through your code and saw that you only query the assoc_actor_1 field but not the assoc_actor_2 field, which can also yield matches for events co-organized by multiple protest groups.
Good reminder that we need to document sampling criteria like this well, and should probably add a documentation page to the final website for scholarly rigor 📚
Here's a sample from ACLED of protests in the UK, restricted to entries where assoc_actor_2 is present:
It typically denotes the opposing party of the protest (e.g. antiracist groups when it is a far-right protest). So these protests are actually counter-protests against the assoc_actor_2 group, and I think we don't want to include them when we query for that group.
However, there is a second meaning of the field: there is a demo by assoc_actor_1, and assoc_actor_2 disrupts it. These, again, we would want. E.g.:
{'date': Timestamp('2023-07-01 00:00:00'), 'assoc_actor_1': 'LGBTQ+ (United Kingdom)', 'assoc_actor_2': 'Just Stop Oil', 'notes': "On 1 July 2023, around 30,000 LGBTQ+ community members and their supporters marched from Hyde Park and Whitehall in London - Westminster (England) as part of the annual Pride March to celebrate diversity and the queer community and denounce discrimination against LGBTQ+ people. At around 13.30, a small group of protesters from Just Stop Oil disrupted the pride march by sitting down on the road, stopping the parade, over 'high-polluting' corporations that are sponsors the pride. Police Forces intervened and arrested seven Just Stop Oil protesters and the parade continued."},
But I would rather avoid false positives and not query by the assoc_actor_2 field.
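To make the trade-off concrete, here is a minimal sketch of the two query variants on plain records. The function name `query_events`, the `include_assoc_actor_2` flag, and the second sample row are made up for illustration; this is not the actual loader code.

```python
def query_events(records, group, include_assoc_actor_2=False):
    """Return the records in which `group` appears as an associated actor.

    By default only assoc_actor_1 is searched, which avoids counting
    counter-protests against `group` as protests by `group`.
    """
    fields = ["assoc_actor_1"]
    if include_assoc_actor_2:
        fields.append("assoc_actor_2")
    needle = group.casefold()
    return [
        record
        for record in records
        # Missing or empty fields are treated as an empty string.
        if any(needle in (record.get(field) or "").casefold() for field in fields)
    ]


records = [
    # Abbreviated version of the Pride march entry quoted above.
    {
        "date": "2023-07-01",
        "assoc_actor_1": "LGBTQ+ (United Kingdom)",
        "assoc_actor_2": "Just Stop Oil",
    },
    # Made-up row, for illustration only.
    {
        "date": "2023-01-01",
        "assoc_actor_1": "Just Stop Oil",
        "assoc_actor_2": None,
    },
]

strict = query_events(records, "Just Stop Oil")
lenient = query_events(records, "Just Stop Oil", include_assoc_actor_2=True)
```

With the default, the Pride march entry is not returned for Just Stop Oil; with `include_assoc_actor_2=True` it is, along with any counter-protest entries that name the group in that field.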
Documentation is a good idea 📚👍 These notes are specific to the data loaders, so maybe we can keep the documentation text in the same file or in the same folder as the respective data loaders, and then eventually also display it in the frontend. (Rather than storing it somewhere in the frontend, detached from the code.)
Some stats about assoc_actor_2:
With the filter df["notes"].str.contains(r"climate|oil|extinction|future", regex=True, case=False) there are 1343 results in the UK, and only 1 (the one above) where a climate group occurs as assoc_actor_2 but not as assoc_actor_1. There are no cases of a protest against a climate group. This might look different for other countries though, since the codebook does not give explicit instructions and there are different regional teams.
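For anyone who wants to repeat this check for another country, a rough sketch of how such counts could be computed with pandas follows. The toy DataFrame and the `group_pattern` list of climate groups are assumptions for illustration; only the notes filter is taken from the comment above.

```python
import pandas as pd

# Toy data for illustration only; real rows would come from the ACLED export.
df = pd.DataFrame(
    {
        "assoc_actor_1": [
            "LGBTQ+ (United Kingdom)",
            "Extinction Rebellion",
            "Health Workers (United Kingdom)",
        ],
        "assoc_actor_2": ["Just Stop Oil", None, None],
        "notes": [
            "Just Stop Oil protesters disrupted a Pride march.",
            "Extinction Rebellion activists blocked a road.",
            "Nurses went on strike over pay.",
        ],
    }
)

# Keyword filter on the notes, as in the comment above (na=False guards against missing notes).
climate_notes = df["notes"].str.contains(
    r"climate|oil|extinction|future", regex=True, case=False, na=False
)

# Hypothetical pattern naming the climate groups of interest.
group_pattern = r"just stop oil|extinction rebellion"
in_actor_1 = df["assoc_actor_1"].str.contains(
    group_pattern, regex=True, case=False, na=False
)
in_actor_2 = df["assoc_actor_2"].str.contains(
    group_pattern, regex=True, case=False, na=False
)

# Events where a climate group shows up only in assoc_actor_2.
only_actor_2 = df[climate_notes & in_actor_2 & ~in_actor_1]

n_climate = int(climate_notes.sum())
n_only_actor_2 = len(only_actor_2)
```

Rows in `only_actor_2` still need a manual look at the notes to tell disruptions by the group apart from protests against it.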
Great, thanks for looking into this and the clarification!
We can add some of your notes to the docs #35
> maybe we can keep the documentation text in the same file or in the same folder as the respective data loaders, and then eventually also display them in the frontend.
I'm thinking about which tool might be able to facilitate that 🤔
Closed with #28
Regarding tool: We could store the docs as a Python string, or read them with Python from a README file, and serve them via API to the frontend.
Though maybe this is overkill and we can just have it in the frontend 😄
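If we do go the README route, a minimal sketch could look like this. It assumes one Markdown file per data loader in the loaders' folder; the function names `load_docs` and `docs_as_json` and the file layout are hypothetical, and the API framework is left open.

```python
import json
from pathlib import Path


def load_docs(loader_dir):
    """Read every Markdown file in `loader_dir` into a {file stem: text} dict."""
    return {
        path.stem: path.read_text(encoding="utf-8")
        for path in sorted(Path(loader_dir).glob("*.md"))
    }


def docs_as_json(loader_dir):
    """Serialize the docs so that any API endpoint can return them as a response body."""
    return json.dumps(load_docs(loader_dir), ensure_ascii=False)
```

A file such as `acled.md` next to the ACLED loader would then show up under the key `acled`, so the text stays with the code and the frontend only has to render it.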
We already have code for: