CRANE-toolbox / analysis-pipelines

Project CRANE (Crisis Racism and Narrative Evaluation) aims to support researchers and anti-racist organisations that wish to use state-of-the-art text analysis algorithms to study how specific events impact online hate speech and racist narratives. CRANE Toolbox is a Python package: once installed, the tools in CRANE are available as functions that users can use in their Python programs or directly through their terminal. CRANE targets users with basic programming but no machine learning skills.
https://crane-toolbox.github.io
GNU Affero General Public License v3.0

Adding analysis feature: Classification #41

Open LaChapeliere opened 4 years ago

LaChapeliere commented 4 years ago

The first step is research. We are looking for a way to label racist hate speech in a tweet dataset. For each candidate method, the accuracy of the implementation should be evaluated in some way, and the documentation should suggest the best preprocessing parameters. The implementation should compute, for every day, the ratio of tweets labelled as hateful, and save the label for each tweet. For reference, see the old implementation in the resiliency-challenge_legacy branch: a classifier that combined a keyword filter with the Google Perspective API. It was terribly unreliable, though we never formally measured its accuracy.
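To make the expected outputs concrete, here is a minimal sketch of the three deliverables: a label per tweet, a ratio per day, and an accuracy figure against a hand-annotated sample. Everything in it is an assumption for illustration: the function names, the tweet fields (`id`, `created_at`, `text`), and the toy keyword classifier, which only stands in for whichever method the research step selects. It is not CRANE code.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def label_tweets(tweets, classify):
    """Return a copy of each tweet dict with a boolean 'label' added."""
    return [{**t, "label": bool(classify(t["text"]))} for t in tweets]


def daily_ratios(labelled):
    """Map each calendar day to the fraction of that day's tweets labelled True."""
    counts = defaultdict(lambda: [0, 0])  # day -> [flagged, total]
    for t in labelled:
        day = datetime.fromisoformat(t["created_at"]).date()
        counts[day][0] += t["label"]
        counts[day][1] += 1
    return {day: flagged / total for day, (flagged, total) in sorted(counts.items())}


def accuracy(labelled, gold):
    """Fraction of tweets whose label matches a hand-annotated label (id -> bool)."""
    scored = [t for t in labelled if t["id"] in gold]
    if not scored:
        raise ValueError("no labelled tweet has a gold annotation")
    return sum(t["label"] == gold[t["id"]] for t in scored) / len(scored)


def write_labels(labelled, out):
    """Write one CSV row per tweet (id, created_at, label) to a file-like object."""
    writer = csv.DictWriter(
        out, fieldnames=["id", "created_at", "label"], extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(labelled)


def toy_classifier(text):
    """Placeholder only: flags a made-up keyword, not a real detection method."""
    return "badword" in text.lower()


# Toy data, invented for the example.
tweets = [
    {"id": 1, "created_at": "2020-03-01T10:00:00", "text": "hello world"},
    {"id": 2, "created_at": "2020-03-01T12:30:00", "text": "some BADWORD here"},
    {"id": 3, "created_at": "2020-03-02T09:15:00", "text": "another neutral tweet"},
]
gold = {1: False, 2: True, 3: True}  # hypothetical hand annotations

labelled = label_tweets(tweets, toy_classifier)
ratios = daily_ratios(labelled)
score = accuracy(labelled, gold)

buffer = io.StringIO()
write_labels(labelled, buffer)
```

Passing the classifier as a plain function keeps the ratio and evaluation code independent of the method, so several candidates can be compared on the same annotated sample. Plain accuracy is used here only because it is the simplest measure; on a dataset where hateful tweets are rare, precision and recall would probably be more informative.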

LaChapeliere commented 4 years ago

An issue to keep in mind: hate speech detection algorithms can be racially biased. This paper studies the problem: https://arxiv.org/abs/1905.12516
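One possible way to keep this in view during evaluation is to compare error rates across groups of authors rather than reporting a single overall score. The sketch below computes the false positive rate per group on an annotated sample. It is an illustration under assumptions, not the method of the linked paper: the record fields (`group`, `gold`, `pred`), the group names, and the toy data are all hypothetical.

```python
from collections import defaultdict


def false_positive_rate_by_group(records):
    """For each group, the share of non-hateful tweets wrongly flagged as hateful.

    Each record is a dict with 'group' (any hashable), 'gold' (bool, hand
    annotation) and 'pred' (bool, classifier output). Groups without any
    non-hateful tweet are omitted because the rate is undefined for them.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for r in records:
        if not r["gold"]:
            negatives[r["group"]] += 1
            if r["pred"]:
                false_positives[r["group"]] += 1
    return {g: false_positives[g] / n for g, n in negatives.items()}


# Toy records, invented for the example.
records = (
    [{"group": "A", "gold": False, "pred": True}] * 2
    + [{"group": "A", "gold": False, "pred": False}] * 2
    + [{"group": "B", "gold": False, "pred": True}] * 1
    + [{"group": "B", "gold": False, "pred": False}] * 3
    + [{"group": "B", "gold": True, "pred": True}] * 2
)

rates = false_positive_rate_by_group(records)
```

A large gap between groups on a real annotated sample would suggest that the daily ratios partly reflect who is tweeting rather than what is being said.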

LaChapeliere commented 4 years ago

To investigate: