Call-for-Code-for-Racial-Justice / TakeTwo-DataScience

Call for Code Diverse Representation Problem 3 media bias data science
Apache License 2.0

TakeTwo Solution Starter - Data Science

The Call for Code for Racial Justice TakeTwo machine learning and data science component uses data that is crowdsourced through a Chrome extension and sent to a backend database.

Technology Used

The machine learning model code is written in Python and runs in a Jupyter notebook.

Description of TakeTwo Data Science

This repo contains the code for building a machine learning model to predict whether a word or phrase contains racial bias and, if so, predict the category of racial bias.
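The two predictions described above (is the phrase biased, and if so, which category) can be framed as a two-stage classifier. The following is a minimal sketch under stated assumptions, not the notebook's actual model. It assumes each training example carries a boolean "biased" flag plus a category name, and it swaps in a from-scratch multinomial naive Bayes so that it runs with only the standard library. `TwoStageBiasModel`, the category names and the training phrases are hypothetical placeholders.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would use a proper tokenizer.
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per label (for priors)
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the smoothed log likelihood of each token.
            score = math.log(count / n_docs)
            for token in tokenize(text):
                num = self.word_counts[label][token] + 1
                score += math.log(num / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class TwoStageBiasModel:
    """Stage 1: is the phrase biased? Stage 2: if so, which category? (hypothetical)"""

    def fit(self, texts, is_biased, categories):
        self.detector = NaiveBayes().fit(texts, is_biased)
        # The category model only ever sees phrases labeled as biased.
        biased = [(t, c) for t, b, c in zip(texts, is_biased, categories) if b]
        self.categorizer = NaiveBayes().fit([t for t, _ in biased], [c for _, c in biased])
        return self

    def predict(self, text):
        if not self.detector.predict(text):
            return (False, None)
        return (True, self.categorizer.predict(text))


# Hypothetical placeholder data; real examples would come from the backend database.
texts = [
    "the quarterly report is ready",
    "please review the draft",
    "term_a used in a sentence",
    "another term_a example",
    "term_b used in a sentence",
    "another term_b example",
]
is_biased = [False, False, True, True, True, True]
categories = [None, None, "category_1", "category_1", "category_2", "category_2"]

model = TwoStageBiasModel().fit(texts, is_biased, categories)
print(model.predict("term_a here"))        # (True, 'category_1')
print(model.predict("review the report"))  # (False, None)
```

Splitting detection from categorization lets the category model train only on phrases the crowd marked as biased. A single multi-class model with an explicit "not biased" class would be an equally reasonable design.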

Data

The model uses data from a backend database, populated by crowdsourcing.

Initially, the backend database is empty, and the open-source community is welcome to take on ownership and stewardship of the data.
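Because several contributors may label the same phrase differently, crowdsourced submissions usually need to be consolidated before training. The following is a minimal sketch of one way to do that by majority vote. It assumes each submission can be reduced to a `(text, label)` pair, where the label is either `"not_biased"` or a category name; the actual database fields are not listed here, so the record shape, `aggregate_labels` and its `min_votes` threshold are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical submissions as they might be read from the backend database:
# (text, label), where label is "not_biased" or a bias-category name.
submissions = [
    ("phrase one", "category_1"),
    ("phrase one", "category_1"),
    ("phrase one", "not_biased"),
    ("phrase two", "not_biased"),
    ("phrase three", "category_2"),
    ("phrase four", "category_1"),
    ("phrase four", "category_2"),
]


def aggregate_labels(submissions, min_votes=2):
    """Majority-vote the crowd labels for each distinct (normalized) text.

    Texts with fewer than `min_votes` submissions, or with a tie for the
    top label, are left out of the training set.
    """
    votes = defaultdict(Counter)
    for text, label in submissions:
        votes[text.strip().lower()][label] += 1

    dataset = {}
    for text, counts in votes.items():
        if sum(counts.values()) < min_votes:
            continue  # too few labels to trust
        (top, top_n), *rest = counts.most_common()
        if rest and rest[0][1] == top_n:
            continue  # tie: no clear consensus
        dataset[text] = top
    return dataset


print(aggregate_labels(submissions))  # {'phrase one': 'category_1'}
```

Raising `min_votes` trades dataset size for label reliability, which matters most while the database is still small.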

The fields used by the model are:

Data Science Evolution

The vision is for this module to contain an evolving set of versions of the data science and machine learning (DS/ML) component of the solution, with varying degrees of sophistication. Currently, the repository contains DS-MVP-0 and some work towards DS-MVP-1.

The overall goal of the DS/ML component is to use machine learning on text data to detect racially biased expressions and usage in context.

It will take labeled data collected through crowdsourcing, enabled by the MVP1 browser extension ("Marker"), and use it to train an ML model that classifies text. The resulting model will then be used by the MVP2 content-editor plug-in ("Flagger") to flag racially biased text as it is entered in the editor.
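A Flagger-style plug-in needs to turn the classifier's phrase-level predictions into highlighted spans of editor text. The following is a minimal sketch, assuming the trained model is exposed as a function that takes a phrase and returns a category name or `None`. It scans word n-grams longest-first and skips spans that overlap an earlier flag. `flag_text`, `lookup_classifier`, `LEXICON` and the placeholder phrases are hypothetical; a lookup table stands in for the real model so the example runs on its own.

```python
import re
from typing import Callable, List, Optional, Tuple

Flag = Tuple[int, int, str, str]  # (start offset, end offset, phrase, category)


def flag_text(
    text: str,
    classify: Callable[[str], Optional[str]],
    max_n: int = 3,
) -> List[Flag]:
    """Return character spans of word n-grams that `classify` labels as biased."""
    words = [(m.start(), m.end()) for m in re.finditer(r"[\w']+", text)]
    flags: List[Flag] = []
    for n in range(max_n, 0, -1):  # prefer longer phrases first
        for i in range(len(words) - n + 1):
            start, end = words[i][0], words[i + n - 1][1]
            if any(s < end and start < e for s, e, _, _ in flags):
                continue  # skip spans overlapping an existing flag
            phrase = text[start:end]
            category = classify(phrase)
            if category is not None:
                flags.append((start, end, phrase, category))
    return sorted(flags)


# Hypothetical stand-in for the trained model: a phrase -> category lookup.
LEXICON = {"term_a": "category_1", "term_b phrase": "category_2"}


def lookup_classifier(phrase: str) -> Optional[str]:
    return LEXICON.get(phrase.lower())


print(flag_text("Draft with term_b phrase and term_a.", lookup_classifier))
# [(11, 24, 'term_b phrase', 'category_2'), (29, 35, 'term_a', 'category_1')]
```

Returning character offsets rather than words keeps the output independent of how the editor renders highlights.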

Below is a possible series of capabilities (referred to here as DS-MVPs) that may be developed and included in this component:

Related Links

There are a number of other components related to this project:

License

This solution starter is made available under the Apache 2 License.