TakeTwo aims to help mitigate bias in digital content, whether it is overt or subtle, with a focus on text across news articles, headlines, web pages, blogs, and even code. The solution is designed to leverage directories of inclusive terms compiled by trusted sources like the Inclusive Naming Initiative, which was co-founded by the Linux Foundation and CNCF. The terminology is categorized and used to train an AI model, improving its accuracy over time. TakeTwo is built using open source technologies including Python, FastAPI, and Docker. The API can be run locally with a CouchDB backend database or an IBM Cloudant database.
Technology has the power to drive action. And right now, a call to action is needed to eradicate racism. Black lives matter.
We recognize technology alone cannot fix hundreds of years of racial injustice and inequality. When we put it in the hands of the Black community and their supporters, technology can begin to bridge a gap; to start a dialogue; to identify areas where technology can help pave a road to progress.
This is one of several open source projects underway as part of the Call for Code for Racial Justice led by contributors from IBM and Red Hat.
Bias is learned and perpetuated in different ways (e.g., societal beliefs, misrepresentation, ignorance) that consequently create inequitable outcomes across all spheres of life.
This repository is part of the Embrace: Diverse Representation stream and our focus is on problem statement 3. We decided to focus on the following two predefined hills:
A media content editor (e.g., audio, gaming, movies, tv, comics, news, publications) can incorporate bias detection and remediation into their creative process to reduce racial bias and improve representation to Gen Z.
A social media user can understand the historical and societal context of racial bias and cultural appropriation reflected in their posts in real time.
We have identified the following issues currently faced by content platforms:
This project aims to help content platforms to:
The TakeTwo solution provides a quick and simple tool for content platforms to detect and eliminate racial bias (both overt and subtle) from their content.
TakeTwo is an API that can be used while you compose social media text, paragraphs, essays, and papers. TakeTwo scans for potentially racially biased language. The API works by flagging and classifying phrases and words that tend to be perceived as racially biased within the United States. These phrases and words are then categorized by common types of detectable racially biased language.
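The flag-and-classify step described above can be illustrated with a minimal sketch. It assumes a simple dictionary lookup with whole-word, case-insensitive matching; the `TERMS` entries, the `terminology` category name, and the `flag_terms` function are hypothetical and are not taken from the TakeTwo code base, which also relies on a trained model for context-dependent language.

```python
import re
from dataclasses import dataclass

# Illustrative entries only: the term list and the category name are
# assumptions for this sketch, not TakeTwo's actual data.
TERMS = {
    "whitelist": "terminology",
    "blacklist": "terminology",
}


@dataclass
class Flag:
    start: int      # offset of the first matched character
    end: int        # offset one past the last matched character
    text: str       # the matched text as it appears in the input
    category: str   # category assigned to the matched term


def flag_terms(text, terms=TERMS):
    """Return a Flag for each whole-word, case-insensitive match in text."""
    flags = []
    for term, category in terms.items():
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            flags.append(Flag(match.start(), match.end(), match.group(0), category))
    # Report flags in the order they appear in the text.
    return sorted(flags, key=lambda flag: flag.start)
```

Returning character offsets alongside the category would let an editor front end highlight each flagged span in place.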
TakeTwo leverages a crowd-sourced database of words and phrases that could be viewed as racially biased in the US. Verified, trusted contributors can use TakeTwo's browser extension to select potentially biased language in text-based media. These selections are classified under commonly detected types of racially biased language to train TakeTwo's text-classification machine learning model.
TakeTwo's machine learning model is used to help identify subtle, context-dependent phrases or words that may be perceived as racially biased in the United States.
Users of the API and browser extension can provide feedback on the value of the recommendations provided so that the AI model can be steadily improved and refined over time.
This API is underpinned by a crowd-sourced database of words and phrases that are deemed racially biased. These phrases are categorized in order to train an AI model on the significance of the context in which the language was used. Contributors to the project can be part of the crowdsourcing process by installing a browser extension. This API repo is part of the data capture process, which is used for modeling.
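As a rough illustration of the data capture step, the sketch below shows how a single crowdsourced selection might be validated and serialized before it is stored. The field names, the placeholder category names, and the `to_document` helper are all assumptions made for this example; the project's real schema may differ.

```python
import json
from dataclasses import asdict, dataclass

# Placeholder category names, assumed for this sketch only.
ALLOWED_CATEGORIES = {"category-a", "category-b"}


@dataclass
class Selection:
    text: str        # the phrase a contributor highlighted
    category: str    # the category the contributor assigned
    source_url: str  # the page the phrase was selected on


def to_document(selection):
    """Validate a selection and return it as a JSON string."""
    if not selection.text.strip():
        raise ValueError("selected text must not be empty")
    if selection.category not in ALLOWED_CATEGORIES:
        raise ValueError(f"unknown category: {selection.category}")
    return json.dumps(asdict(selection), sort_keys=True)
```

Rejecting unknown categories at capture time keeps the training data consistent with the categories the model is trained on.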
There are a number of other repositories related to this project:
TakeTwo is built using open source technologies. The API is built using Python, FastAPI, and Docker (if running on a Kubernetes cluster).
The racially biased terms are vetted and loaded into a backend database. The code is set up to run the API locally with a CouchDB backend database or IBM Cloudant database.
To run with CouchDB, you will need to deploy a CouchDB docker image either locally or on a Kubernetes cluster.
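Once a CouchDB instance is running, stored terms could be looked up through CouchDB's HTTP `_find` endpoint, which accepts a JSON selector. The sketch below only builds the request and does not send it. The database name `taketwo`, the `category` field, and the `build_find_request` helper are hypothetical; port 5984 is CouchDB's default.

```python
import json
import urllib.request


def build_find_request(base_url, db, category, limit=25):
    """Build a POST request for CouchDB's /{db}/_find endpoint.

    The selector matches documents whose (assumed) "category" field
    equals the given value.
    """
    body = {"selector": {"category": {"$eq": category}}, "limit": limit}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{db}/_find",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical usage against a local instance:
# request = build_find_request("http://localhost:5984", "taketwo", "category-a")
# with urllib.request.urlopen(request) as response:
#     documents = json.load(response)["docs"]
```

A real deployment would also need to supply credentials, which this sketch leaves out.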
There is a front-end HTML page that serves as an example text editor.
This project has defined a number of data scheme categories of racial bias, which are used by a text classification model (outlined below). We welcome feedback on these categories.
The Web API is built in Python and handles the following:
Follow these instructions for setting up the web back-end API.
The TakeTwo Chrome JavaScript extension uses the Highlighter Chrome extension library to provide the highlighting functionality for selecting text.
The TakeTwo Chrome browser extension is a plugin that makes it easy to capture and categorize, directly in the browser, words and phrases that could be racially biased.
This extension is used to enable the crowdsourcing of data for use in training an ML model. This extension aims to make it as easy as possible for community members who would like to contribute to this initiative to do so quickly and privately.
Follow these instructions for installing the Chrome extension.
The TakeTwo Data Science workstream uses data crowdsourced by a Chrome extension. The data is sent to a backend database.
The machine learning model code is written in Python and runs in a Jupyter notebook.
To build and use the TakeTwo solution:
Get started by cloning the TakeTwo web API repository, and follow the instructions to build and run the FastAPI server.
Next, clone the TakeTwo Chrome extension repository, and follow the instructions to build the Chrome extension.
Finally, explore the TakeTwo data science workstream repository to learn more about the data science model.
We welcome contributions! For details on how to contribute, please read the CONTRIBUTING file in this repo.
This project is still very much a work in progress. Our hope for the future is that this is a step towards a more informed media culture that is more aware of racial bias in media content. We hope this can be built out for use in a range of areas: news, social media, forums, code, etc.
We also hope to expand the project to enable detection of racial bias in audio and video.
We hope you will help us in this open source community effort!
This solution starter is made available under the Apache 2 License.