Call-for-Code-for-Racial-Justice / TakeTwo-DataScience

Call for Code Diverse Representation Problem 3 media bias data science
Apache License 2.0

Implement Machine Learning component V5 (dsmvp-v5) #12

Open naokiabe opened 3 years ago

naokiabe commented 3 years ago

As part of the progression of machine learning components with increasing levels of sophistication, implement version 5 ("dsmvp-v5") with the following characteristics:

Active Learning: (to be documented) An online active learning module that learns to detect racially biased expressions and actively solicits labeled data from selected markers, chosen according to their estimated credibility. The module is trained on labeled data in the form of <expression, classification, marker-ID> triples.
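To make the data shape concrete, here is a minimal sketch of the triples and one possible way to estimate marker credibility. The issue does not say how credibility is computed, so the majority-agreement heuristic below is only an assumption, and the names `LabeledExample` and `estimate_credibility` are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledExample:
    """One <expression, classification, marker-ID> triple (hypothetical type)."""
    expression: str
    classification: str
    marker_id: str


def estimate_credibility(examples):
    """Assumed heuristic: a marker's credibility is the fraction of their
    labels that agree with the majority label for the same expression.
    Expressions labeled by fewer than two markers are skipped."""
    by_expression = defaultdict(list)
    for ex in examples:
        by_expression[ex.expression].append(ex)

    agree = Counter()
    total = Counter()
    for group in by_expression.values():
        if len({ex.marker_id for ex in group}) < 2:
            continue  # no second opinion to compare against
        labels = Counter(ex.classification for ex in group)
        majority_label, _ = labels.most_common(1)[0]
        for ex in group:
            total[ex.marker_id] += 1
            if ex.classification == majority_label:
                agree[ex.marker_id] += 1

    return {marker: agree[marker] / total[marker] for marker in total}
```

A real module would likely need something more robust than majority voting, since the majority label can itself be biased.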

A possible implementation of this version may use "bandit algorithms," which dictate which marker to sample from next. One example is the UCB (Upper Confidence Bound) method, which chooses the marker with the highest upper confidence bound among all markers. This balances the motivation to learn from the most credible markers against the need to sample from new markers in order to learn about their credibility. (Reference: https://tor-lattimore.com/downloads/book/book.pdf)
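As an illustration of the UCB idea, here is a minimal sketch of the standard UCB1 rule applied to marker selection. It is not a proposed design for dsmvp-v5. The class name `UCB1MarkerSelector` is hypothetical, and the reward is assumed to be a credibility signal in [0, 1] (for example, 1 if the marker's label agrees with a trusted reference, else 0).

```python
import math


class UCB1MarkerSelector:
    """Chooses which marker to query next with the UCB1 rule (hypothetical class)."""

    def __init__(self, marker_ids):
        self.counts = {m: 0 for m in marker_ids}   # times each marker was queried
        self.means = {m: 0.0 for m in marker_ids}  # running mean reward per marker
        self.total = 0                             # total queries so far

    def select(self):
        # Query every marker once first, so each has an initial estimate.
        for marker, n in self.counts.items():
            if n == 0:
                return marker

        def upper_bound(marker):
            # Mean reward plus an exploration bonus that shrinks as the
            # marker is queried more often.
            bonus = math.sqrt(2.0 * math.log(self.total) / self.counts[marker])
            return self.means[marker] + bonus

        return max(self.counts, key=upper_bound)

    def update(self, marker_id, reward):
        # Incremental mean update; reward is assumed to lie in [0, 1].
        self.counts[marker_id] += 1
        self.total += 1
        n = self.counts[marker_id]
        self.means[marker_id] += (reward - self.means[marker_id]) / n
```

In use, each round calls `select()` to pick a marker, requests a label from that marker, scores it, and passes the score to `update()`. Markers that have rarely been queried keep a large bonus, so new markers still get sampled even when an established marker has a higher mean.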

The coding of dsmvp-v5 should be similar to, and share many aspects of, the implementation of dsmvp-v1 in the repository: using a Jupyter notebook, accessing the database via the web API, etc.

github-actions[bot] commented 2 years ago

:wave: Hi! This issue has been marked stale due to inactivity. If no further activity occurs, it will automatically be closed in 14 days.