Models for machine classifications - Githubissues

chicago-justice-project / chicago-justice

Chicago Justice Project

GNU General Public License v3.0


Models for machine classifications #14

Open meshulam opened 7 years ago

meshulam commented 7 years ago

When we run the classification algo on the full dataset, we probably don't want to modify the article records directly, especially if human coders have already classified them.

Idea: derived metadata (categories, relevant/not, location) should live in its own related table, which includes fields for how it was derived (human or algo). Could there be multiple classifications for an article, with some way to choose which one to use?

meshulam commented 7 years ago

After conversation today:

Article

has one: LearnedData
has many: EnteredData
reviewed boolean (just whether an EnteredData entry exists?)

LearnedData

probabilities for each category
relevance probability
overall confidence 0-1 (calculated from above?)
model info string (version, date, hyperparameters)

EnteredData

user ID
boolean for each category
relevant/irrelevant
date

Article is responsible for figuring out the 'definitive' classification based on what data is associated with it.
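
A rough sketch of how the outline above could look in code, using plain Python dataclasses rather than the project's actual models. The field names, the `Classification` helper, the `definitive()` method, the 0.5 threshold, and the precedence rule (latest human entry wins, otherwise fall back to the learned probabilities) are all assumptions for illustration; none of them were decided in this thread.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class LearnedData:
    category_probs: Dict[str, float]  # probability for each category
    relevance_prob: float             # probability the article is relevant
    confidence: float                 # overall confidence, 0-1
    model_info: str                   # version, date, hyperparameters


@dataclass
class EnteredData:
    user_id: int
    categories: Dict[str, bool]       # boolean for each category
    relevant: bool
    entered_on: date


@dataclass
class Classification:
    """Hypothetical result type: the labels plus where they came from."""
    categories: Dict[str, bool]
    relevant: bool
    source: str                       # "human" or "algo"


@dataclass
class Article:
    title: str
    learned: Optional[LearnedData] = None                   # has one
    entered: List[EnteredData] = field(default_factory=list)  # has many

    @property
    def reviewed(self) -> bool:
        # Derived: true whenever at least one EnteredData entry exists.
        return bool(self.entered)

    def definitive(self, threshold: float = 0.5) -> Optional[Classification]:
        # Assumed rule: the most recent human entry takes precedence.
        if self.entered:
            latest = max(self.entered, key=lambda e: e.entered_on)
            return Classification(dict(latest.categories), latest.relevant, "human")
        # Otherwise threshold the model's probabilities.
        if self.learned is not None:
            cats = {
                name: prob >= threshold
                for name, prob in self.learned.category_probs.items()
            }
            return Classification(
                cats, self.learned.relevance_prob >= threshold, "algo"
            )
        # Nothing is attached to the article yet.
        return None
```

With this shape the algorithm's output never overwrites anything a human coder entered: rerunning the model only replaces `learned`, and `definitive()` keeps preferring the human entries.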