Filtering users based on number of validated labels

I analyzed how filtering users by their number of validated labels affects the classifier's precision and recall. As expected, precision generally increases as the label threshold increases, while recall does not vary much.

This classifier incorporates James' suggestions (i.e., splitting users when training the label/accuracy classifier).

[Screenshot: JupyterLab, 2019-08-15]

With a minimum of 45 validated labels (which leaves 55 users), we get a precision of 0.87804878 and a recall of 0.8372093.
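For anyone who wants to reproduce this kind of sweep, here is a rough sketch of the filtering and metric computation. It is not the notebook code: the `UserResult` record, its field names, and the helper functions are all hypothetical, and it simplifies things by scoring fixed predictions on the filtered users rather than retraining the classifier at each threshold.

```python
from dataclasses import dataclass


@dataclass
class UserResult:
    """Hypothetical per-user record; field names are assumptions."""
    user_id: str
    n_validated: int  # number of validated labels for this user
    y_true: bool      # assumed ground-truth class for the user
    y_pred: bool      # classifier prediction for the user


def filter_users(users, min_labels):
    """Keep only users with at least `min_labels` validated labels."""
    return [u for u in users if u.n_validated >= min_labels]


def precision_recall(users):
    """Compute (precision, recall) from per-user predictions."""
    tp = sum(1 for u in users if u.y_true and u.y_pred)
    fp = sum(1 for u in users if not u.y_true and u.y_pred)
    fn = sum(1 for u in users if u.y_true and not u.y_pred)
    # Return NaN when a metric is undefined (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


def sweep(users, thresholds):
    """Return one (threshold, users_kept, precision, recall) row per threshold."""
    rows = []
    for t in thresholds:
        kept = filter_users(users, t)
        p, r = precision_recall(kept)
        rows.append((t, len(kept), p, r))
    return rows
```

Calling `sweep(users, range(0, 100, 5))` would give one row per threshold, which can then be plotted. As a sanity check on the arithmetic only: counts of 36 true positives, 5 false positives, and 7 false negatives would reproduce the figures above (36/41 and 36/43), but those counts are an inference from the decimals, not something taken from the analysis.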

ProjectSidewalk/sidewalk-quality-analysis, issue #51