ProjectSidewalk / sidewalk-quality-analysis

An analysis of Project Sidewalk user quality based on interaction logs

Filtering users based on number of validated labels #51

Open nch0w opened 5 years ago

nch0w commented 5 years ago

I analyzed how filtering users by their number of validated labels affects the classifier's precision and recall. As expected, precision generally increases as the label threshold increases, while recall does not vary much.
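For anyone reproducing this, the threshold sweep can be sketched as below. This is a minimal illustration, not the notebook code: the `UserResult` record, its field names, and the choice of "good user" as the positive class are all assumptions. It also only re-scores fixed predictions on the users that pass each threshold; if the classifier is retrained at each threshold, the fitting step would go inside the loop.

```python
from dataclasses import dataclass


@dataclass
class UserResult:
    # Hypothetical per-user record (field names are assumptions):
    # number of validated labels, ground-truth quality, and the prediction.
    user_id: str
    n_validated: int
    is_good: bool
    predicted_good: bool


def precision_recall(results):
    """Precision and recall, treating predicted_good=True as the positive class."""
    tp = sum(r.predicted_good and r.is_good for r in results)
    fp = sum(r.predicted_good and not r.is_good for r in results)
    fn = sum((not r.predicted_good) and r.is_good for r in results)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


def sweep_thresholds(results, thresholds):
    """For each minimum label count, keep users at or above it and re-score."""
    rows = []
    for t in thresholds:
        kept = [r for r in results if r.n_validated >= t]
        p, r = precision_recall(kept)
        rows.append((t, len(kept), p, r))
    return rows


if __name__ == "__main__":
    # Toy data for illustration only; not the Project Sidewalk results.
    toy = [
        UserResult("a", 10, False, True),
        UserResult("b", 50, True, True),
        UserResult("c", 60, False, True),
        UserResult("d", 80, True, False),
        UserResult("e", 100, True, True),
    ]
    for t, n_users, p, r in sweep_thresholds(toy, [0, 45]):
        print(f"min labels={t:3d}  users={n_users}  precision={p:.3f}  recall={r:.3f}")
```

In the toy data, raising the threshold drops a low-label user who was a false positive, so precision rises while recall stays flat, which mirrors the trend described above.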

This classifier incorporates James' suggestions (i.e., splitting the data by user when training the label/accuracy classifier).
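One reading of "split users" is a group-aware train/test split, where every row belonging to a given user lands entirely in either the training set or the test set, so the classifier is never evaluated on a user it has already seen. The sketch below assumes that interpretation; `split_by_user` and its parameters are hypothetical names. scikit-learn's `GroupShuffleSplit` (passing user IDs as `groups`) achieves the same effect.

```python
import random


def split_by_user(rows, user_key, test_fraction=0.3, seed=0):
    """Split rows so each user's rows go entirely to train or entirely to test.

    user_key: function mapping a row to its user ID (assumed hashable/sortable).
    """
    users = sorted({user_key(r) for r in rows})  # deterministic order before shuffling
    rng = random.Random(seed)
    rng.shuffle(users)
    n_test = max(1, round(len(users) * test_fraction))
    test_users = set(users[:n_test])
    train = [r for r in rows if user_key(r) not in test_users]
    test = [r for r in rows if user_key(r) in test_users]
    return train, test


# Example: 10 users with 3 label rows each, split 70/30 by user.
rows = [(user, i) for user in "abcdefghij" for i in range(3)]
train, test = split_by_user(rows, user_key=lambda row: row[0])
```

Splitting by row instead of by user would let the same user's labels appear on both sides, which tends to inflate precision and recall.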

[Two JupyterLab screenshots (2019-08-15) showing classifier precision and recall as the minimum-label threshold varies]

With a minimum of 45 validated labels per user (which leaves 55 users), we get a precision of 0.878 and a recall of 0.837.