Closed: chengsoonong closed this issue 7 years ago
The mean is the solid red line and the standard deviation is the dashed red lines. The mean accuracy of LR(Volunteer) across all volunteers is basically the same as the mean accuracy of the LR(RGZ MV) predictor, which is interesting. The average volunteer outperforms the average LR(Volunteer). The volunteer index is just the position of the volunteer when sorting by average accuracy on the given plot, so the volunteer indices on the left plot do not in general correspond to the indices on the right plot.
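To make the indexing concrete, here is a minimal sketch of how a per-plot volunteer index and the mean/stdev lines could be computed. The volunteer names and accuracy values are made up for illustration, `summarise_accuracies` is a hypothetical helper rather than code from this project, and sorting in ascending order is an assumption.

```python
from statistics import mean, pstdev


def summarise_accuracies(accuracies):
    """Return (ranked, mean, stdev) for a {volunteer: accuracy} mapping.

    `ranked` is a list of (volunteer, accuracy) pairs sorted by accuracy
    (ascending, by assumption); a volunteer's index on the plot is its
    position in this list.
    """
    ranked = sorted(accuracies.items(), key=lambda kv: kv[1])
    values = [acc for _, acc in ranked]
    return ranked, mean(values), pstdev(values)


# Made-up accuracies, for illustration only.
volunteer_acc = {"a": 0.9, "b": 0.7, "c": 0.8}   # the volunteers themselves
lr_acc = {"a": 0.75, "b": 0.85, "c": 0.65}       # LR trained on each volunteer

ranked_vol, mu_vol, sd_vol = summarise_accuracies(volunteer_acc)
ranked_lr, mu_lr, sd_lr = summarise_accuracies(lr_acc)

# Each plot gets its own index, so the same volunteer can land at
# different positions on the two plots.
index_vol = {name: i for i, (name, _) in enumerate(ranked_vol)}
index_lr = {name: i for i, (name, _) in enumerate(ranked_lr)}
```

Because each plot is sorted independently, a lookup table like `index_vol` / `index_lr` would be needed to compare a single volunteer across the two plots.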
It looks like you somehow have a "regression to the mean" problem. Good volunteers don't generate good predictors, probably because there aren't enough labels to train on.
It would probably be interesting to have some sort of one-off ranking of volunteers and push it to the website. It might stimulate people to label better.
For members of the crowd who have labelled many subjects, we can train logistic regression on their labels. This will result in a predictor that mimics that particular annotator. Are any of these predictors particularly accurate?
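As a rough sketch of the idea, the snippet below trains one logistic regression per volunteer and scores each against a reference labelling. Everything here is an assumption made for illustration: the data layout (`labels_by_volunteer`, `features`), the `min_labels` threshold, the toy one-dimensional features, and the hand-written gradient descent, which stands in for whatever library implementation the project actually uses.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, x):
    w, b = model
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


def accuracy(model, X, y):
    return sum(predict(model, xi) == yi for xi, yi in zip(X, y)) / len(y)


def train_per_volunteer(labels_by_volunteer, features, min_labels=4):
    """Train one predictor per volunteer who has enough labels.

    labels_by_volunteer: {volunteer: {subject_id: 0 or 1}}  (hypothetical layout)
    features: {subject_id: [float, ...]}                    (hypothetical layout)
    Volunteers with fewer than `min_labels` labels, or with only one
    class among their labels, are skipped.
    """
    models = {}
    for vol, labels in labels_by_volunteer.items():
        if len(labels) < min_labels or len(set(labels.values())) < 2:
            continue
        subjects = sorted(labels)
        X = [features[s] for s in subjects]
        y = [labels[s] for s in subjects]
        models[vol] = train_logistic_regression(X, y)
    return models


# Toy data: one feature per subject; the reference label is 1 when it is positive.
features = {i: [x] for i, x in
            enumerate([-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0])}
truth = {i: int(f[0] > 0) for i, f in features.items()}
labels_by_volunteer = {
    "careful": dict(truth),      # labelled every subject, agrees with the reference
    "sparse": {0: 0, 7: 1},      # too few labels, so no predictor is trained
}

models = train_per_volunteer(labels_by_volunteer, features)
eval_X = [features[i] for i in sorted(features)]
eval_y = [truth[i] for i in sorted(features)]
scores = {vol: accuracy(m, eval_X, eval_y) for vol, m in models.items()}
```

In this toy the predictors are scored on the same subjects they were trained on, which is only acceptable for a demonstration; a real comparison would need held-out subjects. The `min_labels` cut-off is the knob that decides which volunteers count as having "labelled many subjects".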