chengsoonong / crowdastro

Cross-identification of radio objects and host galaxies by applying machine learning on crowdsourced training labels.
MIT License

How good are individual labelers? #215

Closed chengsoonong closed 7 years ago

chengsoonong commented 8 years ago

For members of the crowd that have labelled many subjects, we can train logistic regression on these labels. This will result in a predictor that mimics that particular annotator. Are any of these predictors particularly good/accurate?
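The per-volunteer mimic idea can be sketched as follows. This is a hypothetical illustration, not the crowdastro code: the feature matrix, the simulated volunteer, and all variable names are made up, and scikit-learn's `LogisticRegression` stands in for whatever model the project actually uses.

```python
# Sketch: train one logistic-regression "mimic" per volunteer, then score
# the mimic against ground truth. Toy data throughout.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_subjects, n_features = 500, 10
features = rng.normal(size=(n_subjects, n_features))
true_labels = (features[:, 0] > 0).astype(int)

# Simulate one volunteer: agrees with the truth except for ~10% noise.
noise = rng.random(n_subjects) < 0.1
volunteer_labels = np.where(noise, 1 - true_labels, true_labels)

# Fit on the volunteer's labels; evaluate against the true labels to ask
# "how good is the predictor this volunteer induces?"
X_train, X_test, y_train, y_test, t_train, t_test = train_test_split(
    features, volunteer_labels, true_labels, random_state=0)
mimic = LogisticRegression().fit(X_train, y_train)
accuracy = mimic.score(X_test, t_test)
```

In the real experiment each volunteer who has labelled many subjects gets their own `(features, volunteer_labels)` training set, and the resulting accuracies can be compared across volunteers.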

MatthewJA commented 7 years ago

[Figure: two plots of accuracy against volunteer index — volunteer accuracy and LR(Volunteer) accuracy; mean solid red, stdev dashed red]

The mean is the solid red line and the standard deviation the dashed red lines. The mean accuracy of LR(Volunteer) across all volunteers is basically the same as the mean accuracy of the LR(RGZ MV) predictor, which is interesting. The average volunteer outperforms the average LR(Volunteer) predictor. The volunteer index is just the index of the volunteer when sorted by average accuracy on the given plot (and it's worth noting that the volunteer indices on the left plot do not, in general, correspond to the indices on the right plot).
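The summary statistics behind the plots can be sketched like this. The accuracy values are made up for illustration and are not from the actual experiment; the point is that each panel is sorted independently, which is why indices don't line up across panels.

```python
# Sketch of the plot's statistics: mean/stdev per panel, and the
# "volunteer index" as a per-panel sort order. Toy accuracies.
import numpy as np

volunteer_accuracy = np.array([0.62, 0.75, 0.81, 0.68, 0.90, 0.73])
lr_accuracy = np.array([0.60, 0.74, 0.70, 0.65, 0.78, 0.69])

# Mean (solid red) and standard deviation (dashed red) for each panel.
mean_vol, std_vol = volunteer_accuracy.mean(), volunteer_accuracy.std()
mean_lr, std_lr = lr_accuracy.mean(), lr_accuracy.std()

# "Volunteer index" = position after sorting each panel by accuracy.
# Each panel is sorted on its own, so the same index can refer to
# different volunteers in the two panels.
index_vol = np.argsort(volunteer_accuracy)
index_lr = np.argsort(lr_accuracy)
```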

chengsoonong commented 7 years ago

It looks like you somehow have a "regression to the mean" problem. Good volunteers don't generate good predictors, probably because there aren't enough labels to train on.

It would probably be interesting to have some sort of one-off ranking of volunteers and push it to the website. It might stimulate people to label better.
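A one-off ranking could be produced from the raw votes alone, using the RGZ majority vote as a stand-in for ground truth. This is only a sketch: the vote structure and volunteer names are invented, and the real pipeline would read from the crowdastro data files.

```python
# Hypothetical leaderboard: rank volunteers by agreement with the
# per-subject majority vote. Toy votes throughout.
from collections import Counter

# subject -> list of (volunteer, label) crowd votes
votes = {
    "s1": [("ann", 1), ("bob", 1), ("cat", 0)],
    "s2": [("ann", 0), ("bob", 1), ("cat", 0)],
    "s3": [("ann", 1), ("bob", 0), ("cat", 1)],
}

# Majority vote per subject stands in for ground truth.
majority = {s: Counter(l for _, l in vs).most_common(1)[0][0]
            for s, vs in votes.items()}

# Per-volunteer agreement with the majority.
correct, total = Counter(), Counter()
for s, vs in votes.items():
    for volunteer, label in vs:
        total[volunteer] += 1
        correct[volunteer] += int(label == majority[s])

leaderboard = sorted(total, key=lambda v: correct[v] / total[v],
                     reverse=True)
```

Agreement with the majority is a crude proxy — it penalises good volunteers who disagree with a wrong crowd — but it needs no held-out ground truth, so it could run over the full volunteer pool.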