UChicago-CCA-2021 / Readings-Responses


Sampling, Crowd-Sourcing & Reliability - Fundamentals #12

HyunkuKwon opened this issue 3 years ago

HyunkuKwon commented 3 years ago

Post questions about the following fundamentals reading here:

Dawid, A. P., and Skene, A. M. 1979. "Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm." Applied Statistics 28(1): 20–28.

Mechanical Turk tutorial on Canvas (essential for performing the assignment if you haven't used it before).

jacyanthis commented 3 years ago

What do you think about bots on Mechanical Turk? I've heard some people argue that more than 20% of MTurk workers are now bots, or at least automation-assisted humans, and that they can pass all sorts of attention checks. Others think it's a much smaller issue, mostly of IP spoofing, and that spacing out multiple attention checks and manually reading free text (e.g., removing "GOOD" comments) can essentially weed out any funny business. It seems like every month there's a new analysis, and researchers are quite divided.
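
For the free-text screening idea mentioned above, a toy filter might look like the sketch below; the patterns and length threshold are illustrative assumptions, not a validated bot detector.

```python
import re

def looks_suspect(text: str) -> bool:
    """Flag free-text answers that look bot-like or copy-pasted."""
    t = text.strip()
    return (len(t) < 5                 # near-empty answers
            or t.isupper()             # shouted one-word replies like "GOOD"
            or re.fullmatch(r"(good|nice|great|interesting)[.!]*", t, re.I) is not None)

print(looks_suspect("GOOD"))                            # True
print(looks_suspect("The author's argument rests on three claims."))  # False
```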

toecn commented 3 years ago

There seems to be a lot of potential in combining supervised learning and manual coding using systems like Mechanical Turk, although challenges like the ones @jacyanthis mentions suggest that we would also need to take a sample of the coders' sample to verify the quality of the manual coding (see the sketch below). I wonder what some of the most interesting papers are that have combined supervised learning and manual coding.
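
As a concrete version of that "sample of the sample" audit, a minimal sketch might look like this; the file and column names (`turk_coded.csv`, `turk_label`, `expert_label`) are hypothetical, and the expert labels are assumed to come from re-coding the subset by hand.

```python
import pandas as pd

coded = pd.read_csv("turk_coded.csv")          # assumed Turker output file
audit = coded.sample(n=100, random_state=0)    # random subset to re-code by hand

# Assumes an expert has filled in audit["expert_label"] after re-coding.
agreement = (audit["turk_label"] == audit["expert_label"]).mean()
print(f"expert-Turker agreement on audit sample: {agreement:.2%}")
```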

jinfei1125 commented 3 years ago

In the paper "Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm," the authors give an example of errors in medical records. Can you give us more examples of this EM algorithm's applications in social science? Thank you!

Bin-ary-Li commented 3 years ago

The slides only go over how to build a survey-based HIT. What if we want to make something fancier, like a web game or an interactive experiment? I am sure AWS has an API for that, but I wonder if there is an easy-to-use programming library that integrates with it.
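
One common route is MTurk's ExternalQuestion, which embeds any page you host (a web game, or an experiment built with tools like oTree or jsPsych) in an iframe; libraries such as psiTurk wrap this workflow. Below is a minimal boto3 sketch with a placeholder URL, posting to the requester sandbox; reward and timing values are illustrative.

```python
import boto3

# Sandbox endpoint so test HITs don't cost real money.
mturk = boto3.client(
    "mturk", region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com")

external_question = """
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.com/my-web-game</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>"""

hit = mturk.create_hit(
    Title="Play a short decision-making game",
    Description="Interactive experiment hosted on our own server",
    Reward="1.00",                        # USD, as a string
    MaxAssignments=50,
    LifetimeInSeconds=7 * 24 * 3600,      # how long the HIT stays listed
    AssignmentDurationInSeconds=30 * 60,  # time limit per worker
    Question=external_question,
)
print(hit["HIT"]["HITId"])
```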

egemenpamukcu commented 3 years ago

Would it make sense to use crowdsourcing platforms such as Mechanical Turk as a source of validation for, say, unsupervised classification algorithms? What would be the pros and cons of doing classification via Turkers versus via ML algorithms? What should we do when they point in different directions?
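
For the validation direction, one minimal sketch is to compare algorithm clusters against Turker majority-vote labels with a chance-corrected agreement measure. The arrays below are placeholders; note that the adjusted Rand index tolerates arbitrary cluster IDs, while Cohen's kappa is only meaningful once the two label spaces are aligned.

```python
from sklearn.metrics import adjusted_rand_score, cohen_kappa_score

algo = [0, 1, 1, 0, 2, 1, 0, 2]   # e.g., unsupervised cluster assignments
turk = [0, 1, 1, 0, 2, 2, 0, 2]   # e.g., Turker majority-vote labels

print(adjusted_rand_score(algo, turk))  # invariant to relabeling of clusters
print(cohen_kappa_score(algo, turk))    # assumes matching label spaces
```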

hesongrun commented 3 years ago

Thanks for the reading! I find the EM algorithm paper very interesting. The authors adopt the clever EM algorithm to obtain maximum likelihood estimates of people's observer error rates even when the 'true' responses are not available. This really harnesses the power of the EM algorithm to uncover clusters in mixtures of probability distributions without labels, and it is also very robust to missing values. I think this approach provides a parsimonious way to infer latent states from people's responses to their environment even when true labels are unavailable.

Relating to @jinfei1125's question, in what cases do you think it would be overly restrictive to use EM to draw inferences about underlying latent states in the social sciences? Thanks!
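
To make the mechanics concrete, here is a minimal sketch of a Dawid-Skene-style EM loop for binary labels on simulated data (variable names and the simulation are illustrative, not from the paper). The M-step re-estimates class priors and each coder's confusion matrix from the current posteriors; the E-step recomputes the posterior over each item's true label.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_coders, n_classes = 200, 5, 2
true_z = rng.integers(n_classes, size=n_items)      # hidden "true" labels
acc = rng.uniform(0.6, 0.95, size=n_coders)         # simulated coder accuracy
labels = np.array([[z if rng.random() < acc[k] else 1 - z
                    for k in range(n_coders)] for z in true_z])

# Initialize posteriors over true labels with a majority vote.
post = np.zeros((n_items, n_classes))
post[np.arange(n_items), (labels.mean(axis=1) > 0.5).astype(int)] = 1.0

for _ in range(50):
    # M-step: class priors and per-coder confusion matrices.
    prior = post.mean(axis=0)
    conf = np.zeros((n_coders, n_classes, n_classes))
    for k in range(n_coders):
        for j in range(n_classes):
            conf[k, :, j] = post.T @ (labels[:, k] == j)
        conf[k] += 1e-6                              # avoid log(0)
        conf[k] /= conf[k].sum(axis=1, keepdims=True)
    # E-step: posterior over true labels given all coders' responses.
    log_post = np.log(prior) + sum(np.log(conf[k][:, labels[:, k]].T)
                                   for k in range(n_coders))
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

est_acc = [conf[k].diagonal().mean() for k in range(n_coders)]
print(np.round(acc, 2), np.round(est_acc, 2))  # true vs. estimated accuracy
```

The same updates generalize to more than two classes and to coders who skip items, which is part of what makes the approach robust to missing values.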

william-wei-zhu commented 3 years ago

Mechanical Turk is very useful for straightforward data coding and preparation tasks. For tasks that require more background knowledge but are still repetitive and time-consuming (e.g., preparing a dataset of Fortune 500 CEOs' previous employment histories by looking them up on Wikipedia), what resources similar to Mechanical Turk can we turn to?

theoevans1 commented 3 years ago

In addition to bots, Mechanical Turk carries the risk of respondents answering randomly or rushing through questions as quickly as possible. Are there any strategies for formulating questions or setting up surveys that reduce this issue?
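
On the setup side, two standard levers are attention-check items and completion-time screens. A minimal filtering sketch follows; the column names and the 60-second floor are assumptions about your survey export, not fixed conventions.

```python
import pandas as pd

df = pd.read_csv("responses.csv")   # assumed survey export
min_seconds = 60                    # assumed plausibility floor per response

clean = df[(df["attention_check"] == "correct") &
           (df["duration_sec"] >= min_seconds)]
print(f"kept {len(clean)} of {len(df)} responses")
```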

mingtao-gao commented 3 years ago

Can you provide some examples of how the EM algorithm can be applied in content analysis, or other real-life applications of this methodology?

k-partha commented 3 years ago

The EM algorithm was a highly interesting read. To my understanding, however, we still need to specify the statistical structure of the underlying distribution (the latent variables) to some degree to obtain a solution; for example, in a Gaussian mixture model, we have to specify the number of component normal distributions. While an infinite number of variables could theoretically explain the data we see, are there any statistical model selection metrics (unsupervised analogues to BIC and AIC) that optimize the number of unseen random variables generating the data? (Forgive me if I missed something obvious; the paper is fairly mathematically dense.)
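
BIC and AIC actually apply directly to unsupervised mixture models, since they only need the fitted likelihood and a parameter count. A standard scikit-learn sketch on simulated two-cluster data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)),   # two simulated components
               rng.normal(4, 1, (100, 2))])

# Fit mixtures with 1..6 components and score each by BIC.
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 7)}
print(min(bics, key=bics.get))   # BIC should favor 2 components here
```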

yushiouwillylin commented 3 years ago

I am wondering whether there are any common errors or problems that occur when implementing the EM algorithm. I believe that for any machine learning method there exist problems it handles poorly, so are there specific kinds of cases in social science that the EM algorithm finds hard to handle? Or, more generally, are there any rules of thumb for which method should be used for which categories of problems?

romanticmonkey commented 3 years ago

Might the Mechanical Turk population be inherently biased, i.e., not a good representation of the desired population? I've encountered many studies where the authors take for granted that Turkers are a good sample of, say, the US population. However, the people who go on Mechanical Turk might come from a very niche slice of the US population.

The sampling strategy I can think of for this problem is stratified sampling, where we use Turkers' demographics to sample evenly from the collected responses. Nevertheless, it still worries me that even after stratified sampling, the sample would remain biased toward, e.g., those who want to make some small cash on Mechanical Turk.
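
A minimal post-stratification sketch of that idea (the file name, column names, and population shares are made up for illustration; real targets would come from, e.g., census tables):

```python
import pandas as pd

df = pd.read_csv("turk_responses.csv")                    # assumed export
pop_share = {"18-29": 0.20, "30-49": 0.34, "50+": 0.46}   # assumed targets

# Weight each response by (population share) / (sample share) of its stratum.
samp_share = df["age_group"].value_counts(normalize=True)
df["weight"] = df["age_group"].map(pop_share) / df["age_group"].map(samp_share)
weighted_mean = (df["outcome"] * df["weight"]).sum() / df["weight"].sum()
```

As the comment notes, though, weighting can only correct for observed demographics, not for self-selection onto the platform itself.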

chuqingzhao commented 3 years ago

Thank you for the papers. The EM algorithm is an interesting method for estimating unobserved variables. I am wondering how the EM algorithm can be applied to content sampling.

ming-cui commented 3 years ago

Can the algorithm in the paper improve the reliability of content analysis? I am wondering how this paper relates to this week's topic.

MOTOKU666 commented 3 years ago

I'm wondering how good the estimation would be. 99% and 95% accuracy are quite different, and 99.9% versus 99% matters even more, if we want to apply this in the medical or diagnostic domain.

zshibing1 commented 3 years ago

Is it useful to obtain uncertainty estimates, such as confidence intervals, for results from maximum likelihood estimation?

Raychanan commented 3 years ago

Have you done any research with the help of MTurk? If so, could you talk about your experience with it? Thanks!

Rui-echo-Pan commented 3 years ago

Could you suggest some methods to improve the quality of MTurk output? Also, when MTurk is used for perception rating (which I suppose it commonly is), there can always be bias, as MTurk workers tend to be a specific group of people. What should we do to address that problem?

xxicheng commented 3 years ago

I also have a question about the validity of experiments or surveys with MTurkers. As we know, some of them are over-exposed to academic surveys because of the monetary compensation, and social scientists tend to reuse the same or similar questionnaires so they can compare their results with previous studies. As a result, some MTurkers are quite aware that their behaviors are being observed and their answers will be analyzed. Do you think this is a significant disadvantage of conducting research with MTurkers?

RobertoBarrosoLuque commented 3 years ago

If a researcher is interested in understanding the demographics or other characteristics of their respondents on Mechanical Turk, how would one go about doing that? There is obviously a privacy issue with Amazon disclosing that information, but on the other hand, researchers might want to gauge how representative their sample is of the population they are interested in studying.
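
In practice, the usual options are (a) asking a short demographic questionnaire inside the HIT itself and (b) restricting or describing the pool with worker qualifications. A boto3 sketch using MTurk's built-in Locale qualification follows; the qualification type ID is MTurk's system Worker_Locale ID, and the rest should be treated as placeholders.

```python
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

locale_req = {
    "QualificationTypeId": "00000000000000000071",   # system Worker_Locale
    "Comparator": "EqualTo",
    "LocaleValues": [{"Country": "US"}],
}
# Pass QualificationRequirements=[locale_req] to create_hit(...) to limit
# the HIT to US-based workers; finer demographics must be self-reported.
```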

lilygrier commented 3 years ago

Is there a minimum number of samples and indicators that should be present to ensure MLE will work well in filling in missing data? In a past machine learning class, we learned about filling in missing data using matrix approximation, and estimates converged better when there were more samples similar to the missing ones. I imagine that holds here too, but I'd be interested to learn more!

sabinahartnett commented 3 years ago

What kind of research has been done on 'optimal' MTurk assignments, i.e., avoiding some of the potential pitfalls specified above? How much should we cross-validate within assignments? Should a single MTurk user's rating be enough? To what degree can you specify 'expertise' for the MTurkers?