Computational-Content-Analysis-2020 / frequently-asked-questions-spring

Questions or doubts about organisation/code for Spring 2020 running of Computational Content Analysis.

Mturk - different workers across assignments #24

Open DSharm opened 4 years ago

DSharm commented 4 years ago

Hi - I set up a document tagging task on MTurk and I want to understand how to reconcile the results.

I'm analyzing use-of-force policies by police departments, and I wanted each document to be tagged for whether it contains certain rules / guidelines (e.g. does the policy ban chokeholds?). I chose 3 such rules. So, my MTurk task had 6 assignments (6 documents), and each assignment had 3 tags. I also asked for 3 unique Workers per assignment.

What I didn't realize is that I would not necessarily get the same 3 workers for each assignment (document). So, across my 18 data points (6 documents x 3 tags), I have many different workers rather than the same 3.

My question is: when analyzing coder reliability and constructing the analysis in the week 2 homework, is this a problem? Does that analysis assume that the same 3 coders coded every tag in every assignment?

bhargavvader commented 4 years ago

Yes, this would be a problem... it would be difficult to make any judgements about the annotators if there is a different set each time. Ideally, you would put all the documents in one assignment to ensure that the same workers code all of them.

I'll also see what @HyunkuKwon has to say about this.

DSharm commented 4 years ago

Agreed - I've worked through the problem set and it's hard to reach any conclusions given that I have 16 coders with 3 data points each. That said, the example given in the homework (with the loop design) seems to have a similar setup, where each chunk has 4 annotators and there are 8 annotators in total. In my case, each "chunk" has 3 annotators and there are 16 annotators in total - I just have far fewer data points per annotator.

However, I don't think I can limit MTurk to 3 unique coders across all the assignments, right? Additionally, I can't put all the documents into one assignment, because each document is a PDF with an associated document_url, and for MTurk's input I upload a dataset of 6 "document_urls", which MTurk opens successively, one per "assignment". Unfortunately, I don't think I can afford to redo the MTurk survey for the purposes of the week 2 homework (in terms of both time and cost).
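One way to look at agreement when the set of workers changes from item to item is an item-level statistic such as Fleiss' kappa, which only uses how many raters chose each label per item and not who they were. The sketch below is a minimal illustration, not the homework's method: the rows, worker IDs, and the tag name bans_chokeholds are hypothetical, and it assumes every item has the same number of ratings. It measures overall agreement only and says nothing about the quality of any individual annotator.

```python
from collections import Counter, defaultdict

# Hypothetical long-format results: (document, tag, worker_id, label).
# Worker IDs differ across items, as in the MTurk setup described above.
rows = [
    ("doc1", "bans_chokeholds", "W01", 1),
    ("doc1", "bans_chokeholds", "W02", 1),
    ("doc1", "bans_chokeholds", "W03", 1),
    ("doc2", "bans_chokeholds", "W04", 0),
    ("doc2", "bans_chokeholds", "W05", 0),
    ("doc2", "bans_chokeholds", "W01", 0),
    ("doc3", "bans_chokeholds", "W06", 1),
    ("doc3", "bans_chokeholds", "W07", 0),
    ("doc3", "bans_chokeholds", "W02", 1),
    ("doc4", "bans_chokeholds", "W08", 0),
    ("doc4", "bans_chokeholds", "W09", 0),
    ("doc4", "bans_chokeholds", "W10", 1),
]


def fleiss_kappa(ratings_per_item):
    """Fleiss' kappa for a list of per-item label lists of equal length."""
    n = len(ratings_per_item[0])  # ratings per item
    if n < 2 or any(len(r) != n for r in ratings_per_item):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    num_items = len(ratings_per_item)
    counts = [Counter(r) for r in ratings_per_item]
    categories = set().union(*counts)

    # Mean observed agreement: share of agreeing rater pairs within each item.
    p_bar = sum(
        (sum(c * c for c in cnt.values()) - n) / (n * (n - 1)) for cnt in counts
    ) / num_items

    # Expected agreement from the overall label proportions.
    p_e = sum(
        (sum(cnt[cat] for cnt in counts) / (num_items * n)) ** 2
        for cat in categories
    )
    if p_e == 1.0:
        return float("nan")  # only one label ever used; kappa is undefined
    return (p_bar - p_e) / (1 - p_e)


# Group labels by (document, tag); worker identity is deliberately ignored.
by_item = defaultdict(list)
for doc, tag, worker, label in rows:
    by_item[(doc, tag)].append(label)

kappa = fleiss_kappa(list(by_item.values()))
print(round(kappa, 3))
```

With only a handful of items, any such estimate will be very noisy, so it is better read as a rough check than as a firm reliability figure.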