automan-lang / AutoMan

Human-Computation Runtime
https://docs.automanlang.org
GNU General Public License v2.0

New disqualification mechanism #24

Closed: dbarowy closed this issue 9 years ago

dbarowy commented 9 years ago

The original AutoMan system used a disqualification system with the MTurk backend that works as follows:

  1. When a HIT is initially posted, it is posted with no qualifications. Assuming that AutoMan's optimism works out (i.e., all workers agree unanimously in the first round of tasks), this scheme ensures that workers can participate with a very low barrier of entry: they need not satisfy any qualifications. Any worker who participates in this task is automatically granted a "disqualification" qualification upon submission of their work. This disqualification stays in place until the task is complete.
  2. When subsequent rounds are necessary (i.e., workers did not agree unanimously), AutoMan adds a qualification to the new round. This qualification checks whether the worker received a "disqualification". A worker who has a disqualification from a previous round is denied access when requesting the qualification for the new round. A worker who does not have one is granted access, but upon submitting the task, they are disqualified from subsequent participation.
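
A minimal sketch of the two steps above, in Python rather than AutoMan's own code. The `Hit` class and its method names are hypothetical and only model the bookkeeping; nothing here calls MTurk.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    """Hypothetical model of one task and its per-HIT disqualification."""
    hit_id: str
    round_no: int = 1
    disqualified: set = field(default_factory=set)

    def may_accept(self, worker_id: str) -> bool:
        # Round 1 is posted with no qualification, so nothing is checked.
        if self.round_no == 1:
            return True
        # Later rounds deny any worker who already holds the disqualification.
        return worker_id not in self.disqualified

    def submit(self, worker_id: str) -> None:
        # Submitting work always grants the "disqualification".
        self.disqualified.add(worker_id)

    def next_round(self) -> None:
        # Called when the workers did not agree and another round is needed.
        self.round_no += 1
```

For example, after `h = Hit("h1")`, `h.submit("w1")` and `h.next_round()`, `h.may_accept("w1")` is `False` while `h.may_accept("w2")` is still `True`.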

This mechanism ensures that workers may only participate once per HIT, but it has two downsides:

  1. Since qualifications are HIT-specific, this breaks MTurk's HIT-batching mechanism. Normally, HITs with either a set of matching fields (title, description, reward IIRC) or a matching hit_type_id are batched together. From the worker's standpoint, this means that a single HIT appears to have many "assignments", even though from our (i.e., AutoMan's) perspective, these are in fact distinct HITs. This allows us to write programs that repeatedly call the same AutoMan function in a loop and have all of these HITs batched together. HITs with different qualifications cannot be batched, so each one ends up in a batch of its own.
  2. Workers dislike applying for qualifications, even when getting them is easy. Their hesitation can be outweighed by the availability of many assignments for a given qualification. Since disqualifications are per-HIT, workers get access to at most one assignment per disqualification qualification. This is clearly suboptimal.
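
To make the first downside concrete, here is a rough Python sketch of the grouping rule as described above. It assumes that a batch is keyed by the pair (hit_type_id, qualification); the function name and the tuple layout are made up for illustration and are not MTurk's API.

```python
from collections import defaultdict


def batch(hits):
    """Group HITs into batches.

    `hits` is a list of (hit_id, hit_type_id, qualification_id) tuples;
    qualification_id is None when the HIT carries no qualification.
    Assumption: HITs share a batch only if both keys match.
    """
    batches = defaultdict(list)
    for hit_id, hit_type_id, qualification_id in hits:
        batches[(hit_type_id, qualification_id)].append(hit_id)
    return dict(batches)


# First round: same type, no qualification, so one batch of three HITs.
first_round = [(f"h{i}", "T", None) for i in range(3)]
# Later round: each HIT has its own disqualification, so three batches of one.
later_round = [(f"h{i}", "T", f"disq-h{i}") for i in range(3)]
```

Here `len(batch(first_round))` is 1 and `len(batch(later_round))` is 3, which is the fragmentation a worker sees.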

Over the summer, I implemented an alternative mechanism. This mechanism uses no qualifications whatsoever, and instead discards work post-hoc. Workers are still paid depending on whether their work agrees with the crowd (assuming dont_reject != true), so they can be paid multiple times. This works well when we schedule large numbers of tasks that become large batched HITs on MTurk. But it becomes problematic when there are few HITs in a batch; when disagreements happen, workers are frequently offered the same HIT over and over again.
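
A sketch of what post-hoc discarding could look like, in Python. It assumes that only a worker's first answer for a task is kept; the issue does not say which answer is kept, and the function name is hypothetical.

```python
def discard_repeat_answers(submissions):
    """Split submissions into kept and discarded answers.

    `submissions` is a list of (worker_id, answer) pairs in arrival order.
    Assumption: the first answer from each worker is kept and any later
    answer from the same worker is discarded.
    """
    seen = set()
    kept, discarded = [], []
    for worker_id, answer in submissions:
        if worker_id in seen:
            discarded.append((worker_id, answer))
        else:
            seen.add(worker_id)
            kept.append((worker_id, answer))
    return kept, discarded
```

With few HITs in a batch, the `discarded` list grows quickly, because the same workers keep being offered the same task.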

I plan to switch to the following new disqualification mechanism:

  1. There will be a single disqualification data structure for a single AutoMan function. Internally, instead of using a boolean flag, the flag will be an Int. If the AutoMan function is called repeatedly (e.g., in a loop), the runtime will associate all posted HITs (i.e., all HITs with the same hit_type_id) with the same qualification data structure.
  2. Users will need to request a qualification for all AutoMan HITs. But since all HITs associated with the same function get the same qualification data structure, users will be granted access to the whole batch when it is posted.
  3. The value of the Int flag in the qualification requirement corresponds to the "round" number. Users may only participate in batches where their round number matches.
  4. There is a corner case here for tasks that "dribble" onto MTurk, i.e., where an AutoMan function is called repeatedly, but not in a tight loop. In that case, some tasks posted later will have lower round numbers than tasks posted earlier, and they will appear in separate batches. To deal with this, round numbers for any given function will be monotonically increasing. Whenever a new task is posted, it will take the highest round number. While some batch fragmentation will still happen, it should not happen frequently, and not nearly as severely as it does now.
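
The four points above could be sketched as follows, in Python. The class and method names are hypothetical, and the rule that a worker is admitted at whatever round is current when they request the qualification is an assumption; the plan above only says that round numbers must match.

```python
class FunctionQualification:
    """Hypothetical per-function record with an Int round flag."""

    def __init__(self):
        self.highest_round = 1
        self.worker_round = {}  # worker_id -> round the worker may work in

    def post_task(self, needed_round):
        # Round numbers never decrease: a newly posted task takes the
        # highest round number seen so far for this function.
        self.highest_round = max(self.highest_round, needed_round)
        return self.highest_round

    def request(self, worker_id):
        # Assumption: a worker is admitted at the current highest round
        # the first time they request the qualification.
        return self.worker_round.setdefault(worker_id, self.highest_round)

    def may_work(self, worker_id, task_round):
        # Workers may only take tasks whose round matches their own.
        return self.worker_round.get(worker_id) == task_round
```

If tasks are posted with `post_task(1)`, then `post_task(3)`, then a late `post_task(1)`, the last call returns 3, so the late task joins the existing round-3 batch instead of forming its own.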

etosch commented 9 years ago

http://mechanicalturk.typepad.com/blog/2014/07/new-qualification-comparators-add-greater-flexibility-to-qualifications-.html

Emma Tosch MS/PhD student, University of Massachusetts Amherst http://cs.umass.edu/~etosch etosch@cs.umass.edu

dbarowy commented 9 years ago

Yeah, I saw those. Thanks. There may be some cleverer way to do this using the new operators, but if so, it's eluding me at the moment.

dbarowy commented 9 years ago

To clarify how this works: batches are assigned to worker pools. Workers are assigned to a pool when they request a qualification. If we use all of the IntegerValue fields available in a QualificationRequirement, we can have up to 15 pools. If a task is rescheduled more than 15 times (which ought to be extremely unlikely and very expensive), then we can just create a new set of pools with a new QualificationRequirement.

A few interesting things can happen as a result of this:

  1. When scheduling a task, we should prefer existing pools over creating new ones. The only reason a task cannot be assigned to a particular pool is if a prior round was assigned to that pool.
  2. We could favor the most active pool, where we define "most active" as having the largest number of workers, weighted by how recently they participated.
  3. Pools can be completely independent of the task/AutoMan function that gets assigned to them; i.e., different functions can assign work to the same pool.
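
A rough Python sketch of the pool choice described in points 1 and 2. The function name, the integer pool ids, and the exponential half-life weighting are all assumptions made for illustration; the comment above only says "weighted by how recently they participated".

```python
MAX_POOLS = 15  # pool limit quoted in the comment above


def pick_pool(pools, used_by_prior_rounds, now, half_life=3600.0):
    """Choose a pool for a task.

    `pools` maps pool id (assumed to be 0..n-1) to a list of timestamps in
    seconds, one per worker, giving that worker's last participation.
    `used_by_prior_rounds` is the set of pool ids already used by earlier
    rounds of the same task. Returns a pool id, or None if all 15 are used.
    """
    def activity(stamps):
        # Each worker counts for less the longer ago they participated.
        return sum(0.5 ** ((now - t) / half_life) for t in stamps)

    eligible = {p: s for p, s in pools.items() if p not in used_by_prior_rounds}
    if eligible:
        # Prefer an existing pool, and among those the most active one.
        return max(eligible, key=lambda p: activity(eligible[p]))
    if len(pools) < MAX_POOLS:
        return len(pools)  # id for a new, empty pool
    return None  # caller would need a new QualificationRequirement
```

A caller that gets `None` back is in the "rescheduled more than 15 times" case and would have to start a fresh set of pools.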

dbarowy commented 9 years ago

In the end, I went with something much simpler. As before, no qualification is required for the first HITGroup. HITGroups correspond to quality control rounds. Upon completing a HIT in the first HITGroup, a disqualification is issued that says that the worker's only valid group = 1. Subsequent rounds (HITGroups) require a qualification where the valid group = round #. In other words, the qualification ensures that a worker can participate in only one given batch. Some workers will be locked out of HITs that they could legitimately have answered, but this new mechanism is simpler, and it prevents HITs from being fragmented into their own unique HITGroups. Finished in 1f742c9d3838fcd0041d284ab7b8cb3f80b57a23.
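
A small Python model of this final scheme, not the code from the commit. The names are hypothetical, and it assumes that a worker with no record yet is granted the current round when they request access to a later HITGroup.

```python
class GroupQualification:
    """Hypothetical record of each worker's only valid HITGroup (round)."""

    def __init__(self):
        self.valid_group = {}  # worker_id -> the one round they may work in

    def request_access(self, worker_id, round_no):
        # The first HITGroup requires no qualification at all.
        if round_no == 1:
            return True
        # Assumption: a worker without a record is granted this round;
        # anyone pinned to a different round is refused.
        return self.valid_group.setdefault(worker_id, round_no) == round_no

    def complete(self, worker_id, round_no):
        # Completing a HIT pins the worker to that round's group.
        self.valid_group.setdefault(worker_id, round_no)
```

After `q.complete("a", 1)`, worker "a" is refused by `q.request_access("a", 2)`, while a new worker "b" is admitted to round 2 and then refused for round 3.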