crowdresearch / collective

The Stanford crowd research collective
http://crowdresearch.stanford.edu

Preliminary Worker Evaluation Tasks #31

Open mbernst opened 6 years ago

mbernst commented 6 years ago

Problem

Requesters have asked multiple times for the ability to filter workers based on early performance. This practice of "starter tasks" is a common one: workers' performance on the first few tasks qualifies them to gain access to the rest of the work. Gold standard tasks are a related technique. Concretely, both let a requester designate a subset of their data as training tasks for new workers. These tasks either have known answers (like gold tasks) or can be manually reviewed by the requester, gating people into the full task. Both approaches seem like they could be integrated into Boomerang.
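As a rough sketch of the gating idea, a gold-standard filter compares a worker's answers on tasks with known answers and admits the worker only above an accuracy threshold. All names below are hypothetical and not part of Daemo's actual codebase:

```python
# Hypothetical sketch of gold-standard gating. None of these names
# come from Daemo; the threshold is an illustrative choice.

def gold_task_accuracy(worker_answers, gold_answers):
    """Fraction of gold tasks the worker answered correctly."""
    correct = sum(
        1 for task_id, answer in worker_answers.items()
        if gold_answers.get(task_id) == answer
    )
    return correct / len(gold_answers)

def admit_worker(worker_answers, gold_answers, threshold=0.8):
    """Gate a worker into the full task if gold accuracy meets the threshold."""
    return gold_task_accuracy(worker_answers, gold_answers) >= threshold

gold = {"t1": "cat", "t2": "dog", "t3": "bird"}
answers = {"t1": "cat", "t2": "dog", "t3": "fish"}
admit_worker(answers, gold)  # 2/3 correct, below the 0.8 threshold
```

The requester-reviewed variant would replace `gold_task_accuracy` with a manual approval step, but the gate itself is the same shape.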

Proposal

This proposal suggests that we begin prototyping starter tasks to see if they would help requesters. We would need to answer questions like: is there a way to do this without a lot of extra work on the requester's part? Does this bring substantial benefit to requesters, enough to be worth the time and effort of designing and implementing it? If it proves helpful, we would integrate it into the platform; if not, we could set the idea aside for now. The path through this strategic proposal will be determined by our findings.

Implications

Short-term implications: we would gain insight into whether non-expert requesters really benefit from the ability to create starter tasks, or if this is better served by other mechanisms such as prototype tasks.

Long-term implications: This might add to the complexity of the authoring interface. The question will be whether the improved experience is worth that complexity.

Contact

@Michael Bernstein on Slack


To officially join in, add yourself as an assignee to the proposal. To break consensus, comment using this template. To find out more about this process, read the how-to.

neilthemathguy commented 6 years ago

Could you provide more details?

Prototype tasks exist because requesters can be at fault: they launch unclear task designs that confuse even earnest workers, under-specify edge cases, and neglect to include examples (Gaikwad et al., 2017).

How do we know that these starter tasks will not lead to poor results?

mbernst commented 6 years ago

Yes, the requester need here might be addressed through prototype tasks, or might not; I am not yet convinced either way.

I think these are the details you're asking for:

  1. Requester puts task on Daemo
  2. Task goes through prototype process, succeeds and launches
  3. Workers from across Daemo now do the task

So once we've hit Stage 3 above and the task is fully live on Daemo, the challenge is that a new worker can arrive without ever having done the prototype task. Requesters desire ways to conditionally filter workers into the task based on performance.

One option would be to have all new workers to the task go through the prototype first. That's not exactly what prototype tasks were designed for, but it would match research on putting all interview candidates through the exact same script before making a decision.
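A minimal sketch of that routing rule, under the assumption that we track which prototypes each worker has completed (all names here are hypothetical, not Daemo's API):

```python
# Hypothetical routing sketch: workers who have not completed the
# task's prototype are sent through it before seeing the live task.

def route_worker(worker_id, completed_prototypes, task):
    """Return which stage of the task this worker should see next."""
    done = completed_prototypes.get(worker_id, set())
    if task["prototype_id"] in done:
        return "live_task"
    return "prototype_task"

completed = {"w1": {"p9"}}
task = {"id": "t42", "prototype_id": "p9"}
route_worker("w1", completed, task)  # "live_task"
route_worker("w2", completed, task)  # "prototype_task"
```

The point of the sketch is that the check happens at task-entry time, so even workers who joined Daemo after the prototype stage finished still pass through the same script.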

If we do this, I would suggest not calling it a prototype task, because what's being described here is the same as prior research (e.g., gold standard tasks), whereas prototype tasks are novel. I'm adjusting the title to reflect that. We could instead point out how it integrates with prototype tasks, or something like that?

Sorry, that was rambly. Please ask for more precision on anything where I'm being confusing.