facebookresearch / Mephisto

A suite of tools for managing crowdsourcing tasks from the inception through to data packaging for research use.
https://mephisto.ai/
MIT License

Non-static preview with MTurk #708

Open eranhirs opened 2 years ago

eranhirs commented 2 years ago

Hi, I would appreciate your help again with another question I had.

In MTurk, when a HIT is in preview mode, you get the hitId, which can be used to show the Turker data about that specific HIT. Seeing the specific HIT data in the preview is a pretty common user experience, and it is also mentioned in the MTurk docs: "When a Worker previews a HIT, your web page should show her everything she will need to do to complete the HIT, so she can decide whether or not to accept it. The easiest way to do this is to simply display the form as it would appear when the HIT is accepted."

It seems Mephisto registers the worker and agent before getting the task data, so it is currently impossible to get the data in preview mode, when there is no assignmentId / workerId available in the URL.
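To make the preview case concrete, here is a minimal sketch of telling a preview request apart from an accepted one by looking only at the URL. It assumes the standard MTurk external-question behaviour, where `assignmentId` is either missing or set to the placeholder `ASSIGNMENT_ID_NOT_AVAILABLE` during preview. The function name `parse_hit_url` and the example URL are hypothetical and not part of Mephisto.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs, urlparse

# Placeholder MTurk sends as assignmentId while a worker only previews a HIT.
PREVIEW_ASSIGNMENT_ID = "ASSIGNMENT_ID_NOT_AVAILABLE"


def parse_hit_url(url: str) -> Dict[str, object]:
    """Extract the MTurk query parameters and flag preview mode (hypothetical helper)."""
    query = parse_qs(urlparse(url).query)

    def first(key: str) -> Optional[str]:
        # parse_qs maps each key to a list of values; take the first, if any.
        values = query.get(key)
        return values[0] if values else None

    assignment_id = first("assignmentId")
    return {
        "hit_id": first("hitId"),
        "assignment_id": assignment_id,
        "worker_id": first("workerId"),
        # Preview: no assignment yet, so nothing to register a worker or agent against.
        "is_preview": assignment_id in (None, PREVIEW_ASSIGNMENT_ID),
    }
```

In this sketch the hitId is the only identifier a preview request reliably carries, which is why any preview-time data lookup would have to be keyed on it.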

I understand that this use case might not be relevant for dialogue. Is this a use case that Mephisto should even support? If yes, how do you imagine it? Maybe I can help make the relevant changes. Thanks!

JackUrb commented 2 years ago

This is interesting, as we dynamically associate a HITID with the data for a Unit upon acceptance: https://github.com/facebookresearch/Mephisto/blob/693026cbf5d5b9de82245d2fa82c59de09e0339c/mephisto/abstractions/providers/mturk/mturk_unit.py#L90

At the moment there's no clear way I can imagine changing this, as one of the reasons we do this is that we allow for more complex eligibility matching than MTurk itself would allow. Thus it's only at match time that we know which data a HITID refers to.
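As a rough illustration of why the data is unknown before acceptance, the toy model below pairs a HIT ID with a unit only when a worker accepts, using a per-unit eligibility rule. This is not Mephisto's actual code; `Unit`, `Pool`, `is_eligible` and `accept` are hypothetical names made up for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Unit:
    data: str
    # Hypothetical eligibility rule: may this worker take this unit?
    is_eligible: Callable[[str], bool]
    hit_id: Optional[str] = None  # unknown until a worker accepts


@dataclass
class Pool:
    units: List[Unit] = field(default_factory=list)

    def accept(self, hit_id: str, worker_id: str) -> Optional[Unit]:
        """Pair hit_id with the first unassigned unit this worker is eligible for."""
        for unit in self.units:
            if unit.hit_id is None and unit.is_eligible(worker_id):
                unit.hit_id = hit_id  # the association only exists from here on
                return unit
        return None  # nothing in the pool fits this worker
```

Because the chosen unit depends on `worker_id`, the same HIT ID could end up with different data for different workers, so a preview that only knows the HIT ID has nothing definite to show.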

Implementation changes for this would touch the entire stack (CrowdProvider, Architects, the frontend Mephisto-task library, ClientIOHandler) and include needing an option to force the HITID on unit.launch to permanently pair data with an assignment, as well as exposing an endpoint for sending the data during the preview (which is currently static content served from the architect). It's a highly nontrivial task.
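For contrast, the forced-pairing idea could be sketched very roughly as follows: the HIT ID is fixed to its data at launch, and a preview lookup needs only the HIT ID. Everything here (`PAIRED_DATA`, `launch_with_forced_hit_id`, `preview_data`) is a hypothetical stand-in and says nothing about how the real CrowdProvider, Architect, or ClientIOHandler changes would look.

```python
from typing import Dict, Optional

# Hypothetical store filled in at launch time: HIT ID -> task data.
PAIRED_DATA: Dict[str, dict] = {}


def launch_with_forced_hit_id(hit_id: str, unit_data: dict) -> None:
    """Permanently pair a HIT ID with its data when the unit is launched."""
    if hit_id in PAIRED_DATA:
        raise ValueError(f"HIT {hit_id} is already paired")
    PAIRED_DATA[hit_id] = unit_data


def preview_data(hit_id: str) -> Optional[dict]:
    """What a preview endpoint could return, with no worker or assignment involved."""
    return PAIRED_DATA.get(hit_id)
```

The trade-off in this sketch is that the pairing can no longer depend on who accepts the HIT, which is exactly the eligibility matching described above.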

More feasible, perhaps, would be an option to 'skip' a task from inside Mephisto, allowing workers to mark units that they don't want to work on and get reassigned if there are any other options in the pool. That would still be a significant lift, but I can see it being a direct feature add for many tasks rather than a rough workaround.
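A minimal sketch of what such a 'skip' flow might look like, assuming a simple in-memory pool. `SkippablePool` and its methods are hypothetical and do not exist in Mephisto.

```python
from typing import Dict, List, Optional, Set


class SkippablePool:
    """Toy pool where workers can skip units they do not want to work on."""

    def __init__(self, unit_ids: List[str]) -> None:
        self.available: List[str] = list(unit_ids)
        self.skipped: Dict[str, Set[str]] = {}  # worker_id -> unit ids they skipped

    def assign(self, worker_id: str) -> Optional[str]:
        """Give the worker the first available unit they have not skipped."""
        seen = self.skipped.get(worker_id, set())
        for unit_id in self.available:
            if unit_id not in seen:
                self.available.remove(unit_id)
                return unit_id
        return None  # no other options left in the pool for this worker

    def skip(self, worker_id: str, unit_id: str) -> Optional[str]:
        """Return the unit to the pool and try to assign a different one."""
        self.skipped.setdefault(worker_id, set()).add(unit_id)
        self.available.append(unit_id)
        return self.assign(worker_id)
```

In this sketch a skipped unit goes back to the end of the pool for other workers, and the skipping worker gets `None` once they have skipped everything that is left.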