NYUCCL / psiTurk

An open platform for science on Amazon Mechanical Turk.
https://psiturk.org
MIT License

Building an alternative to AMT? #186

Open gureckis opened 9 years ago

gureckis commented 9 years ago

There is new interest in alternative recruitment methods besides Amazon Mechanical Turk in response to the changing prices (https://requester.mturk.com/pricing). This has been a possible area of development for some time (e.g., Issue #95).

I wanted to open this issue to allow some discussion about this.

To kick things off here is one idea:

Extend psiturk.org to provide a dynamic listing of currently available tasks. Workers visit this website and can pick which psiturk application to participate in. For non-psiturk users, we could create a webform that allows you to post a task via a web link or something simple (e.g., Qualtrics).

More specifically in psiturk:

  1. You launch your web application and it automatically connects to the centralized “tracker” server on psiturk.org to register the availability of the job (see the sketch just after this list).
  2. Workers visit the task listing on psiturk.org, find what they want to work on, and are then passed off to the individual web application, which can be located anywhere on the web.
  3. The requester just pays the person directly at the end using Square Cash (or whatever). This is simple and can be triggered via email (no complex financial API is needed, and the accounting demands are placed on the requester rather than on the service).
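
As a rough illustration of step 1, the experiment could announce itself to the tracker with a single HTTP call. This is a minimal sketch for discussion: the /api/register endpoint, its fields, and the response format are all invented, not an existing psiturk.org API.

```python
# Hypothetical sketch: a launched experiment registering itself with a
# central "tracker" on psiturk.org. The endpoint and all field names are
# invented for illustration; no such API exists yet.
import requests

def register_task(tracker_url: str, task: dict) -> str:
    resp = requests.post(f"{tracker_url}/api/register", json=task, timeout=10)
    resp.raise_for_status()
    return resp.json()["task_id"]  # tracker-assigned listing id

task_id = register_task("https://psiturk.org", {
    "title": "Category learning experiment",
    "url": "https://lab.example.edu/exp/start",  # where workers do the task
    "reward_usd": 2.00,
    "slots": 50,  # number of assignments available
})
print("Listed on the tracker as", task_id)
```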

The tracker server could be extended in various ways to allow workers and requesters to review each other, etc.

Please discuss! Of course, it doesn't have to be limited to directions for psiTurk; I'm just thinking about how to build on what this community has already developed. If you are already working on something like this, it would also be helpful to share links and the basic ideas for discussion.

twiecki commented 9 years ago

Wow, they are going to charge 40% commission?!

alexanderrich commented 9 years ago

The GitHub page for the folks at Stanford working on a crowdsourcing platform is here: https://github.com/crowdresearch/crowdsource-platform

Worth taking a look at for ideas and potential collaboration!

jodeleeuw commented 9 years ago

I think it would be wise to design the site to be agnostic about the platform used to run experiments. The integration scenario @gureckis described between psiturk and the server would be great, and I think it should be supported via an API that other methods could tap into as well. This encourages a broader user base, which is probably the single most important factor in building a successful crowdsourcing site.

davidshumway commented 9 years ago

The Mturkclone.com domain name is available. BTW, the AMT requester operations are documented here: http://docs.aws.amazon.com/AWSMechTurk/latest/AWSMturkAPI/ApiReference_OperationsArticle.html. A RESTful implementation of the API: http://their.github.io/amt-js-operations/example.html.

@gureckis: "1. You launch your web application and it automatically connects to the centralized “tracker” server on psiturk.org to register the availability of the job."

Hosting small HTML "iframe" content for tasks, along with providing a list of "templates", is an AMT mainstay, as is the ability to upload CSV data and integrate it into a task with markdown.

jodeleeuw commented 9 years ago

I don't see the need to mimic some of the more "expensive" features of mturk, like providing templates or CSV data integration, because (1) the target audience is researchers who host experiments on their own servers, and (2) the psiturk experiment exchange and similar tools already provide a mechanism for templates that can be modified. In terms of features, I think it would be wise to focus on the matchmaking side of things as opposed to the task-creation side; mturk is pretty lousy at the former but relatively good at the latter. Incorporating basic matchmaking features like two-way reviews, the ability to see information about worker and requester behavior, and a more sophisticated system for tracking who has done which experiment would be real advantages over mturk.

davidshumway commented 9 years ago

@jodeleeuw Just a guess, but it feels like a lot of workers participate in academic surveys on AMT because AMT had additional income sources that brought workers to the site in the first place. We have been hesitant to join other crowd-labor sites that are less broad than AMT. Offering other types of work (an open marketplace) in addition to academia would greatly increase the number of workers willing to join.

jodeleeuw commented 9 years ago

@their Agreed. There's a trade-off between tailoring the site to an academic audience and attracting as many users as possible.

I think the trouble with mturk for a lot of researcher-types is that we don't use most of the features it provides (the templates/GUI for task building is a significant one), and the feature we really want, frictionless integration with externally hosted experiments, is pretty lousy. A viable AMT alternative might focus exclusively on providing an outstanding platform for externally hosted tasks. Researchers would be a big user base for something like this, but it wouldn't be limited to academics. One major advantage of this approach is that it keeps costs down: you don't need to store task data or serve task content.

jodeleeuw commented 9 years ago

Feature Request List

Experiment management

Matchmaking

Wishful thinking

kcarnold commented 9 years ago

Excellent thoughts so far! There's probably a lot of synergy with the Stanford project -- here's a better link for their objectives and overall project.

Also, someone from the Turk community recommended I check out https://prolificacademic.co.uk.

spamgirl commented 9 years ago

Be very clear: Amazon aggressively protects its trademarks, such as mTurk.

ghost commented 9 years ago

I'd love to see preregistration and open data tied into an online data-collection platform. To run a study, you MUST state what effect you expect to find, and after completion (plus perhaps a brief embargo) the raw data become available to all researchers. This would address many of the replicability issues in research.

ghonk commented 9 years ago

Our lab would certainly check this out/contribute were it to become a reality.

Overall, we second Josh's suggestions. One concern we have (also addressed somewhat in Josh's posts) is the idea that it would be a psych-only venue. There is something nice about the fact that mturk has so many true "human intelligence tasks" (transcription, website testing, etc.), so that workers are not doing psych (behavioral) research experiments for every job. The suggestion of opening it up to multiple venues seems like a good start toward addressing this.

Just echoing what's already been mentioned: recruitment, reviewing, and unique identification seem to be key pieces. Amazon achieves some of this by linking people to a bank account; what's been proposed here puts this responsibility in the hands of the researcher. Is that the intention, or would there be some added method of unique verification? There's some distance between having user accounts and verifying uniqueness that is not yet addressed here.

Regardless of these concerns, recent work characterizing the size of the mturk pool and the naivety of the workers has us aiming to use online data collection strictly for norming materials and piloting (combined with in-person participants), so we're happy to try out something new and even be a part of the process if we are needed.

And to Todd et al., thanks for the work you've all done so far; psiturk is truly a great resource no matter how this Amazon situation is resolved.

davidshumway commented 9 years ago

@ghonk

"just echoing what's already been mentioned: recruitment, reviewing and unique identification seem to be key pieces. Amazon achieves some of this by linking people to a bank account, what's been proposed here puts this responsibility in the hands of the researcher - is that the intention or would there be some added method of unique verification? There's some distance between having user accounts and verifying uniqueness that is not yet addressed here."

In addition to a bank account, they also require both a Social Security Number and a phone number.

jodeleeuw commented 9 years ago

I think the SSN is primarily for tax liability. This is an aspect of mturk that is really valuable to researchers: accounting is really simple. An alternative that changed the model to peer-to-peer transactions would mean that accounting would potentially need to happen within the lab.

Phelimb commented 9 years ago

Just want to preface this by saying I'm the co-founder and developer behind https://prolificacademic.co.uk - thanks for the mention, @kcarnold. Although I'm not intending to make a sales pitch (I understand the desire for a commercial-free platform), I will mention that we have the majority of the features mentioned by @jodeleeuw, with perhaps the exception of a "set of tags for common experimental tasks", which is certainly on our to-do list.

Happy to contribute where I and Prolific can. I just want to echo what's been mentioned above: user verification is something that definitely needs to be considered early if you'll be making payments. We were surprised by how quickly we became a target for fraudulent behaviour when we started Prolific. It has definitely been one of the more challenging problems we've had to deal with, and something that Mturk has probably dealt with behind the scenes for a long while.

spamgirl commented 9 years ago

What about international users who lack an SSN? I think it's important to include both "workers" and researchers globally, as AMT's blocking of both has been a huge bone of contention in the past.

davidshumway commented 9 years ago

@spamgirl

What about international users who lack an SSN? I think it's important to include both "workers" and researchers globally, as AMT's blocking of both has been a huge bone of contention in the past.

Definitely! There needs to be a way to include and verify international workers as well. One option is to make how a worker was verified viewable by a "requester", so the requester can qualify workers based on that (in addition to country of origin). E.g., for a specific task, a requester could allow workers in the USA who are verified by SSN and disallow workers in the USA who are not, or vice versa, or allow any worker in the USA regardless of verification.
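
As a toy sketch of that kind of screening (all field names invented), a task listing could declare which countries and verification methods it accepts, and the tracker would filter workers against it:

```python
# Toy sketch of verification-based qualifications; every field here is
# invented for illustration.
def worker_qualifies(worker: dict, task: dict) -> bool:
    """A worker qualifies if their country is allowed and their verification
    method (e.g., 'ssn', 'phone', 'none') is accepted for this task."""
    return (worker["country"] in task["allowed_countries"]
            and worker["verification"] in task["allowed_verifications"])

task = {"allowed_countries": {"US"}, "allowed_verifications": {"ssn", "phone"}}
print(worker_qualifies({"country": "US", "verification": "ssn"}, task))   # True
print(worker_qualifies({"country": "US", "verification": "none"}, task))  # False
```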

EconometricsBySimulation commented 9 years ago

Since some kind of IRB consent page is required for so many projects, I think incorporating it directly into the system, as something a user must click through in order to do a HIT, would be ideal. Incorporating TO-style (Turkopticon) reviews sounds great. I also think the idea of making many HIT statistics available to workers is great.

I would appreciate a mechanism by which workers can revise submissions. I hate rejecting work when a worker only does half the task or does it at low quality. I never understand why it happens, but giving workers the opportunity to revise their work might make things much less tense.

davidshumway commented 9 years ago

@EconometricsBySimulation

I would appreciate a mechanism by which workers can revise submissions. I hate rejecting work when a worker only does half the task or does it at low quality. I never understand why it happens, but giving workers the opportunity to revise their work might make things much less tense.

Access to the content of submitted tasks is significant by itself; allowing the worker to revise tasks is icing on the cake. This is definitely a major feature missing from AMT.

mbernst commented 9 years ago

Excellent discussion. (Hi, I'm Michael Bernstein from Stanford CS, where we have a crowd of researchers around the world building a better, open crowdsourcing platform.)

Many flowers bloom, so I say: go for it! And we'd love to collaborate if any of you want to join forces.

gureckis commented 9 years ago

@mbernst I'm not sure whether reinventing the wheel is a good path (probably not!). I guess the question for me is how psychologists coming from the psiturk tradition (or related approaches) might best interface with and contribute to what you've already built. For example, if there is an API for your system, maybe we could just make psiturk interface with it as an option alongside AMT. That might bring a community of users with interesting tasks to your system right away!

gureckis commented 9 years ago

A couple of quick thoughts... There are a lot of really interesting feature ideas here, but for now my mind is thinking in broad terms about the feasibility of a viable, entirely distributed, open architecture.

But first,

  1. I think encouraging some of the tech-savvy psychology/marketing/economics folks to combine forces with efforts already underway, such as @mbernst's, is a good idea. There is some heavy lifting here, and important problems must be solved for the system to work; launching a half-baked system is not going to go well. It also looks like the Stanford group has already done a lot of research and thinking on the marketplace-design issue.

That said...

  1. I think the idea of a "tracker" server that just connects workers to work websites is pretty basic. Many things can be grafted on top of it, including worker/requester ratings, etc. The downside is that the centralized server becomes really important (cost matters, uptime is critical, and DoS attacks from disaffected workers could cripple the community). It'd be nice if there weren't a single tracker point of contact and things were even more distributed.
  2. Most important are worker verification controls. Airbnb and the like have pretty advanced methods for this, including linking social media, verifying emails, captchas, etc. It doesn't have to involve SSNs or bank accounts. I suppose some of these techniques exist in the open-source community as well and could be leveraged. Another idea is that the tracker system itself could offer, as a "task", helping to remove bots. Maybe workers have to do a task like drawing a few pictures of items belonging to common but non-obvious categories. Other workers view these and rate them in a type of "visual Turing test"; if a worker's pictures are consistently rated by the crowd as non-human, the worker is given lower priority in the system (a toy sketch of this scoring follows the list). People might have to provide new pictures every few completed tasks to ensure that a verified human account isn't sold off to a bot! ;) Basically, build into the crowdsourcing system a set of human-computation tasks for removing bots, which we ask members of the community to perform.
  3. Using the tracker approach, there is always the technical hurdle that people using a tracker have to post something back at the end of a task to register that the worker completed it. This last step is not easy in external survey-hosting sites like Google Forms, since they don't have post-task completion hooks as far as I know. You can do this kind of thing easily in psiTurk (or really any Javascript), but as mentioned here, you want a broad user base. Maybe one solution is that the post-back doesn't have to happen online via an API; instead, a requester can log into the tracker later, see a list of workers who agreed to begin the task, and manually verify that each completed all or part of it (I guess that is similar to how some people use the current AMT system).
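
As a toy illustration of the scoring in point 2, here is a minimal sketch assuming peer ratings arrive as 0/1 "looks human-made" judgments; the names and the neutral starting score are invented:

```python
# Toy sketch of the "visual Turing test" priority idea: each worker's
# drawings are rated by peers (1 = looks human-made, 0 = not), and workers
# whose submissions are consistently rated non-human sink to the bottom of
# the task-assignment queue.
from statistics import mean

def human_score(ratings: list[int]) -> float:
    """Crowd estimate that a worker is human; unrated workers start neutral."""
    return mean(ratings) if ratings else 0.5

def rank_workers(worker_ratings: dict[str, list[int]]) -> list[str]:
    """Order workers by descending human score for assignment priority."""
    return sorted(worker_ratings,
                  key=lambda w: human_score(worker_ratings[w]),
                  reverse=True)

queue = rank_workers({"alice": [1, 1, 1], "bot42": [0, 0, 1], "newbie": []})
print(queue)  # ['alice', 'newbie', 'bot42']
```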

Other random feature ideas for research:

jodeleeuw commented 9 years ago

Using the tracker approach, there is always the technical hurdle that people using a tracker have to post something back at the end of a task to register that the worker completed it. This last step is not easy in external survey-hosting sites like Google Forms, since they don't have post-task completion hooks as far as I know.

You could always provide two different ways to do this. One would be an AJAX call to a URL; the other would be a URL that participants manually click, which sends them back to a tracker page that registers task completion (perhaps requiring them to authenticate their identity). A manual option on top of these would give people enough flexibility to support many different platforms.
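
To make those two routes concrete, here is a minimal sketch of what the tracker side might look like, assuming a Flask-style server; the routes, payload fields, and storage are all hypothetical:

```python
# Sketch of the two completion hooks described above, assuming a Flask-based
# tracker. The routes, parameters, and in-memory storage are hypothetical.
from flask import Flask, request

app = Flask(__name__)
completions = set()  # in production this would be a database table

# Option 1: the experiment code fires an AJAX POST when the task ends.
@app.route("/api/complete", methods=["POST"])
def complete_via_ajax():
    payload = request.get_json()
    completions.add((payload["task_id"], payload["worker_id"]))
    return {"status": "recorded"}

# Option 2: the participant manually clicks a return link at the end of the
# task, which brings them back to the tracker page.
@app.route("/return/<task_id>/<worker_id>", methods=["GET"])
def complete_via_link(task_id, worker_id):
    completions.add((task_id, worker_id))
    return "Thanks! Your completion has been recorded."

if __name__ == "__main__":
    app.run()
```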

Portll commented 9 years ago

Someone mentioned the IRB/consent issue. Down under, we're finding that each PhD/study/etc. requires its own consent to be signed off by the relevant board anyway, so duplicating consent across projects isn't possible. Having said that, some kind of Worker/Project/Lab/Department/University/Country hierarchy might make sense, to let this be cloned across departments that do find it useful.

One of the massive benefits of mTurk is the breadth of participants, something a psychology-only platform would really struggle with. After all, how many times can one person do a DASS without skewing the data? I'd be keen for PsiTurk to look at collaborating with or merging into larger open-source projects for exactly this reason - but as someone else pointed out, getting people to do a psych task in the middle of a range of non-psych tasks is the only way we can get clean data. As for payment, being in Australia we've increasingly been frozen out of mTurk, so another platform that welcomes us would be a wonderful change. I still don't understand why Amazon can't verify users with a .edu* address - it definitely seems to be a willpower issue.

Sorry for typos.

gureckis commented 9 years ago

So this discussion has been lingering in my head, but I'm not quite sure what to do with it. Instead of proposing market features (like IRB integration), I want to back up and think about the core technology stack and the really fundamental problems of online work that no one seems to be working on. Perhaps the open-source community could focus on these smaller pieces, which could then be glued together.

  1. How do you find work to be done? - I'll call this the "matching" problem (matching workers to requesters). I believe an interesting solution would involve distributed listings via a tracker-type process. Rather than a single server, which is a possible point of failure or attack, it would be nice if the system that advertises the availability of work to potential crowdworkers could itself be distributed. Basically, we might want many different people to be able to start up little "markets", public or private (such as one internal to a university human-subjects pool). Even better, public trackers could know how to talk to each other and pass work listings between themselves, so that no particular tracker becomes the "de facto" standard for all listings in a winner-take-all fashion. As discussed above, the complexity of a tracker can scale pretty widely, from more or less a plain-text listing of URLs where work can be done all the way to a complex and pretty website with lots of features. (A rough sketch of tracker-to-tracker listing exchange follows this list.)
  2. When someone completes a task online, how can you verify that? - In real life this is sort of obvious (you can hire someone to cut your lawn, and if it is cut, you know the work was done and you owe the person money). For now, envision a crowdsourcing market that applies only to online work. In this case, say we want to know whether someone actually completed an entire task running in Javascript in a browser. In addition, we would want to know whether someone skipped sections of the task by modifying the code in the browser. This is a complex problem to solve, but it is sort of interesting. Maybe there is a third party involved (e.g., the tracker). The worker could create a hashcode based on the data file and the entire code listing and submit it to the tracker; the requester could independently do the same using the task code and the received worker data. If the two match, the tracker marks the submission as "eligible" and payment can be arranged (a minimal sketch of this hash matching follows the list). An alternative method: the worker repeatedly requests randomized codes from the requester during the execution of the program, and at the end of the task submits the sequence of received codes back to the requester; if they match, we can believe that the task was completed in sequence. The codes could index unique traces of the current state of memory or variables in the browser, so that if you skip over something you'll have invalid codes. The point is to build a system in which one can verify that online work has actually been completed. Maybe this is unsolvable and really needs a peer-review, trust-based system (see point 3), but it's worth thinking about.
  3. How do you manage the reputation of workers and requesters in a distributed, hack-proof fashion? - I'm not sure about this exactly, but the bitcoin protocol seems to have some hints in this direction. Although bitcoin measures economic value, it is a distributed system that is hard to hack and that tracks the relative value of many different anonymous actors. It's a distributed ledger, and a distributed ledger is exactly the kind of decentralized system one needs to manage reputation.
  4. How do you manage online identity? - This may again be a place where cryptographic ideas are useful. For example, digital signatures are commonplace in the crypto community; maybe the core system should be based on this type of trust mechanism. Following point 2, people could sign the code for the experiment, some type of trace of its execution, and the data file using a public-private key mechanism shared with the requester. This would help verify that a particular user, when they request funds, actually did the task and produced the associated data (see https://en.wikipedia.org/wiki/Digital_signature). (A sketch of this signing step follows the list.)
  5. Payment - I know researchers are not looking for a complex scheme and want a broad audience, but bitcoin seems like a pretty good option for payment transfers, as it avoids some of the complexity of paying people via the banking system. Importantly, it would let the "market" itself stay out of the cash-transfer part, which incurs considerable legal overhead: the system just lets people know that work can be done and where to do it, while individuals remain responsible for paying. I know people are hesitant to set up bitcoin and its value has fluctuated unreliably in recent years, but it's worth talking through. Tooling may make this more feasible in the future, and at the moment it is the leading method for anonymous cash transfers anywhere in the world, which is a feature a market like this should have.
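
On point 1, here is a rough sketch of trackers exchanging public listings with peers, so no single server becomes the hub. The /api/exchange endpoint, the peer list, and the payload shape are all invented:

```python
# Hypothetical sketch of federated trackers gossiping their public listings
# to peers. Endpoint, peers, and payload shape are invented for illustration.
import requests

PEERS = ["https://tracker-a.example.org", "https://tracker-b.example.org"]

def gossip_listings(local_listings: list[dict]) -> list[dict]:
    """Push our public listings to each peer and merge back what they hold,
    de-duplicating on the task URL."""
    merged = {t["url"]: t for t in local_listings}
    for peer in PEERS:
        resp = requests.post(f"{peer}/api/exchange",
                             json=local_listings, timeout=10)
        for task in resp.json():  # peer replies with its own listings
            merged.setdefault(task["url"], task)
    return list(merged.values())
```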
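On point 2, the hash-matching idea might look like the following minimal sketch; the digest scheme and all names are assumptions for discussion:

```python
# Minimal sketch of completion verification by hash matching: the worker
# hashes the task code plus the data file they produced and submits the
# digest to the tracker; the requester independently hashes the canonical
# task code plus the data they received. Matching digests mark the
# submission "eligible" for payment.
import hashlib

def completion_digest(task_code: bytes, data_file: bytes) -> str:
    """Digest over the exact code that ran plus the data it produced."""
    return hashlib.sha256(task_code + data_file).hexdigest()

task_code = b"<entire javascript source of the experiment>"
worker_data = b'{"trials": ["..."]}'  # data produced in the worker's browser

worker_digest = completion_digest(task_code, worker_data)     # worker submits
requester_digest = completion_digest(task_code, worker_data)  # requester checks

assert worker_digest == requester_digest  # tracker marks it "eligible"
```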
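And on point 4, the signing step could be sketched with an off-the-shelf signature scheme (here Ed25519 from the `cryptography` package); the workflow around it, like publishing the public key in a tracker profile, is an assumption:

```python
# Sketch of the digital-signature idea: the worker signs the completion
# digest with a long-lived identity key, and the requester verifies the
# signature against the worker's published public key before paying.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

worker_key = Ed25519PrivateKey.generate()  # worker's identity key
worker_pubkey = worker_key.public_key()    # published, e.g., in a tracker profile

message = b"sha256-digest-of-code-and-data"  # e.g., worker_digest from above
signature = worker_key.sign(message)

try:
    worker_pubkey.verify(signature, message)  # raises if forged or tampered
    print("signature valid: release payment")
except InvalidSignature:
    print("signature invalid: flag the submission")
```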

Anyway, I just wanted to throw these ideas out there. I think people are thinking big about how to "fix" AMT, but I'm hoping some discussion can also be about a "next-generation" crowdsourcing system with features that far outstrip the others in terms of distributed computing, self-management, etc. I may be idealistic in thinking that distributed computing plus cryptography will solve every one of the world's problems, but it's fun to think about and discuss either way!