NYUCCL / psiTurk

An open platform for science on Amazon Mechanical Turk.
https://psiturk.org

Support longitudinal studies #37

Open · jbmartin opened this issue 11 years ago

jbmartin commented 11 years ago

Our current setup prevents workers from participating in a study more than once.

gureckis commented 10 years ago

This is tricky... My guess is that you'd want to do this somewhat outside of Turk actually (maybe). For example, a user accepts a HIT which involves doing a task once a day for 5 days. On day 2, do they accept another HIT? Or do they just go to some fixed URL and enter a username/password? I'm not sure how this is typically done, but it could be possible to add a password-protected route via the new custom_routes feature that would allow users to "log in". A standard HIT could also just collect email addresses so you could send out email reminders with the appropriate login URL (again, run via custom routes).
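A minimal sketch of that custom-route idea, assuming the custom_routes feature exposes a Flask Blueprint (as psiTurk's custom.py does); the `/followup` route, the templates, and the `check_credentials` helper are hypothetical placeholders, not part of psiTurk:

```python
from flask import Blueprint, request, abort, render_template

custom_code = Blueprint('custom_code', __name__,
                        template_folder='templates', static_folder='static')


def check_credentials(username, password):
    # Placeholder: look the returning participant up in your own table
    # of enrolled workers instead of hard-coding a pair like this.
    return (username, password) == ('demo_worker', 'demo_pass')


@custom_code.route('/followup', methods=['GET', 'POST'])
def followup():
    """Password-protected entry point emailed to enrolled participants."""
    if request.method == 'POST':
        if not check_credentials(request.form.get('username', ''),
                                 request.form.get('password', '')):
            abort(401)
        # Serve the day-N task to the authenticated, returning participant.
        return render_template('followup_task.html')
    return render_template('followup_login.html')
```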

jodeleeuw commented 10 years ago

We have had some success with emailing workers to have them do follow-up studies. One way to do this within Turk would be to have custom qualifications assigned to workers who complete each part of the study. So after day 1 they get a qualification that allows them to do day 2, and so on. This would offload the authentication aspect to Turk.
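A rough sketch of that qualification chain using boto's MTurk bindings; the qualification name, credentials, and worker ID below are made-up placeholders:

```python
from boto.mturk.connection import MTurkConnection

mtc = MTurkConnection(aws_access_key_id='...',
                      aws_secret_access_key='...',
                      host='mechanicalturk.sandbox.amazonaws.com')

# One-time setup: a qualification type that marks completion of day 1.
day1_qual = mtc.create_qualification_type(
    name='My longitudinal study: completed day 1',
    description='Granted after finishing day 1 of the study.',
    status='Active')[0]

# After a worker submits the day-1 HIT, grant the qualification so the
# day-2 HIT (which requires it) becomes available to them.
mtc.assign_qualification(day1_qual.QualificationTypeId,
                         worker_id='A1EXAMPLEWORKERID',
                         value=1,
                         send_notification=True)
```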

jbmartin commented 10 years ago

I like @jodeleeuw's idea. Here's AMZ's qualification documentation and Boto's.

jbmartin commented 10 years ago

Can custom qualifications be used to prevent workers from repeating an experiment? We currently store this info in a database, but this seems like a much simpler approach.

jodeleeuw commented 10 years ago

I'm not sure if you can prevent a user from doing a HIT based on a qualification. One issue with qualifications is that you can only assign them after a worker has submitted a HIT (I think). I'm not sure when you record that a subject has "done" an experiment. I tend to record this as soon as they start the HIT, so that if they get part way through, quit, and come back they can't do it again.

jbmartin commented 10 years ago

Good point. After they submit a HIT, however, I'm guessing you can prevent them from retaking your experiment, since AMZ already lets us screen out workers whose HIT approval rate falls below a certain threshold, e.g., 95%.

jodeleeuw commented 10 years ago

It looks like you may be able to use the NotEqualTo comparator to prevent users with a qualification from doing a HIT: http://docs.aws.amazon.com/AWSMechTurk/latest/AWSMturkAPI/ApiReference_QualificationRequirementDataStructureArticle.html
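A hedged sketch of attaching such a requirement when creating a HIT with boto; `COMPLETED_QUAL_ID` and the other HIT parameters are placeholders, and the comparator semantics (NotEqualTo vs. the newer DoesNotExist) are worth verifying in the sandbox before relying on them:

```python
from boto.mturk.connection import MTurkConnection
from boto.mturk.question import ExternalQuestion
from boto.mturk.qualification import Qualifications, Requirement
from boto.mturk.price import Price

mtc = MTurkConnection(aws_access_key_id='...',
                      aws_secret_access_key='...',
                      host='mechanicalturk.sandbox.amazonaws.com')

# Exclude workers whose "completed" qualification has been set to 1.
quals = Qualifications()
quals.add(Requirement('COMPLETED_QUAL_ID', 'NotEqualTo', integer_value=1,
                      required_to_preview=True))

mtc.create_hit(
    question=ExternalQuestion('https://example.org/ad', frame_height=600),
    title='One-session experiment (no repeat participation)',
    description='Workers who already completed the study are excluded.',
    keywords='experiment, psychology',
    reward=Price(1.00),
    max_assignments=50,
    qualifications=quals)
```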

jbmartin commented 10 years ago

My concern would be whether tech-savvy workers can change their own qualifications.

jodeleeuw commented 10 years ago

I don't think workers can delete a qualification. I just tried to remove a qualification on my account and couldn't find a way to do it.

jbmartin commented 10 years ago

Instead of completely removing a qualification, would it be possible to change one, e.g., from stage=1 to stage=2? Would a savvy worker also be able to change these values? If we can't change values, I guess it would be possible to add a new qualification each time a worker advances, but that seems less elegant, and I don't think there's anything to prevent workers from manipulating their own qualifications, assuming they knew which ones to add.

jodeleeuw commented 10 years ago

Only the creator of a qualification can assign its value. I guess it would be possible for a worker to assign an identical qualification to themselves, so this wouldn't be as secure as having a private database. On the other hand, workers assigning themselves qualifications is probably so rare that it wouldn't really matter.
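For completeness, a small sketch of the stage-update idea from the previous comment, assuming a single "stage" qualification whose integer value tracks progress; only the requester account that created the qualification type can change it, and the IDs below are placeholders:

```python
from boto.mturk.connection import MTurkConnection

mtc = MTurkConnection(aws_access_key_id='...',
                      aws_secret_access_key='...',
                      host='mechanicalturk.sandbox.amazonaws.com')

# Advance a worker from stage=1 to stage=2 once they finish stage 1.
mtc.update_qualification_score('STAGE_QUAL_ID', 'A1EXAMPLEWORKERID', 2)
```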

mizzao commented 10 years ago

Hi guys...I've been working on something that has first-class support for both this and #36 over at https://github.com/HarvardEconCS/turkserver-meteor. It's designed from the ground up for multi-user, interactive, real-time experiments, and builds on the data-over-the-wire functionality provided by Meteor to provide a client- and server-side API for managing experiment code. One of the huge benefits of a full-stack framework is that the package itself can handle the network logic, and users can simply take advantage of that. I feel it would be a lot harder to build this on top of Python because of the amount of programming knowledge that would be needed to leverage it.

My project isn't documented very well yet, but it's the product of 3 re-designs over several years of my PhD work and I think it's sufficiently well thought out now for any kind of experiment that people would like to do over the next few years. I'd be happy to tell you guys more about it and about some of the experiments we're working on if you're interested.

To handle longitudinal studies, my current design allows unique workers to take a HIT and log back in to the system. If they are allowed back (i.e., a login handler decides it's been sufficiently long since their last session), they can be assigned to one of several experiment "worlds" that exist, presumably the same world they were in for a previous assignment. This allows multiple users to interact over time, possibly at the scale of hundreds, for experiments such as markets or other long-running studies.

jbmartin commented 10 years ago

Hi @mizzao, looks like an interesting project! I've used Meteor before and agree that it's a great framework (I find reactive programming natural to think about).

We chose Flask over Meteor, Express, etc. for a few reasons. At the time, it had a larger user base and better documentation, it was lighter weight, and the database options were more flexible (Meteor pushes MongoDB). I agree that it's nice to use JavaScript across the frontend and backend, but we've set up psiTurk so that our users only need to know a little JavaScript. The one thing that's really missing from Flask is the ability to easily create real-time, multi-user experiments. Flask does offer websocket extensions, but they're non-trivial to set up for the casual experimenter.

Regarding longitudinal studies, do you find that many people use that feature? We've only gotten a few requests for it.

mizzao commented 10 years ago

Hi @jbmartin - Meteor has probably grown significantly since you started this project. It's very stable, the core team is incredibly talented and has a great vision, and there are also tons of users. Meteor is actually not opinionated about most of the things that people usually care about in frontend development, and the reason they're married to MongoDB right now is that they've implemented a live version of Mongo in pure JavaScript that runs in the browser.

I'm not sure about how far one can take web-based experiments without any programming knowledge, but I do think that Meteor is about as simple as it gets for web programming. Moreover, designing multi-user apps in Meteor is just as simple as single-user apps because of the live data synchronization. I've built on top of this a way to partition the data in Meteor so it can be used to create separate experiment instances (https://github.com/mizzao/meteor-partitioner).

The other benefit of a full-stack framework is being able to provide functionality that requires server-client communication in a transparent way. For example, we have the ability to monitor users' idle state and whether their tab is blurred (https://github.com/mizzao/meteor-user-status), client and server-side logging, NTP-style time syncing and accurate timers available on the client based on server time, and live viewing of experiment state and logs from the admin interface. My goal is to have turkserver be a drop-in package for a Meteor app that gives you a live console for your experiment (I'd be happy to give you a demo when I'm in NYC this summer). In addition, Meteor provides free HTTPS hosting and their future product will make deployments easy, which is a barrier for most people who don't know how to run web servers :)

My experiments have mainly focused on settings that require interaction between users, either in real-time or asynchronously, which I feel is probably different from your (psychology?) audience. However, I think the real barrier to doing long-running, real-time experiments is that they're hard to set up, not that people don't want to do them: for example, simulated trading in a market over a period of a few weeks. I'm hoping that making this functionality easily accessible will stimulate new kinds of research.

jbmartin commented 10 years ago

This discussion is straying from the original longitudinal issue, but I'd definitely be interested in meeting up next time you're in town.

To aid each other's development (and avoid duplicated effort), I'd be interested in a feature comparison between psiTurk and Turk Server. Most of the front-end features you mention we offer in the psiTurk.js API. We also offer a bunch of command-line tools for paying and bonusing people automatically, maintaining the server, creating and modifying HITs, monitoring real-time logs, etc., plus an ad server for handling HTTPS, tracking between-study/between-user data, and demographics. I'd be particularly interested in whether there's anything that one of our frameworks can do (or be extended to do) that the other one can't. This kind of analysis would be helpful in guiding the direction of our (and hopefully your) development.

mizzao commented 10 years ago

Great, let's table the discussion here and continue it in person. I'm sure we are both somewhat committed to our own ways of doing things and it would be great to have a productive discussion that can expand both of our horizons a bit.

davclark commented 9 years ago

I just saw this via your talk at Berkeley! I've got code to do longitudinal studies here:

https://github.com/davclark/mturk_admin

In particular, I've got JavaScript code in there that will display different HTML in the HIT depending on worker ID (e.g., to filter out participants from other HITs).

I'd be happy to work on integrating this if there's interest.

mizzao commented 9 years ago

I'm currently at the 2014 Conference on Digital Experimentation and there is a lot of interesting work here on the methodology, theory, and practice of experiments. It would be ideal over the next few years to integrate the different specialties of tribally-affiliated software, or at the least do some interdisciplinary learning.

davclark commented 9 years ago

@mizzao absolutely!