Closed gureckis closed 5 years ago
I tried to make this change (at least in a hacky hard-coded way) last night and my changes seemed to have no effect. Upon further investigation, it seems like the blocking might be occurring via a copy of experiment.py on the ad server, rather than the local one that I am changing. Is that an accurate assessment? If so, is there a way to bypass the ad server or otherwise test a change like this?
Reopening because it's still not implemented on the psiTurk ad server.
moving discussion of this over here.
i added a new field to the psiturk cloud for 'allow_repeats'. i don't think it broke any thing but let me know (please test your branch @deargle). it is a property of ads both in the sandbox and regular ad server now. however the sandbox ad server basically ignores this field because it always lets you repeat (helpful for debugging). the live ad server should assume if you don't specify that you want to block repeats.
now the question is what the behavior should be concerning allow repeats. the logic in the section of the code can be kind of complex with so many different conditions that can come up. in addition the status codes are impacted by the hitId/assignementId values as well). what do people want?
quick summary: I think there are four conditions
allow_repeats
enabled. but what should happen?i guess the proposal is that everything stays the same except if allow_repeats
is turned on you can see the ad if your status >= SUBMITTED.
but is that right? I'm thinking that maybe the issue lies in the psiturk client check_worker_status
function. here it looks in your local db for a particular worker and returns information about this person. although the function itself is passed the workerId and assignmentId from psiturk.org the later is currently ignored.
https://github.com/NYUCCL/psiTurk/blob/master/psiturk/experiment.py#L180
since i've never done this the questions is if you do the task multiple times you get a new assignmentId as well. so really there needs to be something more complex in check_worker_status
where if allow_repeats
is false, it looks for the workerId ignoring assignmentId and behaves as usual. In contrast if allow_repeats
is true then you always lookup workerId and assignmentId and so repeats with different assignmentIds just act as usual?
phew.
please test your branch @deargle
Just tested, had great success. ad id if that's helpful.
i guess the proposal is that everything stays the same except if allow_repeats is turned on you can see the ad if your status >= SUBMITTED.
The way @jhamrick implemented it is nice. She only changed 19 lines in experiment.py to get it working. It's not a simple check for if status >= SUBMITTED, because people who quit early aren't allowed back in, even if repeats are allowed.
since i've never done this the questions is if you do the task multiple times you get a new assignmentId as well. so really there needs to be something more complex in check_worker_status where if allow_repeats is false, it looks for the workerId ignoring assignmentId and behaves as usual. In contrast if allow_repeats is true then you always lookup workerId and assignmentId and so repeats with different assignmentIds just act as usual?
Her code does just what you're describing. If I understand you.
I'm sure I missed something important in what you said.
the sandbox ad you just posted did not set allow_repeats to true, was that what you expected?
Yes. Here's another that does set it to true.
perfect, your tests works well.
yes @jhamrick approach is in the right flavor, especially the part that selectively checks against either workerId or workerId+assignmentId. my only concern about her code is that what happens if you first run the task with allow_repeats
false then true? or vice versa? run against the same table you'd have to do a bit more error checking to avoid problems.
my only concern about her code is that what happens if you first run the task with allow_repeats false then true? or vice versa?
Switching back and forth should be fine in theory, yeah? Whether allow_repeats
is True or False, it checks if the number of matches (workerId or workerId+assignmentId) == 0. This should work regardless of the pattern of past settings for allow_repeats
for an experiment.
it's a little confusing to me. if you allow_repeats (so there are multiple copies of the same worker in the db) and then disallow repeats, what status should be returned for the worker when the ad server queries? basically in the disallow repeats case the check_worker_status is if the worker has done the task before. could there be a situation where the worker accepted but did not complete multiple assignments? in this case they should technically be allowed to run in the new assignment even though the number of matches at the worker level is >0. its more like number of matches where the status is above some threshold maybe... these are perhaps unlikely cases but certainly possible.
I think allow_repeats
permits workers to retake an experiment only if
they completed the most recent attempt. That means that check_worker_status
should return the status for the most recent db entry for a given worker.
At least, that's the way I think it should work. I'll review the code later
when not on mobile.
On Thu, Feb 9, 2017, 12:13 PM Todd Gureckis notifications@github.com wrote:
it's a little confusing to me. if you allow_repeats (so there are multiple copies of the same worker in the db) and then disallow repeats, what status should be returned for the worker when the ad server queries? basically in the disallow repeats case the check_worker_status is if the worker has done the task before. could there be a situation where the worker accepted but did not complete multiple assignments? in this case they should technically be allowed to run in the new assignment even though the number of matches is >0. its more like number of matches where the status is above some threshold maybe...
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/NYUCCL/psiTurk/issues/90#issuecomment-278708080, or mute the thread https://github.com/notifications/unsubscribe-auth/ABHsfeGBIZOHgGAx9e1s5HgmPMS_E7fnks5ra0lGgaJpZM4B0rJi .
Forget everything I said in the previous comment here, there's no checking of the previous submission. Maybe that can be implemented later, but as it stands, it's a fresh start every time. I vote we go ahead with the code as it currently is (including this commit) until moar feature requests are made.
I'll update the docs if you give a 👍
@gureckis docs are ready. Is the ad server ready to go --
wait, hold that, I guess the takeaway from this whole conversation is that all the work is done in check_worker_status, and that the ad server doesn't need to do anything with the allow_repeats param sent to it. Oh well, we can pass it anyways.
I'll merge.
I think that is right. The new ad server field isn't terrible to have though in case we want to alter some of the error messages in the future based on the type of experiment it is.
This is one of the features that making an automated test for is a little tricky i think.
Oh actually that reminds me that in my unfinished branch on this I included something like always_show_instructions
which allows you to configure if the instruction phase should run every time or only the first time the task is completed. the idea being on repeats you might want to skip the instruction phase (if you have it). possible a new feature request but it naturally ties to the allow repeats issue.
currently psiturk defaults to database-level blocking of workers such that a worker who appears in the database can't take the task again. in some cases this is not desired behavior. we should create a config option 'allow_repeat' which toggles this. also, might be good to allow option 'always_show_instructions' because sometimes if a worker has already done a task once they don't need to read the instructions part again.