Dallinger / Dallinger

Laboratory automation for the behavioral and social sciences; human culture on a chip.
https://dallinger.readthedocs.io/
MIT License
117 stars · 36 forks

Prolific recruiter does not listen for returned or abandoned notifications #5725

Open thomasmorgan opened 10 months ago

thomasmorgan commented 10 months ago

The prolific recruiter listens for submitted notifications, but not returned or abandoned notifications. This is sort of a missing feature as opposed to a bug, but the functionality is so critical I've labeled it as a bug anyway.

Expected Behavior

The prolific recruiter should listen for returned and abandoned notifications and process them accordingly. This functionality is already present for the MTurk recruiter.

Current Behavior

If a participant returns a hit, Dallinger cannot fail their nodes etc. because it is not processing any notification that the participant returned the hit. However, Prolific still recruits a replacement. This can cause participants to arrive at inappropriate times, for instance when there is no space in the networks for them. In such cases, participants often return the hit (resulting in another participant being recruited by Prolific) or submit an incomplete hit (resulting in Dallinger auto_recruiting a replacement when the incomplete submission is picked up by data_check). In both cases this can lead to a recruitment spiral which wastes money.

Steps to Reproduce (for bugs)

Run any experiment on Prolific. Wait until a participant returns the hit part way through the experiment. Note that the participant's nodes, etc. are not failed.

You can also just look at the code for the prolific recruiter - there is code for submitted notifications, but no others.

Context

We were running an experiment in which networks consisted of pairs of nodes, each associated with a different participant. When one participant returns, we need to replace them. Prolific does automatically send in a replacement. However, without a notification to indicate which participant needs to be replaced it is not possible to add the new participant to the network and the experiment stalls.

Your Environment

jessesnyder commented 7 months ago

https://docs.prolific.com/docs/api-docs/public/#tag/Submissions/Submissions-guide

jessesnyder commented 7 months ago

@thomasmorgan @jacobyn Prolific does not send notifications, so we will need to poll, based on a clock task. We already use the clock to tell recruiters about overdue participants based on config.get("duration"), but the ProlificRecruiter doesn't do anything in response currently. This should be a reasonably straightforward enhancement. However, it sounds like you also need to know about returned participants more or less as they happen (and definitely prior to exceeding the experiment duration). Is that correct?
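A clock-based check for overdue participants could look roughly like the sketch below. The data structures are illustrative stand-ins; the real hook would query the participant table and `config.get("duration")` rather than taking plain dicts:

```python
# Hypothetical sketch (illustrative names, not Dallinger's actual API):
# a clock task flags participants whose working time has exceeded the
# configured experiment duration.
from datetime import datetime, timedelta


def overdue_participants(participants, duration_hours, now=None):
    """Return participants still 'working' past the configured duration."""
    now = now or datetime.utcnow()
    cutoff = timedelta(hours=duration_hours)
    return [
        p for p in participants
        if p["status"] == "working" and now - p["creation_time"] > cutoff
    ]


# Example: one participant started 3 hours ago; duration is 2 hours.
now = datetime(2024, 1, 1, 12, 0)
participants = [
    {"id": 1, "status": "working", "creation_time": now - timedelta(hours=3)},
    {"id": 2, "status": "working", "creation_time": now - timedelta(minutes=10)},
]
print([p["id"] for p in overdue_participants(participants, 2, now=now)])  # [1]
```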

jessesnyder commented 7 months ago

Also, the MTurkRecruiter does a lot of fancy stuff when experiment duration is exceeded (emails to the researcher, attempting to close the HIT, etc.) Do we want to emulate that for Prolific?

pmcharrison commented 7 months ago

@polvanrijn has some thoughts about this too. I think that a clock process would work well, though I think the actual queries should happen in a worker process to prevent blocking (unless you can use some async logic within the clock process itself?)

polvanrijn commented 7 months ago

Yes, we were thinking it could be handy to periodically poll the recruiter. From polling the recruiter we will get quite a few stats (e.g., total cost of the experiment) but also the 'true' recruiter status of participants. This will also be very handy for determining how many participants returned, or how many 'alive' participants there are in the experiment. Especially the last point is often hard to find out and can be quite relevant depending on the experiment (e.g. for synchronous experiments). Also, this allows us to see any discrepancies between paid bonus + base pay and our internally computed numbers.

pmcharrison commented 7 months ago

Yep, I think some general process like @polvanrijn outlines would be very good for making sure all of our numbers stay up-to-date and consistent. We have recently talked about refactoring recruiter classes into a common interface; it might be wise to do that refactoring prior to the present issue, that way we can get this functionality into all the recruiters for the price of one.

thomasmorgan commented 7 months ago

@jessesnyder Very frustrating to hear that Prolific does not send out returned or abandoned notifications. A clock task detecting overdue (i.e. abandoned) participants sounds good, but yes, you are right that we also need to somehow detect returned participants. In addition you are right that this needs to happen quickly because prolific sends replacements very fast (or at least it tries to and often succeeds).

As for the fancy stuff, the code has changed a lot since I last worked on it (I believe I wrote the very first version of this monitoring stuff about 10 years ago, including some bizarre email templates), but I'm not sure we want the fancy stuff for Prolific. The issue is that for MTurk missing notifications are very rare, so it's not a major problem that, when Dallinger detects one has failed to come through, it just shuts everything down and messages the researcher to come give everything a look. With Prolific, however, no returned or abandoned notifications will ever come through, so whatever solution is developed needs to be fully automated; otherwise the experiment would be shut down every 5 minutes. If such an automated solution can be developed, there is no need to email the researcher.

Right now, the solution is to entirely disable auto_recruit and just have the researcher monitor the experiment, fix things as they arise and then recruit new participants manually as needed. This is a major hassle though, and massively limits the scale of experiments. Any automatic solution is going to need to handle all of this.

@polvanrijn By polling the recruiter do you mean having Dallinger poll Prolific/MTurk for accurate participant statuses, or do you mean Dallinger polling its recruiter class? If it's the former, that could be a viable solution to this issue as we can manually identify returned participants that way.

Rather than tying this solely to a clock process (creating a race between the dallinger process and prolific that dallinger might lose), it may be more robust to confirm all existing participant statuses whenever a new participant arrives. That way all returned participants will be detected before the new participant has a node created for them.

If getting this information from prolific is not possible we are in a really tricky position. It might be worth asking prolific to create these notifications, although any implementation would be a long way off.

pmcharrison commented 7 months ago

Thanks @thomasmorgan!

I agree with not replicating the MTurk 'fancy stuff' (emails to the researcher, attempting to close the HIT, etc.) with Prolific, sorry for missing that question before.

Rather than tying this solely to a clock process (creating a race between the dallinger process and prolific that dallinger might lose), it may be more robust to confirm all existing participant statuses whenever a new participant arrives. That way all returned participants will be detected before the new participant has a node created for them.

If we put this in the clock process we would set a background task that runs every e.g. 30 seconds and identifies any returned participants. We could then make this process call recruit() if required. Then I think there would be no race condition? I think this would be more efficient than waiting for the next participant to arrive before checking.

jessesnyder commented 7 months ago

It looks like they're working on an event subscription model (I don't remember seeing this before): https://docs.prolific.com/docs/api-docs/public/#tag/Hooks

However, note the immediate disclaimer: "This is an experimental feature that may be subject to change in the future."

It looks like the support person I was working with (Elaine Poon) when developing the ProlificRecruiter isn't around anymore. Do we have a new technical support contact we could ask about the actual maturity/stability of this API?

pmcharrison commented 7 months ago

I'm not sure about the Prolific support contact, but maybe @jacobyn knows?

pmcharrison commented 7 months ago

There is something reassuring about the simple polling approach though, in that you know you have a fully complete record; otherwise a missed notification can mess things up.

thomasmorgan commented 7 months ago

@pmcharrison Thanks! A 30s loop sounds good, but in my experience recruitment services can sometimes procure replacement participants in <5 seconds. I am not sure how they do this so quickly, but it does mean that unless we want the clock process to loop incessantly it probably isn't going to be an optimal solution. This is the race: after a participant returns, which happens first? Does dallinger's clock cycle and spot the returned participant, or does prolific recruit their replacement? If prolific wins dallinger gets super messed up.

Moving calls to recruit() into the clock process won't work either, because it is the recruitment service itself and not dallinger that is recruiting these replacements - and we literally cannot stop them IIRC.

This is why I think the only solution might be to verify all participant statuses whenever a new participant arrives, and clear out returned/abandoned participants before making the new participant their node. However, this is only possible if my reading of Pol's comment is accurate: i.e., we can request an up-to-date list of participant statuses from prolific at any time. If this is not possible then I really think there is no ideal solution.

@jessesnyder This looks promising!

pmcharrison commented 7 months ago

Hi @thomasmorgan, I reread your use case and now I understand why you need this to happen so quickly. In this case, absolutely, I see the logic for checking for returns on the arrival of a new participant.

RE participant statuses via the Prolific API, yes, this is accessible via the Submissions route: https://docs.prolific.com/docs/api-docs/public/#tag/Submissions

polvanrijn commented 7 months ago

@thomasmorgan, I mean to periodically send an API request to the recruiter (Prolific or mturk). I just implemented this for another recruitment service (called Lucid) for Psynet. If you like I can explain how I did it here. But maybe it's not relevant for your use-case.

thomasmorgan commented 7 months ago

@pmcharrison @polvanrijn OK, this is great, thanks. Yes, I think this is the solution then. My only lingering concern is that checking all participant statuses might be slow for large experiments, but there's usually a good while between participant arrival and node creation, so hopefully it's not too heavy.

@jessesnyder Does this sound like a viable solution to you?

This is giving me flashbacks to trying to solve the missing notifications issue for MTurk! Glad we have better solutions available than what I was able to come up with back then.

pmcharrison commented 7 months ago

I think fortunately you can list all participant statuses within 1 API call, so it should be scalable: https://docs.prolific.com/docs/api-docs/public/#tag/Submissions/operation/GetSubmissions
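A polling helper around that listing route might look like the sketch below. The base URL, auth header format, and response shape are assumptions inferred from the linked docs; verify them against the current API reference before relying on this. The injectable `opener` keeps the function testable without network access:

```python
# Sketch of polling the Prolific submissions listing in one call.
# Endpoint path, auth scheme, and payload shape are assumptions taken
# from the docs linked above; pagination is ignored for brevity.
import json
import urllib.request


def list_submissions(study_id, api_token, opener=urllib.request.urlopen):
    """Fetch every submission for a study and map participant -> status."""
    url = f"https://api.prolific.com/api/v1/studies/{study_id}/submissions/"
    request = urllib.request.Request(
        url, headers={"Authorization": f"Token {api_token}"}
    )
    with opener(request) as response:
        payload = json.load(response)
    # The listing includes per-participant statuses such as RETURNED,
    # which is exactly what the recruiter needs in order to reconcile.
    return {s["participant_id"]: s["status"] for s in payload["results"]}
```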

polvanrijn commented 7 months ago

Yes, I can confirm this is available for all recruiters I have worked with (Prolific, mturk and Lucid).

pmcharrison commented 7 months ago

A further advantage of developing functionality via this API call is that it is very easy to test. In contrast, it's hard to develop a notification listener without real participants, especially in Prolific, which doesn't have a sandbox like MTurk.

jessesnyder commented 7 months ago

Some questions

  1. I'm a little frightened by the prospect of making an API call every time a participant joins. This call will need to block, since we're using the result to determine whether to admit the prospective participant, correct? (I'm remembering Griduniverse experiments where very large numbers of people were attempting to join more-or-less simultaneously.) If none of you are worried about this then maybe it's not a real issue.
  2. It seems like we'll still need some combination of a clock-triggered task and/or an event subscription, since at some point there will no longer be new participants joining to trigger a check, and we will want to continue to check for returned or abandoned participants, right?
thomasmorgan commented 7 months ago

@jessesnyder I'm also a little worried about 1.

One solution to this is to only check participant statuses when there is no network for a participant to go into. For instance, in the experiment that triggered the creation of this issue, all networks have just two nodes. However, prolific was sending new participants even when all networks were full, because participants were returning. This apparent lack of space could be a trigger to check all participants' statuses, whereas for most participants (where there is space in the networks) they can just get added without a check.

However, this does assume that it is safe to add participants to not-full networks, which might not be true. Consider a chain network where the next participant should not be added until the prior participant has finished. In this case, even if the network is not full, it is not necessarily safe to add incoming participants as we can't be sure that the prior participant has finished (indeed, they may have returned).

This problem can be solved if the experiment has a sufficiently detailed get_network_for_participant() function. In short, it is not enough to know whether networks are full or not, we also need to know if they are ready to accept new participants. If this function says there are no networks, the experiment then verifies all participants and checks again for networks. If there are still no spaces, the participant gets canned.

So:

  1. Participant arrives
  2. Dallinger finds them a network
  3. If no network, validate all participant statuses
  4. Try to find them a network again
  5. If still no network, bin them
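The five steps above could be sketched as follows; every class and method name here is an illustrative stand-in, not Dallinger's actual interface:

```python
# Illustrative sketch of the five-step flow above; `experiment` and
# `recruiter` are assumed stand-ins, not Dallinger's real objects.
def place_participant(participant, experiment, recruiter):
    network = experiment.get_network_for_participant(participant)
    if network is None:
        # No apparent space: reconcile statuses with the recruitment
        # service, fail any returned/abandoned participants, then retry.
        for p, status in recruiter.verify_statuses().items():
            if status in ("returned", "abandoned"):
                experiment.fail_participant(p)
        network = experiment.get_network_for_participant(participant)
    if network is None:
        return None  # still no room: over-recruited, send to debrief
    network.add_participant(participant)
    return network
```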

This should greatly reduce the load, but if the experiment involves an overrecruitment strategy it might still get very heavy.

A different option is for dallinger to check how many participants it has asked for, and only verify participant statuses if additional participants arrive. For instance, suppose Dallinger recruits 10 participants. As each arrives, Dallinger keeps track of how many have appeared, and while their total number is <= 10 it doesn't do anything special. However, suppose participant 11 arrives. Dallinger knows it did not ask for this participant, so another participant must have returned or abandoned (this was the cue for the original AWS issue), at which point it verifies all participant statuses, finds the returned participant, fails them, and inserts participant 11 correctly.

This should be extremely light, but it relies on dallinger being able to keep tabs on how many participants it has asked for.

  2. I don't think so. All recruitment services that Dallinger works with automatically send replacements for returned or abandoned participants, right? In which case, a replacement participant will always arrive. However, there is probably no harm in having this automatically check every 30 seconds, as many times it will win the race with the recruitment service and so pre-empt the problem. Moreover, if we take the experiment down, replacements won't be coming, and so in this case the clock probably is necessary (though if the experiment has been taken down we probably wouldn't do much with this information).
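The "did we ask for this participant?" heuristic described above could be sketched like this (purely illustrative; a real implementation would need to persist the counts in the database):

```python
# Sketch of the heuristic above: only reconcile statuses with the
# recruitment service when arrivals exceed what Dallinger requested.
class RecruitmentLedger:
    """Tracks how many participants Dallinger has asked for (illustrative)."""

    def __init__(self):
        self.requested = 0
        self.arrived = 0

    def recruit(self, n=1):
        self.requested += n

    def participant_arrived(self):
        """Return True if this arrival implies a return/abandonment elsewhere."""
        self.arrived += 1
        return self.arrived > self.requested


ledger = RecruitmentLedger()
ledger.recruit(10)
checks = [ledger.participant_arrived() for _ in range(11)]
print(checks[-2], checks[-1])  # False True
```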
pmcharrison commented 7 months ago

These solutions sound quite complicated... Regarding this from @jessesnyder :

I'm a little frightened by the prospect of making an API call every time a participant joins. This call will need to block, since we're using the result to determine whether to admit the prospective participant, correct?

No, I don't think we are allowed to do that. Once a participant arrives from Prolific we have to let them take the experiment, we're not allowed to turn them away at that point.

I thought that instead @thomasmorgan would just take the opportunity to run some logic on the networks to make sure that the returned participant's nodes are failed, making room for the new participant. But that can be done asynchronously, no? As long as the experiment keeps the participant busy with something else for a few seconds before they reach the network code?

jessesnyder commented 7 months ago

If we consistently update participant statuses as we add each participant (even if this is done in an async worker, so effectively after we add them) the status in the DB will be at or near 100% accurate each time a new participant joins. Is this a close-enough reflection of Prolific? (Agreeing with @pmcharrison here)

thomasmorgan commented 7 months ago

@pmcharrison @jessesnyder Yes, you are both correct. Peter is right that we are doing this validation not to accept or reject participants, but rather to figure out where to place participants who arrive unexpectedly. I guess it is possible that for some experiments there is nowhere to put them, but then it is just a case of over-recruitment and so they are accepted regardless, just sent straight to debrief.

Jesse is correct that this will keep Dallinger and Prolific pretty much in sync, and given that this specific issue only arises when participants arrive, making that the trigger for syncing should make everything run smoothly. Once we have the syncing function in place, we might want to run it every 30 seconds anyway, just to be safe, but it shouldn't be important unless there are other issues we need to account for.

pmcharrison commented 7 months ago

Agreed on all counts! Thanks both.

jessesnyder commented 7 months ago

While doing some initial work on this I was reminded of the scheduled_task decorator we added about 3 years ago, which seems like it would do a big chunk of what I was working on (register an experiment method to be called at some interval from the clock). Has anyone experimented with this feature to address this participant status issue (or for other purposes)?
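To illustrate the register-and-fire pattern under discussion, here is a toy re-implementation; this is not Dallinger's actual scheduled_task code, just a sketch of the mechanism:

```python
# Toy re-implementation of a scheduled_task-style decorator: register a
# function with an interval, and have the clock process call it. NOT
# Dallinger's real code, just an illustration of the pattern.
SCHEDULED = []


def scheduled_task(interval_seconds):
    """Register a function to be called by the clock every `interval_seconds`."""
    def register(func):
        SCHEDULED.append((interval_seconds, func))
        return func
    return register


@scheduled_task(30)
def sync_participant_statuses():
    # In the real feature this would enqueue a worker job that polls the
    # recruitment service and updates stale participant rows.
    return "synced"


def clock_tick():
    """What the clock process would do on each firing (intervals elided)."""
    return [func() for _, func in SCHEDULED]


print(clock_tick())  # ['synced']
```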

I'm thinking through the options for who calls whom in the scheduled clock task scenario, and was considering a model where the clock called a very generic method on the experiment ("ping()" or similar) via an async worker, and the experiment could do anything it wanted to here, but then I realized you can already do that via scheduled_task. (sketch)

The thing we don't already have is a way to get all the current status info from recruiters, but I think we've basically figured out how to do that.

The other option is to keep the experiment out of it a bit more, and just fire the standard events for each participant at the end. (sketch)

pmcharrison commented 7 months ago

I have been using the scheduled_task decorator a lot, it works great! Not for this purpose though. I've only used it for relatively fast processes and I'm not sure about doing API calls there, because a slow API response could disrupt other scheduled tasks (since I think it is just a single process)? But you can use the scheduled task to queue worker processes.

I do like the idea of having the scheduled task running in addition to any participant triggers, it seems a good way of offering reassurance that the current status is up to date.

Unfortunately the sketches both gave 404s for me.

jessesnyder commented 7 months ago

Thanks, @pmcharrison. Yes, you're right about the single process blocking unless you pass off to a worker. Sorry about the links! These will be more reliable:

Version 1: Experiment-centric

Sketch

"New-style" Dallinger: tell the experiment that 30 seconds have passed, and leave it up to the experiment to do checks and update participants. The default implementation of a new experiment method would ask the recruiter for the current status of participants that we think are "working", and would then update the status of any participants with the wrong status.

Version 2: Recruiter-centric

Sketch

"Old-style" Dallinger: ping the recruiter[s] every 30 seconds, and have them look for status mismatches on their own participants. Recruiter[s] would enqueue the standard events/commands ("AssignmentReturned", etc.) to be executed by a worker process for any discrepancies discovered.
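The v2 reconciliation step might reduce to something like this sketch; the status values and the mapping to event names are illustrative (only "AssignmentReturned" appears above):

```python
# Sketch of the recruiter-centric (v2) step: compare local statuses
# against the recruitment service and emit the standard event name for
# each mismatch, to be enqueued for a worker. Names are illustrative.
STATUS_EVENTS = {
    "returned": "AssignmentReturned",
    "abandoned": "AssignmentAbandoned",
}


def status_discrepancy_events(local_statuses, remote_statuses):
    """Yield (participant_id, event) pairs where the remote status no
    longer matches what the local database believes."""
    for pid, remote in remote_statuses.items():
        local = local_statuses.get(pid)
        if local == "working" and remote in STATUS_EVENTS:
            yield pid, STATUS_EVENTS[remote]


local = {"p1": "working", "p2": "working", "p3": "submitted"}
remote = {"p1": "returned", "p2": "working", "p3": "submitted"}
print(list(status_discrepancy_events(local, remote)))
# [('p1', 'AssignmentReturned')]
```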

jacobyn commented 6 months ago

I think that the best solution is to implement the polling option. Namely, we have a route that updates the information via an API call to Prolific, and we set a background process to update this information (say every 30 seconds). We can additionally call this route every time a participant submits, to make sure it is also updated then. I think this solution is easy to implement and addresses all of the issues. If the experimenter needs this information to be updated at another place in the experiment, they can call this route as well at any point they desire.

jessesnyder commented 6 months ago

Is it OK to have recruiters initiate the polling (v2 above), or would you prefer to have the experiment instance get the time-based notifications and be responsible for calling the recruiter to find discrepancies, and then dealing with the results (v1 above)?

pmcharrison commented 6 months ago

I think best to put this logic in the Recruiter, yes.

pmcharrison commented 6 months ago

(so v2)

thomasmorgan commented 6 months ago

I agree that we should put this stuff in the recruiter, especially as details are going to be recruiter specific. However, when a participant arrives, won't it be the experiment that detects this and so requests that participant statuses are updated? If so, this means not all the code can be in the recruiter. Although maybe the recruiter does some participant onboarding too?

pmcharrison commented 6 months ago

Yes, we can have the Experiment call ping_recruiters during participant onboarding. We do need to be a bit careful about race conditions (if the Experiment's call overlaps with the background task) but this can be done with careful SQLAlchemy I guess

jessesnyder commented 6 months ago

Agreed that with v2, the pathway to updating existing participants when a new one joins is through ping_recruiters (the experiment can just enqueue a call to ping_recruiters with no arguments, just like the clock does).

I think this model will help avoid race condition problems, since each execution of ping_recruiters will fetch the latest participant statuses from the recruitment service, and I assume the queue gets processed FIFO, so it will always be more current info overwriting older info(?) Tell me what I'm missing. :-)

pmcharrison commented 6 months ago

I agree that that kind of race condition would be harmless; I mean instead potential deadlocks between different Postgres transactions where both try to update the same participant row.

thomasmorgan commented 6 months ago

My understanding is that deadlocks are avoided by sqlalchemy/postgres which locks rows and serializes access.

I think there remains a race condition though: If a ppt arrives and pings the recruiters, the check will get queued, but if the participant tries to make a node before the check completes we can get issues. I'm not sure how severe this is, as the check should take <1s and participants have to click through stuff to get to node creation. But if an experiment tried to make a node as soon as the ppt was created, we could have issues, no?

pmcharrison commented 6 months ago

@thomasmorgan SQLAlchemy only locks rows if you add .with_for_update() to your query, normally it doesn't matter but sometimes it does and it can be painful!

I think there remains a race condition though: If a ppt arrives and pings the recruiters, the check will get queued, but if the participant tries to make a node before the check completes we can get issues. I'm not sure how severe this is, as the check should take <1s and participants have to click through stuff to get to node creation. But if an experiment tried to make a node as soon as the ppt was created, we could have issues, no?

Maybe this is best addressed via experiment design? e.g. give the participant instructions before creating nodes?

thomasmorgan commented 6 months ago

I agree that experiments should keep an eye on this, I am just envisaging a future where a user doesn't understand the issue, doesn't put the right checks in place, and then we have to try to figure out what they are doing wrong!

But, the norm in current experiments at least is that participants are made as soon as consent is given, and then nodes are made at the end of the instructions. Provided this pattern is continued everything will be fine!

jessesnyder commented 6 months ago

Hmm.. it's starting to sound like we're talking about blocking participant creation while we wait for an update from the recruiter. I was proposing enqueuing an asynchronous request for status updates when a participant joins. This would be a strategy for generally keeping status up to date, rather than guaranteeing that all status is up to date before creating a participant (or assigning a participant to a node). Am I misinterpreting?

thomasmorgan commented 6 months ago

We don't need to block participant creation per se. We just need to make sure all participant statuses are up to date before the incoming participant tries to make a node and join a network. That said, I guess, blocking the participant creation while the status updates occur is one solution.

jessesnyder commented 6 months ago

This sounds really complicated. Why isn't this an issue with the MTurk recruiter? We only get asynchronous push notifications via AWS SNS for this service. (Or maybe it's always been an issue with MTurk also?)

Taking this to its logical conclusion: we would not check participant status in the Dallinger database at all to determine active participant counts. Instead, we'd block and call the recruitment services whenever a participant joins.

thomasmorgan commented 6 months ago

It's not a problem with MTurk because MTurk sends returned/abandoned notifications before the replacement participant arrives, so the database is always up to date.

In the past, these notifications from AWS did fail to arrive sometimes, and it caused huge issues.

When a participant returns or abandons, we need to know so we can delete their data before their replacement turns up. AWS tells us via a notification, Prolific does not. So for prolific we need to find a way to do this ourselves, and critically we need to do it in the small window before the replacement participant tries to join a network.

Does this make sense?

jessesnyder commented 6 months ago

Makes sense. It seems like it's the extremely eager participant replacement in Prolific that's the core issue.

An idea: Instead of having the create_participant route in Dallinger be directly responsible for tallying up active participants, the route could instead delegate this task to the recruiters with a blocking query. MTurk and built-in recruiters (HotAir and friends) could perform a query of the local DB, since that's a trusted source of truth for them, but Prolific could call out to the mother ship.
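That delegation could be sketched as follows; the class and method names are hypothetical, and only the division of responsibility mirrors the proposal:

```python
# Sketch of delegating the active-participant tally to the recruiter.
# Class/method names are illustrative, not Dallinger's real interfaces.
class Recruiter:
    def active_participant_count(self, db_count):
        # The local DB is a trusted source of truth for MTurk/HotAir-style
        # recruiters, so the tally passed in from the route is returned as-is.
        return db_count


class ProlificStyleRecruiter(Recruiter):
    def __init__(self, fetch_remote_statuses):
        self.fetch_remote_statuses = fetch_remote_statuses

    def active_participant_count(self, db_count):
        # For Prolific the mother ship is authoritative, so this blocks on
        # an API call instead of trusting the local tally.
        statuses = self.fetch_remote_statuses()
        return sum(1 for s in statuses.values() if s == "ACTIVE")


local = Recruiter()
remote = ProlificStyleRecruiter(lambda: {"p1": "ACTIVE", "p2": "RETURNED"})
print(local.active_participant_count(5), remote.active_participant_count(5))
# 5 1
```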

thomasmorgan commented 6 months ago

Yes, it is amazing how quickly prolific can pipe in a replacement.

Having the tallying code in the recruiter is a good idea as it will vary by recruiter, and yeah, having a blocking query for prolific sounds good.

jessesnyder commented 6 months ago

@pmcharrison @polvanrijn Thoughts on this idea? (starting here)

pmcharrison commented 6 months ago

Do you foresee a problem if many participants hit the same blocking route at the same time?

jessesnyder commented 6 months ago

Seems quite possible. (Off to the dentist... Maybe brilliance will strike while in the chair 🦷 -> 💡)

pmcharrison commented 6 months ago

Good luck at the dentist!

Yes, to me it seems undesirable to make one of our own HTTP requests dependent on a blocking API call. I think it's better practice to serve such requests as soon as possible... Otherwise server performance could slow significantly.

I think it would be safest for now to trigger the check asynchronously and make it the experimenter's obligation to give the participant something to do before creating nodes etc.

jessesnyder commented 6 months ago

make it the experimenter's obligation to give the participant something to do before creating nodes etc

@thomasmorgan Can you make do within these constraints?

thomasmorgan commented 6 months ago

@jessesnyder yes, this works for me.

I actually think that, even if this does become an issue, it's going to self-correct. If a participant tries to make a node before the database is updated, they'll be kicked out, but this will cause another replacement to arrive, by which time the database will have updated and the new replacement can be put in fine.