We want to be able to characterize/localize the interventions from the SDC in a multidimensional space of relevant dimensions, so that we can identify which interventions are similar to one another and predict how untested interventions will perform based on what we know about similar interventions we have tested.
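As a rough sketch of the "predict untested interventions from similar tested ones" idea, assuming each intervention has already been scored on the shared dimensions (the function name, distance metric, and k below are illustrative assumptions, not anything decided here):

```python
import numpy as np

def predict_effect(untested_features, tested_features, tested_effects, k=5):
    """Weighted k-nearest-neighbor estimate in the shared dimension space.

    untested_features: (d,) vector for the new intervention
    tested_features:   (n, d) matrix of tested interventions
    tested_effects:    (n,) measured effects for the tested interventions
    """
    # Distance from the untested intervention to every tested one
    dists = np.linalg.norm(tested_features - untested_features, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + 1e-9)  # closer neighbors count more
    return float(np.average(tested_effects[nearest], weights=weights))
```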
Benefited from existing taxonomies that describe characteristics of tasks (the McGrath circumplex, etc.). Mostly these were just different features that various authors had thought of, which were then collected and aggregated.
Didn't do any dimensionality reduction based on previous data (i.e., looking at factor loadings).
Did all of their coding based on individual items; no comparisons between items.
Used a JS applet (Mark's task-robot on Glitch) that asked the questions from the CSV and paired them with different tasks. (Better than Qualtrics: it guarantees that you get an item you haven't seen before if you come back to do more rating tasks.)
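A minimal sketch of the "never show a repeat item" behavior, not the actual Glitch applet (names and data structures are assumed):

```python
import random

def next_item(rater_id, all_pairs, seen_by_rater):
    """Return a (task, question) pair this rater has not rated yet, or None.

    all_pairs:      list of (task, question) tuples in the study
    seen_by_rater:  dict mapping rater_id -> set of pairs already rated
    """
    seen = seen_by_rater.get(rater_id, set())
    unseen = [pair for pair in all_pairs if pair not in seen]
    return random.choice(unseen) if unseen else None
```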
Had a specific pool of Turkers (a high-effort pool). Got a "gold standard" set of ratings by having lab members rate the tasks. Then gave the coders training and a quiz, and only used coders who passed the pre-quiz. Developed a quiz that could be used as a pre-screen.
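A hedged sketch of what the pre-screen check could look like: score a prospective coder's quiz answers against the lab's gold-standard labels and admit only those above a cutoff (the 0.8 threshold is illustrative, not the study's actual value):

```python
def passes_prescreen(coder_answers, gold_answers, threshold=0.8):
    """coder_answers / gold_answers: dicts mapping quiz item -> label."""
    n_correct = sum(coder_answers.get(item) == gold
                    for item, gold in gold_answers.items())
    return n_correct / len(gold_answers) >= threshold
```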
Did 102 tasks; got at least 20 datapoints for each task.
23 dimensions.
Averaged the data per task per question.
Used binary answers plus a "does not apply" option, which deals with the subjective aspects of the outcomes.
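For concreteness, a sketch of the aggregation step under the assumptions above: binary answers (1/0), "does not apply" coded as missing, and one averaged value per task per question, yielding roughly a 102-task × 23-dimension matrix (task, question, and column names here are made up):

```python
import pandas as pd

# Toy responses: answer is 1/0 for a binary question, None for "does not apply"
responses = pd.DataFrame({
    "task":     ["brainstorm", "brainstorm", "sudoku"],
    "question": ["has_correct_answer"] * 3,
    "answer":   [0, None, 1],
})

task_by_dimension = (
    responses.dropna(subset=["answer"])          # ignore "does not apply"
             .groupby(["task", "question"])["answer"]
             .mean()                             # proportion of raters answering "yes"
             .unstack("question")                # tasks as rows, dimensions as columns
)
```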
❗️Have to be very careful to distinguish outcomes from context. Sometimes people in the literature would ask "is this a task where the best person will win" versus "is this a task where prior knowledge is required and there is a correct answer"; the former mixes up causes with effects or intermediating effects. Distinguish the pure stimulus from the outcomes and the intermediating variables. Dimensions that make this mistake are, naturally, really difficult to code, because you can't rate them from the task's base features alone.
Took lots of effort to curate the right questions and then to train the raters. Adding new tasks is easy, but designing the questions and training the raters properly is hard.
How would you shorten the question curation? One piece is making sure that the questions are really concrete and can be answered from the stimulus alone. Standardize the inputs/stimulus presentation...
The tested interventions are here: https://www.strengtheningdemocracychallenge.org/winning-interventions
Steps:
First step: