markwhiting opened 1 year ago
7 sources of claims: Category prompt, Situation prompt, ConceptNet, Atomic, News media, Campaign emails, Aphorisms.
Update to discuss direct elicitation, in-the-wild use, and corpora (which we will probably deemphasize in the future), and add GPT as an additional construct here.
We aim to sample at least 100 participants per design point. We intend to stop sampling when our $Q^2$ stabilizes for new design points, i.e., when adding more training data no longer improves accuracy in out-of-sample prediction of a new design point.
We should state a goal, but also discuss future batches, i.e., that we might find a better sample size and adjust accordingly.
Shift to sampling all design points at the start.
Collect more points on the back end.
A preregistration based on the AsPredicted template.
Registration
Data collection. Have any data been collected for this study already?
Hypothesis What's the main question being asked or hypothesis being tested in this study?
What types of claims are the most commonsensical, given a taxonomy of claims? Our hypotheses reflect those in our existing analysis: https://osf.io/9kxt2/
Dependent variable Describe the key dependent variable(s) specifying how they will be measured.
Metrics are defined in https://osf.io/9kxt2/
With the addition of:
Conditions How many and which conditions will participants be assigned to?
Conditions are design points in the space of possible statement types (not all of which will be sampled):
6 binary dimensions: Social vs Physical, Everyday vs Abstract, Figure of speech vs Literal language, Normative vs Positive, Opinion vs Factual, and Knowledge vs Reasoning.
13 categories: General reference, Culture and the arts, Geography and places, Health and fitness, History and events, Human activities, Mathematics and logic, Natural and physical sciences, People and self, Philosophy and thinking, Religion and belief systems, Society and social sciences, and Technology and applied sciences.
7 sources: Category prompt, Situation prompt, ConceptNet, Atomic, News media, Campaign emails, and Aphorisms.
This gives $2^6 \times 13 \times 7 = 5{,}824$ total design points.
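The design-point count can be checked by enumerating the full grid: one pole from each binary dimension, one category, and one source. The sketch below is illustrative only; the variable names are hypothetical and not taken from the project pipeline.

```python
from itertools import product

# Hypothetical encoding: each binary dimension is a pair of poles,
# and a design point picks exactly one pole per dimension.
BINARY_DIMENSIONS = [
    ("Social", "Physical"),
    ("Everyday", "Abstract"),
    ("Figure of speech", "Literal language"),
    ("Normative", "Positive"),
    ("Opinion", "Factual"),
    ("Knowledge", "Reasoning"),
]

CATEGORIES = [
    "General reference", "Culture and the arts", "Geography and places",
    "Health and fitness", "History and events", "Human activities",
    "Mathematics and logic", "Natural and physical sciences", "People and self",
    "Philosophy and thinking", "Religion and belief systems",
    "Society and social sciences", "Technology and applied sciences",
]

SOURCES = [
    "Category prompt", "Situation prompt", "ConceptNet", "Atomic",
    "News media", "Campaign emails", "Aphorisms",
]

# Cartesian product: 2**6 pole combinations x 13 categories x 7 sources.
points = list(product(*BINARY_DIMENSIONS, CATEGORIES, SOURCES))
print(len(points))  # 5824
```

Each tuple is one design point, e.g. `("Social", "Everyday", "Figure of speech", "Normative", "Opinion", "Knowledge", "General reference", "Category prompt")`.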
The stimulus for each design point will be a single set of 15 statements randomly sampled from an updated version of the corpus in https://osf.io/9kxt2/. If a design point does not contain enough statements, new statements will be generated with a language model. A date-stamped version of the corpus, the design point samples, and the acquisition pipeline is available at https://github.com/Watts-Lab/commonsense-statements.
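A minimal sketch of the per-design-point stimulus step, assuming the corpus is a list of `(text, design_point)` pairs (a hypothetical layout, not necessarily the one in the linked repository). It draws 15 statements without replacement and reports how many would still have to be generated by a language model; the generation step itself is not shown.

```python
import random

STATEMENTS_PER_POINT = 15  # set size stated above


def sample_stimulus(statements, design_point, k=STATEMENTS_PER_POINT, seed=0):
    """Return (sample, shortfall) for one design point.

    `statements` is a hypothetical list of (text, design_point) pairs.
    `shortfall` is how many statements would still need to be generated.
    """
    pool = [text for text, point in statements if point == design_point]
    rng = random.Random(seed)  # seeded so the sample can be date-stamped and reproduced
    if len(pool) >= k:
        # Enough statements: sample k of them without replacement.
        return rng.sample(pool, k), 0
    # Not enough statements: keep all of them and report the gap.
    return pool, k - len(pool)
```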
Analyses Specify exactly which analyses you will conduct to examine the main question/hypothesis.
Outliers and Exclusions Describe exactly how outliers will be defined and handled, and your precise rule(s) for excluding observations.
We will exclude data from participants who provide incomplete responses or fail attention checks in the survey tool.
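The exclusion rule can be expressed as a simple filter. This sketch assumes a hypothetical response record with an `answers` list (using `None` for a missing answer) and an `attention_checks` list of booleans, and it assumes that failing any single attention check excludes the participant; the survey tool's actual criteria may differ.

```python
def keep_participant(response):
    """Hypothetical rule: keep only complete responses that pass every attention check."""
    complete = all(answer is not None for answer in response["answers"])
    attentive = all(response["attention_checks"])  # assumption: one failure excludes
    return complete and attentive


def apply_exclusions(responses):
    """Drop participants that violate either exclusion criterion."""
    return [r for r in responses if keep_participant(r)]
```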
Sample Size How many observations will be collected or what will determine sample size?
We aim to sample at least 100 participants per design point. We intend to stop sampling when our $Q^2$ stabilizes for new design points, i.e., when adding more training data no longer improves accuracy in out-of-sample prediction of a new design point.
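A sketch of the stopping rule, assuming $Q^2$ is the usual cross-validated (predictive) $R^2$, $Q^2 = 1 - \mathrm{PRESS}/\mathrm{TSS}$, computed from predictions on held-out design points. The `tolerance` and `patience` values are hypothetical placeholders, not preregistered thresholds.

```python
def q_squared(observed, predicted):
    """Q^2 = 1 - PRESS / TSS, where `predicted` are out-of-sample predictions."""
    mean = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    tss = sum((o - mean) ** 2 for o in observed)
    return 1 - press / tss


def has_stabilized(q2_history, tolerance=0.01, patience=3):
    """Stop once each of the last `patience` batches improved Q^2 by less than `tolerance`.

    `q2_history` holds one Q^2 value per sampling batch; both thresholds are hypothetical.
    """
    if len(q2_history) <= patience:
        return False
    recent = q2_history[-(patience + 1):]
    gains = [later - earlier for earlier, later in zip(recent, recent[1:])]
    return all(gain < tolerance for gain in gains)
```

Requiring several consecutive small gains, rather than one, guards against stopping on a single noisy batch.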
Other Anything else you would like to pre-register?
We intend to preregister predictions for each design point before sampling it.
Name Give a title for this AsPredicted pre-registration
World-scale evaluation of common sense
Type of study.
Data source