fsolt closed this issue 2 years ago.
Intro: what is the difference between secondary and primary replication? Pay more attention to the reliability of ~data generation process~ data preparation. A very basic starting point: get your data right. Don't enter data by hand! Data quality matters. The more data you have, the less any single mistake will matter, but when data are sparse, even a few small mistakes can matter a lot.
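A toy illustration of the sparse-data point, assuming the quantity of interest is a simple sample mean (the data and the helper `entry_error_shift` are hypothetical). When one value is mistyped, the mean moves by (wrong value − true value) / n, so the same typo that is negligible with 500 observations is large with 5.

```python
from statistics import mean


def entry_error_shift(values, index, wrong_value):
    """Return how far the sample mean moves when values[index] is mistyped as wrong_value."""
    corrupted = list(values)
    corrupted[index] = wrong_value
    return mean(corrupted) - mean(values)


# Hypothetical data: every true value is 10.0; one entry is typed as 100.0 (an extra zero).
small = [10.0] * 5
large = [10.0] * 500

print(round(entry_error_shift(small, 0, 100.0), 2))  # 18.0  (= 90 / 5)
print(round(entry_error_shift(large, 0, 100.0), 2))  # 0.18  (= 90 / 500)
```

Other estimators (medians, regression coefficients) react differently, but the basic point holds: with few observations, each hand-entry error carries more weight.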
Look for similar concerns in physical and biological sciences?
Make sure the description is correct before moving on to causal analysis (see https://journalqd.org).
Article on computational social science by Gary King and others
Conclusion: things to do to avoid this problem (see "Replication, Replication" for a model). First, automate as much as possible rather than entering data manually. Second, replication data should start from the raw source rather than showing only the last intermediate step.
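A minimal sketch of what "automate, and start from the raw source" could look like, assuming the raw data arrive as a CSV download. The column names, the cleaning rules, and the functions `sha256_of`, `clean`, and `validate` are hypothetical, not taken from any particular project; the raw file is inlined as a string only to keep the example self-contained.

```python
import csv
import hashlib
import io

# Hypothetical raw export, kept exactly as downloaded from the original source.
RAW_CSV = """country,year,value
A,2000,1.5
B,2000,
A,2001,2.0
A,2001,2.0
"""


def sha256_of(text):
    """Fingerprint the raw input so others can confirm they start from the same file."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def clean(raw_text):
    """Apply every raw-to-analysis transformation in code, never by hand."""
    rows = []
    seen = set()
    for row in csv.DictReader(io.StringIO(raw_text)):
        if row["value"] == "":  # drop missing values explicitly
            continue
        full_row = (row["country"], row["year"], row["value"])
        if full_row in seen:  # drop exact duplicate rows
            continue
        seen.add(full_row)
        rows.append({"country": row["country"],
                     "year": int(row["year"]),
                     "value": float(row["value"])})
    return rows


def validate(rows):
    """Fail loudly instead of relying on someone eyeballing a spreadsheet."""
    keys = [(r["country"], r["year"]) for r in rows]
    assert len(keys) == len(set(keys)), "conflicting country-year rows"
    assert all(1900 <= r["year"] <= 2100 for r in rows), "implausible year"


if __name__ == "__main__":
    print("raw input sha256:", sha256_of(RAW_CSV))
    cleaned = clean(RAW_CSV)
    validate(cleaned)
    print(cleaned)
```

Because the replication materials would ship the raw file plus this script, a reader can rerun every step rather than trusting the last intermediate file; the checksum shows whether the raw input has changed since the analysis was run.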
Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E. J., Berk, R., ... & Johnson, V. E. (2018). Redefine statistical significance. Nature Human Behaviour, 2(1), 6-10. "The lack of reproducibility of scientific studies has caused a growing concern over the credibility of claims of new discoveries based on ‘statistically significant’ findings. There has been much progress toward documenting and addressing several causes of this lack of reproducibility (for example, multiple testing, P-hacking, publication bias and under-powered studies)" Their emphasis is on overly lenient statistical standards (they propose lowering the default P-value threshold from 0.05 to 0.005), but we can cite them to emphasize the necessity of reproducibility. @sammo3182
Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T. H., Huber, J., Johannesson, M., ... & Wu, H. (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2(9), 637-644.
National Academies of Sciences, Engineering, and Medicine. (2019). Reproducibility and Replicability in Science. National Academies Press. Discusses code-keeping, version control, and workflow management.
Amrhein, V., Trafimow, D., & Greenland, S. (2019). Inferential statistics as descriptive statistics: There is no replication crisis if we don’t expect replication. The American Statistician, 73(sup1), 262-270.
Also this one, which I have open in a tab, probably because one of you sent it to me: McMann, K., Pemstein, D., Seim, B., Teorell, J., & Lindberg, S. (2021). "Assessing Data Quality: An Approach and An Application." Political Analysis, 1-24. doi:10.1017/pan.2021.27
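@sammo3182 Unfortunately, "DGP" (data-generating process) doesn't refer to what we're talking about: it means the underlying process that produces the observed data, not the steps we take to prepare them. See: https://stats.stackexchange.com/a/451230/45287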
I think we're talking more about data wrangling/'janitor work'. I'll revise accordingly.
Conclusion/suggestions: