Write text

fsolt commented 3 years ago

[x] intro: the problem (how to make this a bigger issue)
[x] descriptive
[x] regression
[x] conclusion

fsolt commented 3 years ago

intro: difference between secondary and primary replication? more attention to reliability of ~data generation process~ data preparation. Very basic starting place--get your data right. Don't enter stuff by hand! Data quality. The more data you have, the less likely mistakes will matter, but when data is sparse, even a few small mistakes can matter a lot.
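
To make the sparse-data point concrete, here is a minimal sketch in Python. The numbers are invented for illustration and do not come from any dataset discussed in this thread, and the helper name `mean_shift_from_one_error` is hypothetical. One mis-keyed value shifts a sample mean by the size of the error divided by the number of observations, so the same slip matters far more when there are few observations.

```python
from statistics import mean


def mean_shift_from_one_error(values, index, wrong_value):
    """Return how much the sample mean changes when values[index] is mis-entered."""
    corrupted = list(values)
    corrupted[index] = wrong_value
    return mean(corrupted) - mean(values)


# Hypothetical data: every true value is 50, and one entry is mis-keyed as 5.
small_sample = [50] * 10
large_sample = [50] * 1000

small_shift = mean_shift_from_one_error(small_sample, 0, 5)
large_shift = mean_shift_from_one_error(large_sample, 0, 5)

print(f"n=10:   mean shifts by {small_shift}")
print(f"n=1000: mean shifts by {large_shift}")
```

The same 45-point keying error moves the mean a hundred times further in the ten-observation sample than in the thousand-observation one.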

Look for similar concerns in physical and biological sciences?

Make sure description is correct before moving to causal analysis (see https://journalqd.org)

Article on computational social science by Gary King and others

Conclusion: things to do to avoid this problem (check out "Replication, Replication" for model). One: automate as much as possible (rather than manual entry). Another: replication data should start from raw source (rather than showing only last intermediate step)
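
A rough sketch of what "start from the raw source" could look like in practice, written in Python purely for illustration. The column names (`agree`, `respondents`), the inline `RAW_CSV` stand-in, and the `prepare` function are all assumptions rather than anything from the project. The idea is that every value in the analysis data is derived by code from the raw file, so a reader can rerun the whole chain instead of trusting an intermediate file.

```python
import csv
import io

# Stand-in for a raw source file; in practice this would be the file exactly
# as downloaded, read from disk rather than embedded in the script.
RAW_CSV = """country,year,agree,respondents
A,2001,412,1000
B,2001,655,1100
"""


def prepare(raw_text):
    """Derive analysis-ready rows from the raw text with no hand-typed values."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        agree = int(record["agree"])
        respondents = int(record["respondents"])
        # Fail loudly on impossible counts instead of silently passing them on.
        if not 0 <= agree <= respondents:
            raise ValueError(f"implausible counts in raw row: {record}")
        rows.append({
            "country": record["country"],
            "year": int(record["year"]),
            "share_agree": agree / respondents,
        })
    return rows


analysis_rows = prepare(RAW_CSV)
for row in analysis_rows:
    print(row)
```

Shipping the raw file together with a script like this, rather than only the final table, lets a replicator check the preparation step as well as the model.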

Tyhcass commented 3 years ago

Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E. J., Berk, R., ... & Johnson, V. E. (2018). Redefine statistical significance. Nature Human Behaviour, 2(1), 6-10. "The lack of reproducibility of scientific studies has caused a growing concern over the credibility of claims of new discoveries based on ‘statistically significant’ findings. There has been much progress toward documenting and addressing several causes of this lack of reproducibility (for example, multiple testing, P-hacking, publication bias and under-powered studies)." They emphasize low statistical standards, but we can cite them and emphasize the necessity of reproducibility. @sammo3182

Tyhcass commented 3 years ago

Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T. H., Huber, J., Johannesson, M., ... & Wu, H. (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2(9), 637-644.

National Academies of Sciences, Engineering, and Medicine. (2019). Reproducibility and replicability in science. National Academies Press. Note: mentions codekeeping, version control, and workflow management.

Amrhein, V., Trafimow, D., & Greenland, S. (2019). Inferential statistics as descriptive statistics: There is no replication crisis if we don’t expect replication. The American Statistician, 73(sup1), 262-270.

fsolt commented 2 years ago

Also this, that I have in an open tab, probably because one of you sent it to me: McMann, K., Pemstein, D., Seim, B., Teorell, J., & Lindberg, S. (2021). "Assessing Data Quality: An Approach and An Application." Political Analysis, 1-24. doi:10.1017/pan.2021.27

fsolt commented 2 years ago

@sammo3182 Unfortunately, DGP doesn't refer to what we're talking about. See: https://stats.stackexchange.com/a/451230/45287

I think we're talking more about data wrangling/'janitor work'. I'll revise accordingly.

fsolt commented 2 years ago

Conclusion/suggestions:

automate to minimize hand entry
hand-entered data needs to be double-checked ("cross-checking for data validation"?)
GitHub version control
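
One way the double-checking suggestion could be automated is double entry: two people key in the same records independently and a script flags every disagreement. The sketch below is in Python with made-up values. The function `find_discrepancies` and the `(country, year)` keys are hypothetical, not part of the project's code.

```python
def find_discrepancies(entry_a, entry_b):
    """Compare two independent keyings of the same records.

    Each argument maps a record key to the value typed in for it. Returns a
    sorted list of (key, value_a, value_b) for every key where the two
    keyings disagree or where one keying is missing the record (shown as None).
    """
    problems = []
    for key in sorted(set(entry_a) | set(entry_b)):
        value_a = entry_a.get(key)
        value_b = entry_b.get(key)
        if value_a != value_b:
            problems.append((key, value_a, value_b))
    return problems


# Hypothetical values keyed in twice, by two different people.
first_pass = {("A", 2001): 41.2, ("B", 2001): 59.5, ("C", 2001): 37.0}
second_pass = {("A", 2001): 41.2, ("B", 2001): 95.5}

discrepancies = find_discrepancies(first_pass, second_pass)
for key, a, b in discrepancies:
    print(key, a, b)
```

Here the transposed digits for B and the record missing from the second pass are both reported, and each flagged record would then be checked against the original source by hand.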

fsolt / dem_mood

Write text #4