Automated data entry - GitHub Issues

R1:

The introduction and conclusion seem to be based on the premise that it is always preferable to automate manual tasks. This premise is questionable: the claim is unproven and deserves more systematic discussion. For instance, manual tasks do run the risk of unconscious, idiosyncratic errors, but automation runs the risk of systematic errors. An overall judgment of which is preferable would require a balanced discussion of all potential error types. It is also not clear that the presented evidence gives compelling reason to believe in the superiority of automated data entry even in this case, because we do not know whether the automated data are valid.

Specifically, there is a tension between the article’s framing, which addresses manual vs automated data entry in general terms, and its very specific content (a report on the reproduction of one article in empirical democracy research). The actual content is much narrower than the framing implies. The framing leads the reader to expect a comprehensive treatment of the pros and cons of manual vs automated data entry, perhaps with a systematic review of the available literature, but all of this is missing. The study makes it seem as if automated data entry is always preferable to manual entry, but this conclusion seems to result from the superficiality of the discussion. I believe a more balanced, comprehensive treatment would reveal trade-offs and contexts in which one approach works better than the other. To name one point, automated processes come with lower levels of control and lower levels of researcher awareness of what is actually happening under the hood.

Altogether, one way to alleviate the problems described above would be to take the issue of manual data entry seriously in theoretical terms. But expanding these discussions runs the risk of artificially inflating the manuscript. Another path would be to adopt a more modest framing and conclusion, treating this study as an exploratory case study whose inductive value lies in giving us reason to think more systematically about manual vs automated data entry and to test these questions in a future research agenda.

In a similar vein, the recommendations given in the discussion section sound reasonable but stand on thin ground, as the empirical evidence for the superiority of each suggestion over its alternatives is weak. Rather than putting us in a position to make general suggestions, I would interpret the one case study presented in this manuscript as giving rise to an interesting and innovative research agenda on the (presumed) importance of manual data entry. As part of that research agenda, each of these suggestions could be empirically tested on a larger set of studies.

R2:

Second, I am convinced that the argument about the relevance of data-entry errors and “data janitor” work can be better linked to the emerging literature in political science that advocates open science principles. Open science advocates argue that all elements of research work should be made publicly available (e.g. all data, software, documentation, etc.). The author’s suggestion to “automate data entry” (p. 9) in the discussion relates to this argument.

[x] Clarify that this is an illustrative piece, not a "systematic review"
[x] Highlight what problem the automated data entry is supposed to solve (measurement error caused by human miscoding) and what it doesn't
[x] Use the open science literature to back up the value of automated data entry

fsolt / dem_mood

Automated data entry #10