Daniel-Mietchen / ideas

A dumping ground for half-baked ideas, some of which will hopefully be worked on soon.

Draft something on open science and retractions #1019

Daniel-Mietchen opened this issue 5 years ago

Daniel-Mietchen commented 5 years ago

I had planned to do this for some time but lacked data on retractions. Now that the Retraction Watch Database is live, along with a first set of analyses of its content, that obstacle is largely removed, and such a write-up is mainly a matter of finding the time to explore these resources and to assemble some thoughts on the implications for open science.

The gist of what I'd like to reason about:

  1. classical research is basically a black box hiding the details of the research process, albeit not necessarily intended (or even perceived) that way
    • formal publications are the main — and often the only — window into the process (though some additional windows in the form of datasets or code are opening up more regularly now than they did some years ago), and they usually come with delays of months or years after the reported research was actually performed;
    • a key element of such formal publications is the conclusions, which basically digest the content of the black box for those who have no access to it, supported by some basic plausibility arguments in the form of the methods and results sections;
    • formal publications are limited in space and detail; a good chunk of what has actually been done is not reported in there (even — or sometimes especially — if that would affect the conclusions), and what is being reported basically has to be taken at face value (i.e. trusted), since others usually have no straightforward way (and often none at all) to reproduce in detail how the reported results and conclusions came about;
    • if, in hindsight, some elements of what was reported in the formal publications turn out to be problematic, that trust is broken (or at least perceived to be so, which for most practical purposes means the same), and while it remains popular to ignore such problems or to cover them up, "efforts to stamp out bad science" are kicking in ever more regularly, which may lead to retractions of entire publications or parts thereof;
    • the retraction notice may or may not state whether fraud was involved (and many retractions still come without an accompanying notice explaining the specifics), and that statement, too, usually has to be taken at face value, since the underlying details are usually not public.
  2. in open science, the research process is shared at high temporal and methodological granularity
    • this means that individual steps within the research process — as well as the corresponding results — are shared as close as possible to when the research is happening — for instance, in human genome sequencing, it is standard practice to upload new sequencing data to public data repositories within 24h of the data having been generated;
    • open science also means sharing the methodology in as much detail as possible (save for cases where there are good reasons for not sharing; for instance, while a researcher may need some personal information about study participants — e.g. names, contact details, results of clinical tests — to perform the research, these pieces of information normally cannot and should not be shared);
    • in such open contexts, trust operates in a more granular fashion in both time and methodology, i.e. if, in hindsight, parts of an open science project turn out to be problematic, they can normally be traced back (in public, i.e. by anyone who invests the time to find out) to smaller components, e.g. a bug in an analysis script, or cell lines having been contaminated on a particular day;
    • in such situations, trust would be restored (or rather retained, presumably) by clearly labeling the problematic pieces and outlining how they might or did impact any results or conclusions;
    • this would probably be achievable by simply updating the problematic pieces (with a public version history), e.g. fixing the bug, rather than retracting them, or through public annotations when things cannot be fixed (e.g. when the specific batch or cell line from that particular day is not available any more); see the sketch after this list for one way such an annotation could look;
    • those annotations and fixes could then be used further — alongside things that actually did work and were properly documented — by anyone, as a basis for determining whether (and what kind of) fraud was involved, for follow-up studies, or for educating researchers, students and others active in the respective field.
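
To make the annotation idea a bit more concrete, here is a minimal sketch of what a machine-readable correction record for a problematic component could look like, assuming the project lives in a public version-controlled repository. All names (`CorrectionRecord`, the file paths, the commit hash) are hypothetical illustrations, not a proposed standard.

```python
from dataclasses import dataclass, field, asdict
from datetime import date
from typing import List, Optional
import json


@dataclass
class CorrectionRecord:
    """One publicly visible annotation of a problematic component.

    Field names are illustrative, not a proposed standard.
    """
    component: str              # e.g. path of the affected analysis script
    problem: str                # what turned out to be wrong
    detected_on: date           # when the problem was noticed
    fix_commit: Optional[str]   # commit hash of the public fix, if one was possible
    fixable: bool               # False when e.g. the original cell batch is gone
    affected_outputs: List[str] = field(default_factory=list)
    impact_on_conclusions: str = ""


# Example: a bug in an analysis script, fixed with a public version history
record = CorrectionRecord(
    component="analysis/normalize.py",  # hypothetical file
    problem="Off-by-one error when trimming the first sequencing cycle",
    detected_on=date(2019, 3, 14),
    fix_commit="abc1234",               # hypothetical commit hash
    fixable=True,
    affected_outputs=["figures/fig2.png", "tables/table1.csv"],
    impact_on_conclusions="Effect sizes shift by <1%; conclusions unaffected.",
)

# Serialize, e.g. to keep the record next to the fix in the repository
print(json.dumps(asdict(record), default=str, indent=2))
```

A record like this could be committed together with the fix itself, so that the public version history carries both the change and its assessed impact on results and conclusions.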

As a sidenote,

Daniel-Mietchen commented 5 years ago

One aspect not clearly stated above but important nonetheless is that, in the open, problems can in principle be detected sooner (and often much sooner) than with the black-box approach. This means that the bug fix, or a repetition of the procedures applied to that cell line, could happen well before the end of that particular research process, thereby influencing not only the conclusions but also the procedure.
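
As an aside on how such early detection could work in practice: if the analysis steps are public and runnable, even a small automated check can flag a problem on the day a change is pushed rather than long after publication. Below is a minimal sketch, using a hypothetical pytest-style test for the analysis step from the sketch in the previous comment; the function and file names are assumptions for illustration only.

```python
# test_normalize.py -- a hypothetical check for the analysis step sketched above.
# If checks like this run on every public update (e.g. via a CI service),
# a problem surfaces close to when it is introduced, not after publication.

def trim_first_cycle(quality_scores):
    """Toy stand-in for a step in the hypothetical analysis/normalize.py:
    drop the first sequencing cycle from a list of per-cycle scores."""
    return quality_scores[1:]


def test_trim_first_cycle_keeps_remaining_cycles():
    scores = [10, 20, 30, 40]
    trimmed = trim_first_cycle(scores)
    assert trimmed == [20, 30, 40]           # first cycle removed, rest intact
    assert len(trimmed) == len(scores) - 1   # an off-by-one bug would fail here
```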

Daniel-Mietchen commented 5 years ago

A nice way to incentivize people to check openly documented work is a bug bounty program. This has a certain tradition in software development, and here is an open science version of it: https://rubenarslan.github.io/bug_bounty.html . Some more background is available at https://rubenarslan.github.io/posts/2018-10-26-on-making-mistakes-and-my-bug-bounty-program/ .