corinalogan opened 7 years ago
Here's also a very interesting article, "TechBlog: C. Titus Brown: Predicting the paper of the future", that I think is relevant to this issue and, more generally, to this initiative.
Thanks @lgatto!
At the risk of clogging this discussion and mixing many different aspects (we'll have to break them up into separate issues), here are some of my initial thoughts.
I would like to be cautious about 'the journal side can actually run everything'. In my field, computational biology, it is often difficult to run everything because of software environments and the need for high-performance computing for long/big jobs. In such cases, presenting the code/script and the environment used to execute the long computation, and providing the result data, should be good enough. Similar difficulties can arise when the data are particularly big and would take a long time to download/access, for example. I would be happy with a short reproducible example that demonstrates the method, without fully re-running the full analysis on the large data.
You're right @lgatto. It also kind of contradicts what I said on Twitter before (that full reproducibility is not feasible/doable over time and also with certain computations). My bad! I was going on a utopian trip there, maybe.
So, it needs to document where the results come from, as produced by the author, yes? A reproducibility review would be great to fine-tune this and learn what can and cannot be asked of authors in this regard.
Yes, I would suggest that we want to be able to reproduce/repeat the calculations/analyses when possible/convenient; otherwise, detailed documentation of inputs, computation, and outputs, with a short reproducible example, is perfectly acceptable, IMHO.
@chartgerink don't worry about clogging things up - that's what this space is for. If you want to start new issues (or anything else), please feel free to do so! I don't need to be the leader/owner here, I just want the discussion to happen and to have an ideal place to publish my research that improves the rigor!
From the initial conversation on Twitter: https://twitter.com/LoganCorina/status/872833978495115264
@gmbecker: Depending on what you mean, ensuring/enforcing that manuscripts are actually reproducible is really hard in the general case. I know about dynamic docs (subject of my thesis, in fact), but an Rmd/Rnw isn't reproducible by itself, see http://biostats.bepress.com/bioconductor/paper2/ … Rmd files don't include the data or the exact versions of the packages used by the original authors, so while you have the code, you may not be able to run it.
@chartgerink: well, if the data are publicly available you can download them based on a DOI, so you can ensure provenance and replicate with just the Rmd.
@lgatto: Yes, if... but the Rmd as such doesn't guarantee it, just as it doesn't guarantee that the required software is available. But Rmd + good practice goes a long way toward enabling reasonable reproducibility in many cases, I think. I am testing submission and handling of Rmds with @F1000Research (should submit in a couple of weeks). Will let you know.
@chartgerink: I am amazed by Rmd and the HTMLs it turns out; I can imagine building on that would provide a good way to start. Using R Notebooks would be even better (letting the user show/hide code). Implement plugins (PubPeer, hypothes.is). It's easy to include a notice of dependencies in the manuscript (what I do now already; e.g., the last page of http://bit.ly/2s8zMlD). Absolute direct reproducibility is not feasible (nor doable); what we need to know is its limits. With Rmd, for example, we can do much. My thoughts on this: automate version testing, determine dependencies upon sharing, and check whether other versions produce the same or a different result. Subsequently, we can produce a cross-table that actually indicates when results differ from the original. How they differ is a different question.
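The "automate version testing" idea above could look something like the following. This is only a minimal sketch, written in Python as a language-neutral illustration (in R one would build on `sessionInfo()` instead); all function names here are hypothetical, not part of any existing tool. It records the package versions used at publication time and later compares them against the reader's environment, producing the kind of cross-table @chartgerink describes:

```python
# Hypothetical sketch: record package versions at publication time
# (roughly what sessionInfo() captures in R), then compare them against
# the current environment as a first step toward automated version testing.
from importlib import metadata


def record_versions(packages):
    """Capture the installed version of each named package."""
    return {pkg: metadata.version(pkg) for pkg in packages}


def compare_versions(recorded):
    """Build a cross-table of (package, recorded, current, match).

    `current` is None when the package is absent from this environment.
    """
    rows = []
    for pkg, wanted in recorded.items():
        try:
            current = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            current = None
        rows.append((pkg, wanted, current, current == wanted))
    return rows


if __name__ == "__main__":
    recorded = record_versions(["pip"])
    for pkg, wanted, current, match in compare_versions(recorded):
        print(f"{pkg}: recorded={wanted} current={current} match={match}")
```

Whether differing versions also produce differing *results* would still need the re-execution step discussed above; this only flags where the environments diverge.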