corinalogan opened 7 years ago
Here's also a very interesting article, "TechBlog: C. Titus Brown: Predicting the paper of the future", that I think is relevant to this issue and, more generally, to this initiative.
Thanks @lgatto!
At the risk of clogging this discussion and mixing many different aspects (we'll have to break them up into separate issues), here are some of my initial thoughts.
I would like to be cautious about 'the journal side can actually run everything'. In my field, computational biology, it is often difficult to run everything because of software environments and the need for high-performance computing for long/big jobs. In such cases, presenting the code/script and the environment used to execute the long computation, and providing the result data, should be good enough. Similar difficulties can arise when the data are particularly big and would take a long time to download/access, for example. I would be happy with a short reproducible example that demonstrates the method, without fully re-running the full analysis on the large data.
You're right @lgatto. It also kind of contradicts what I said on Twitter before (that full reproducibility is not feasible/doable over time and also with certain computations). My bad! I was going on a utopian trip there, maybe.
So, it needs to document where the results come from, as produced by the author, yes? A reproducibility review would be great to fine-tune this and learn what can and cannot be asked of authors in this regard.
Yes, I would suggest that we want to be able to reproduce/repeat the calculations/analyses when possible/convenient; otherwise, detailed documentation of inputs, computation, and outputs, with a short reproducible example, is perfectly acceptable, IMHO.
@chartgerink don't worry about clogging things up - that's what this space is for. If you want to start new issues (or anything else), please feel free to do so! I don't need to be the leader/owner here, I just want the discussion to happen and to have an ideal place to publish my research that improves the rigor!
From the initial conversation on Twitter: https://twitter.com/LoganCorina/status/872833978495115264
@gmbecker: Depending on what you mean, ensuring/enforcing that manuscripts are actually reproducible is really hard in the general case. I know about dynamic docs (subject of my thesis, in fact), but an Rmd/Rnw isn't reproducible by itself, see http://biostats.bepress.com/bioconductor/paper2/ … Rmd files don't include the data or the exact versions of the packages used by the original authors, so while you have the code, you may not be able to run it.
@chartgerink: well, if the data are publicly available you can download them based on a DOI, so you can ensure provenance and replicate with just the Rmd.
@lgatto: Yes, if... but the Rmd as such doesn't guarantee it, just as it doesn't guarantee that the required software is available. But Rmd + good practice goes a long way toward enabling reasonable reproducibility in many cases, I think. I am testing submission and handling of Rmds with @F1000Research (should submit in a couple of weeks). Will let you know.
@chartgerink: I am amazed by Rmd and the HTMLs it turns out; I can imagine building on that would provide a good way to start. Using R Notebooks would be even better (letting the user show/hide code). Implement plugins (PubPeer, hypothes.is). It's easy to include a notice of dependencies in the manuscript (what I do now already; e.g., the last page of http://bit.ly/2s8zMlD). Absolute direct reproducibility is not feasible (nor doable); what we need to know is its limits. With Rmd, for example, we can do much. My thoughts on this: automate version testing, determine dependencies upon sharing, and check whether other versions produce the same or a different result. Subsequently, we can produce a cross-table that actually indicates when results differ from the original. How they differ is a different question.
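The "automate version testing" idea above could look something like the following. This is only a minimal sketch, written in Python as a language-neutral illustration (in R one would build on `sessionInfo()` instead); all function names here are hypothetical, not part of any existing tool. It records the package versions used at publication time and later compares them against the reader's environment, producing the kind of cross-table @chartgerink describes:

```python
# Hypothetical sketch: record package versions at publication time
# (roughly what sessionInfo() captures in R), then compare them against
# the current environment as a first step toward automated version testing.
from importlib import metadata


def record_versions(packages):
    """Capture the installed version of each named package."""
    return {pkg: metadata.version(pkg) for pkg in packages}


def compare_versions(recorded):
    """Build a cross-table of (package, recorded, current, match).

    `current` is None when the package is absent from this environment.
    """
    rows = []
    for pkg, wanted in recorded.items():
        try:
            current = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            current = None
        rows.append((pkg, wanted, current, current == wanted))
    return rows


if __name__ == "__main__":
    recorded = record_versions(["pip"])
    for pkg, wanted, current, match in compare_versions(recorded):
        print(f"{pkg}: recorded={wanted} current={current} match={match}")
```

Whether differing versions also produce differing *results* would still need the re-execution step discussed above; this only flags where the environments diverge.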