khinsen opened this issue 3 years ago
Actually this is the second submission. We also have the same problem with https://github.com/ReScience/submissions/issues/53.
Thanks @rougier for the pointer! I see I was even mentioned in that discussion, and yet I never got notifications and thus never looked at the thread before.
I see basically two issues with HPC projects:
In the submission I am handling, I have decided to address (2) by doing that part myself. More generally, the approach would be to have a third reviewer specifically working on that part. Someone who would not necessarily have the domain expertise to review the article, but who can deal with the technicalities.
That leaves the resources as an open problem. Ideally, we should try to partner with an institution that has HPC resources and is willing to let us use them. But even that would only work for modest sizes, within the typical testing allocations of HPC centres.
Another option is to trust the authors that they ran the simulations to get the results.
I never had a doubt about that so far. The point of reviewing reproducibility is to check if someone else can get the same result, and thus potentially adapt the software for similar tasks.
For HPC computations that require a lot of modest-size jobs, one possibility would be for the submission to:
This can work as long as the data are not too big for the scientific archives, and a single job is not too expensive in computing resources/time. It moves most of the technical burden to the author, and removes the need for an expensive effort from the reviewers. It also adds the benefit of making the produced job data available somewhere for analysis by researchers without access to HPC resources.
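To make the idea concrete, here is a minimal sketch (all names and the archive layout are invented for illustration, not taken from any submission) of how each modest-size job's output could be written out with a checksum, so that a reviewer re-running any single job can verify it against the deposited data:

```python
# Hypothetical sketch: archive each job's output with a checksum so the
# whole set can be deposited (e.g. on a scientific archive) and any
# single job can be re-run and verified by a reviewer.
import hashlib
import json
import pathlib

def archive_job_output(job_id, result_bytes, outdir="archive"):
    path = pathlib.Path(outdir)
    path.mkdir(exist_ok=True)
    # Store the raw job output.
    (path / f"job_{job_id:04d}.bin").write_bytes(result_bytes)
    # Record a checksum so a re-run of this one job can be compared
    # without re-running the whole campaign.
    manifest = {"job": job_id,
                "sha256": hashlib.sha256(result_bytes).hexdigest()}
    (path / f"job_{job_id:04d}.json").write_text(json.dumps(manifest))
    return manifest
```

A reviewer would then re-run one job on modest hardware and check that its checksum matches the archived manifest.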
Thanks @benureau, that sounds like something worth trying. We'd ask authors submitting HPC-style work to provide a toy version of their computation for reviewing. Ideally, running the toy version would differ only in some run-time parameter from running the real stuff, such that reviewers could more easily judge how similar the two are, by reading the workflow code.
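As a sketch of what "differ only in some run-time parameter" might look like in practice (everything below is an invented illustration, not code from any submission): the full run and the toy run share all simulation code and diverge only in one parameter table, which is the only thing reviewers need to read to judge how similar the two are.

```python
# Hypothetical sketch: one workflow driver, two run-time profiles.
# The simulation code is identical; only these parameters differ.
PROFILES = {
    "full": {"grid_points": 4096, "steps": 1_000_000, "nodes": 256},
    "toy":  {"grid_points": 64,   "steps": 1_000,     "nodes": 1},
}

def run(profile):
    params = PROFILES[profile]
    # ... the same simulation code would run here for either profile ...
    return params

# Authors run run("full") on the HPC system; reviewers run run("toy").
```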
If the computational workload is not too large, it would make sense for the reviewers to request access to https://www.grid5000.fr/w/Grid5000:Home, which is designed to support Open Science and reproducible research. This topic came up this year in a seminar given on the rebuttal process of SC. It appears that accessing resources like XSEDE for such purposes is not only possible but encouraged. I'm not sure how this works in practice for non-US academics, though. In any case, the authors had better have paved the way for the reviewers...
Just in case this is useful to the discussion, see: https://github.com/ReScience/submissions/issues/53#issuecomment-868691108 by @NishantPrabhu. I am also certain JOSS has a way of dealing with this, by the way... if we want to tag people from there?
JOSS experience would definitely be of interest, but I have no idea who might be involved with this. In general, I'd expect JOSS to have less of an issue with long computations, since JOSS reviews software rather than real-life use cases. But non-standard machines are probably a shared issue.
You are totally correct, yup. I've asked people in the JOSS Slack, in case anybody has anything relevant to contribute (I assume they do).
I can provide some perspective from a reproducibility committee I serve on. For large workloads where the run took days or weeks, or used many compute nodes, it is not possible to reproduce these runs. However, one can reproduce some smaller runs and check whether they follow the trend of the larger runs. For specialized hardware, the author is asked to provide the reviewers with access and a limited allocation to compile the code and do some runs. If that is not possible, the authors can do a demonstration on Zoom and show how they produce some scaling results. Since we have an open review process, we do not have any issue with the anonymity of the reviewers.
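The "check a trend with the larger runs" idea can be sketched as follows (a toy illustration under an assumed O(n log n) scaling model; the function name and tolerance are invented): reviewers re-run only small problem sizes and check that measured costs extrapolate consistently.

```python
# Hypothetical sketch: check that timings of small re-runs follow the
# scaling trend reported for the large runs (assumed ~O(n log n) here).
import math

def follows_trend(sizes, times, tol=0.25):
    # Scale the first measured point by n*log(n) and compare each
    # subsequent measurement against that prediction.
    n0, t0 = sizes[0], times[0]
    for n, t in zip(sizes[1:], times[1:]):
        expected = t0 * (n * math.log(n)) / (n0 * math.log(n0))
        if abs(t - expected) / expected > tol:
            return False
    return True
```

This does not reproduce the large runs themselves, but it gives reviewers evidence that the small runs behave as the paper's scaling results predict.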
Thanks @diehlpk! I think it is important to consider why one is verifying reproducibility. If it's a guard against mistakes or fraud by the authors, the Zoom demonstration is helpful; but if it's primarily to ensure that others can build on the work, then reproducibility must be operational. A toy version would still be good enough for that, and pragmatically even more useful.
@khinsen The Zoom demonstration should be the last resort, for cases where they use very specialized hardware, e.g. FPGAs as accelerator cards, the Power9 architecture, or some proprietary compiler. In such cases it can be very difficult to find reviewers with access to these things, so I think a Zoom session is better than nothing. For sure, all other options should be explored first.
A preprint related to this topic: Reproducibility Practice in High Performance Computing: Community Survey Results
A note to @ReScience/editors: our first submission of a paper based on high-performance computations is currently under review, and raises the question of how to verify the reproducibility of work that requires exceptional computing resources.
Feel free to comment (here or there, as you see appropriate) if you have suggestions or comments on how to handle such submissions.