
The ReScience journal. Reproducible Science is Good. Replicated Science is better.
https://rescience.github.io

Reproducibility of HPC replications #104

Open khinsen opened 3 years ago

khinsen commented 3 years ago

A note to @ReScience/editors: our first submission of a paper based on high-performance computations is currently under review, and raises the question of how to verify the reproducibility of work that requires exceptional computing resources.

Feel free to comment (here or there, as you see appropriate) if you have suggestions or comments on how to handle such submissions.

rougier commented 3 years ago

Actually this is the second submission. We also have the same problem with https://github.com/ReScience/submissions/issues/53.

khinsen commented 3 years ago

Thanks @rougier for the pointer! I see I was even mentioned in that discussion, and yet I never got notifications and thus never looked at the thread before.

khinsen commented 3 years ago

I see basically two issues with HPC projects:

  1. access to computational resources
  2. effort required from reviewers to do a reproduction attempt

In the submission I am handling, I have decided to address (2) by doing that part myself. More generally, the approach would be to have a third reviewer specifically working on that part. Someone who would not necessarily have the domain expertise to review the article, but who can deal with the technicalities.

That leaves the resources as an open problem. Ideally, we should try to partner with an institution that has HPC resources and is willing to let us use them. But that would still work only for computations of modest size, within the typical testing allocations at HPC centres.

rougier commented 3 years ago

Another option is to trust the authors that they ran the simulations to get the results.

khinsen commented 3 years ago

I never had a doubt about that so far. The point of reviewing reproducibility is to check if someone else can get the same result, and thus potentially adapt the software for similar tasks.

benureau commented 3 years ago

For HPC computations that require a lot of modest-size jobs, one possibility would be for the submission to:

  1. have the authors run the full set of jobs and deposit the data produced by each job on a scientific archive, and
  2. provide scripts that let reviewers re-run a single job, compare its output to the archived data, and run the analysis on the full archived dataset.
This can work as long as the data is not too large for the scientific archives, and a single job is not too expensive in computing resources and time. It moves most of the technical burden to the author, and removes the need for an expensive effort from the reviewers. It also has the benefit of making the produced job data available for analysis by researchers without access to HPC resources.
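To make this concrete, here is a minimal sketch of what the reviewer-side check could look like. Everything in it is hypothetical (the `run_job.py` entry point, the file layout, the tolerance); it only illustrates the idea of re-running one cheap job and comparing against the archived output:

```python
"""Hypothetical reviewer check for one job out of a large HPC campaign.

Assumes the authors archived the per-job outputs (e.g. on Zenodo) and
ship a `run_job.py` entry point; both are illustrative names, not part
of any actual submission.
"""
import json
import subprocess
from pathlib import Path

ARCHIVED = Path("archive/job_042/result.json")  # downloaded from the data archive
FRESH_DIR = Path("rerun/job_042")

# Re-run one cheap job exactly as described in the authors' workflow.
FRESH_DIR.mkdir(parents=True, exist_ok=True)
subprocess.run(
    ["python", "run_job.py", "--job-id", "42", "--outdir", str(FRESH_DIR)],
    check=True,
)

# Compare the fresh result with the archived one, within a tolerance set
# by the authors (floating-point results rarely match bit for bit).
archived = json.loads(ARCHIVED.read_text())
fresh = json.loads((FRESH_DIR / "result.json").read_text())

for key, ref in archived.items():
    new = fresh[key]
    assert abs(new - ref) <= 1e-6 * max(abs(ref), 1.0), f"{key}: {new} != {ref}"
print("Single-job re-run matches the archived data.")
```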

khinsen commented 3 years ago

Thanks @benureau, that sounds like something worth trying. We'd ask authors submitting HPC-style work to provide a toy version of their computation for reviewing. Ideally, running the toy version would differ from running the real computation only in some run-time parameters, so that reviewers could more easily judge, by reading the workflow code, how similar the two are.
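As a sketch of what "differ only in a run-time parameter" could mean in practice (all parameter names and numbers below are made up), the submission could expose a single workflow entry point with a toy preset and a full preset:

```python
"""Illustrative only: one workflow entry point, two parameter presets.

Reviewers run the 'toy' preset on a laptop; the authors run 'full' on
the HPC machine. Both go through exactly the same code path, so a
reviewer can judge from the code that only the scale differs.
"""
import argparse

PRESETS = {
    # grid size, number of time steps, MPI ranks requested
    "toy":  {"grid": 64,   "steps": 100,     "ranks": 1},
    "full": {"grid": 4096, "steps": 100_000, "ranks": 1024},
}


def run_simulation(grid: int, steps: int, ranks: int) -> None:
    # Placeholder for the submission's actual simulation driver.
    print(f"Running {steps} steps on a {grid}^3 grid with {ranks} ranks")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("preset", choices=list(PRESETS))
    args = parser.parse_args()
    run_simulation(**PRESETS[args.preset])
```

A reviewer would then run the toy preset and only have to read the difference between the two presets, rather than a separate toy codebase.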

alegrand commented 3 years ago

If the computational workload is not too large, it would make sense for the reviewers to request access to https://www.grid5000.fr/w/Grid5000:Home, which is designed to support Open Science and reproducible research. This topic also came up this year in a seminar on the rebuttal process of SC. It appears that accessing resources like XSEDE for such purposes is not only possible but encouraged. I'm not sure how this works in practice for non-US academics, though. In any case, the authors had better have paved the way for the reviewers...

oliviaguest commented 3 years ago

Just in case this is useful to the discussion, see: https://github.com/ReScience/submissions/issues/53#issuecomment-868691108 by @NishantPrabhu. I am also certain JOSS has a way of dealing with this, by the way... if we want to tag people from there?

khinsen commented 3 years ago

JOSS experience would definitely be of interest, but I have no idea who might be involved with this. In general, I'd expect JOSS to have less of an issue with long computations, since JOSS reviews software rather than real-life use cases. But non-standard machines are probably a shared issue.

oliviaguest commented 3 years ago

You are totally correct, yup. I've asked people in the JOSS Slack, in case anybody has anything relevant to contribute (I assume they do).

diehlpk commented 3 years ago

I can provide some perspective from a reproducibility committee I serve on. For large workloads, where a run takes days or weeks or uses many compute nodes, it is not possible to reproduce those runs. However, one can reproduce some smaller runs and check whether they follow the same trend as the larger ones. For specialized hardware, the authors are asked to give the reviewers access, with a limited allocation, to compile the code and do some runs. If that is not possible, the authors can give a demonstration on Zoom and show how they produce some scaling results. Since we have an open review process, reviewer anonymity is not an issue.
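To illustrate what "checking the trend" could look like in practice (every number below is invented), a reviewer might re-run only the small node counts and compare the measured times against the scaling curve reported by the authors:

```python
"""Toy illustration of a trend check; all numbers here are invented.

The reviewer reproduces only the cheap configurations and checks that
the measured times follow the scaling trend the authors report for the
runs that cannot be reproduced.
"""

# Wall-clock times (seconds) per node count, as reported in the paper.
reported = {1: 1000.0, 2: 520.0, 4: 270.0, 64: 25.0, 256: 9.0}

# The reviewer re-runs only the small configurations.
measured = {1: 1030.0, 2: 540.0, 4: 281.0}

for nodes, t_measured in measured.items():
    t_reported = reported[nodes]
    rel_diff = abs(t_measured - t_reported) / t_reported
    # Accept, say, 10% deviation: exact timings depend on the machine.
    status = "OK" if rel_diff < 0.10 else "MISMATCH"
    print(f"{nodes:>3} nodes: measured {t_measured:7.1f} s, "
          f"reported {t_reported:7.1f} s ({rel_diff:.1%}) -> {status}")
```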

khinsen commented 3 years ago

Thanks @diehlpk! I think it is important to consider why one verifies reproducibility. If it is a guard against mistakes or fraud by the authors, the Zoom demonstration is helpful; but if it is primarily to ensure that others can build on the work, then reproducibility must be operational. A toy version would still be good enough for that, and pragmatically even more useful.

diehlpk commented 3 years ago

@khinsen The Zoom demonstration should be the last resort, for cases where very specialized hardware is used, e.g. FPGA accelerator cards, the Power9 architecture, or a proprietary compiler. In such cases it can be very difficult to find reviewers with access to these things, so the Zoom session is better than nothing. For sure, all other options should be explored first.

khinsen commented 2 years ago

A preprint related to this topic: Reproducibility Practice in High Performance Computing: Community Survey Results