ReScience / submissions

ReScience C submissions
28 stars 7 forks source link

10 year reproducibility challenge: reproduction of J. Chem. Phys. 112, 4417 (2000) #11

Closed mcbaneg closed 4 years ago

mcbaneg commented 4 years ago

article.pdf

Original article: Gottfried and McBane, J. Chem. Phys. 112, 4417 (2000)

PDF URL: https://github.com/ReScience/submissions/files/3882888/article.pdf Metadata URL: https://github.com/ReScience/submissions/files/3888185/metadata.zip Code URL: https://github.com/mcbaneg/tenyears-submission (also: http://faculty.gvsu.edu/mcbaneg/virial6-resciencec.zip)

Scientific domain: Chemical Physics Programming language: Fortran Suggested editor:

khinsen commented 4 years ago

@mcbaneg Thanks for your submission! Before we start reviewing, could you please check the Metadata URL?

Note that I have edited your submission slightly. Our "PDF URL" is supposed to point to the reproduction article, not the original article.

Update: I'd better say that I tried to edit your submisson. GitHub lets me do it but what it shows is still the original text. Could you please update the URLs yourself?

mcbaneg commented 4 years ago

Okay, I have modifed the PDF link to point to the .pdf of the draft that I attached to the posting (that is, uploaded to GitHub.) I was unable to do the same with the metadata.yaml file because when I try to attach it I get a message "We don't support that file type, Try again with [list of file extensions it accepts, including .zip]" So, I made a .zip file out of metadata.yaml and uploaded that; I hope that works.

The original metadata.yaml is present at the site I originally listed, but either the web server doesn't automatically start a download when a link to a .yaml file is followed, or the browser doesn't recognize that as what it should do.

If there's a way to upload the metadata.yaml file to the GitHub submission, I'd be happy to do it to save the user the hassle of unzipping.

-G.

khinsen commented 4 years ago

Good question. I have never used the GitHub file upload feature for anything but images. Most ReScience authors create a GitHub repository for their submission, which contains all their code plus the article source code and the metadata. That makes it easy to put links to any file anywhere, and also allows reviewers and editors to propose modifications (which, as an editor, I do for the metadata in the end of the publication process). Another advantage is that GitHub repositories are automatically archived by Software Heritage, so they are safe in case of GitHub shutting down, being hackes, changing their rules, etc.

Anyway, the ZIP you uploaded for the metadata works well for me!

khinsen commented 4 years ago

@pdebuyl Would you willing to review this paper?

Note that it is a special case: the first submission to the Ten Years Challenge, written in the absence of any specific instructions from ReScience. With the consentment of the author, we plan to use this article and the review to prepare more detailed guidelines for other authors.

Also, since we expect to have many submissions to this special issue, we will try a revised reviewing procedure as well, with a single reviewer.

pdebuyl commented 4 years ago

Hi @khinsen I will review. I'll take notes along the way :-)

pdebuyl commented 4 years ago

@mcbaneg Are you familiar with git? At some point, I think that we will upload the old code and likely modifications to the old code that were needed to get it running. I can assist if necessary, by either uploading myself your contributions to github or by providing specific instructions to do so yourself.

mcbaneg commented 4 years ago

Pierre,

I do know git, though I have not used GitHub much. I’ll set up a repository.

Do you have a BLAS library handy to use, or would it help for me to put the BLAS reference sources there for you?

-George.

George C. McBane (mcbaneg@gvsu.edumailto:mcbaneg@gvsu.edu) Professor of Chemistry Assistant Dean for Research, Facilities, and Analytics College of Liberal Arts and Sciences Grand Valley State University

B-4-229 Mackinac Hall, (616) 331-2506 Chemistry site http://www.gvsu.edu/chem/ CLAS site http://www.gvsu.edu/clas/ individual site http://faculty.gvsu.edu/mcbaneg/

From: Pierre de Buyl notifications@github.com Sent: Wednesday, November 27, 2019 4:32 AM To: ReScience/submissions submissions@noreply.github.com Cc: George McBane mcbaneg@gvsu.edu; Mention mention@noreply.github.com Subject: Re: [ReScience/submissions] 10 year reproducibility challenge: reproduction of J. Chem. Phys. 112, 4417 (2000) (#11)

@mcbaneghttps://github.com/mcbaneg Are you familiar with git? At some point, I think that we will upload the old code and likely modifications to the old code that were needed to get it running. I can assist if necessary, by either uploading myself your contributions to github or by providing specific instructions to do so yourself.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/ReScience/submissions/issues/11?email_source=notifications&email_token=ANZTG5WSXZQ6GHSBMKYXLR3QVY477A5CNFSM4JQ32KW2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEFI4ATI#issuecomment-559005773, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ANZTG5Q5QDXYAC3YKTWX4PDQVY477ANCNFSM4JQ32KWQ.

mcbaneg commented 4 years ago

Pierre,

There’s now a GitHub repository, https://github.com/mcbaneg/tenyears-submission, where you are listed as a collaborator.

Only the master branch exists at the moment. The code, sample input and output files, and article source information are all there.

Best regards, George.

George C. McBane (mcbaneg@gvsu.edumailto:mcbaneg@gvsu.edu) Professor of Chemistry Assistant Dean for Research, Facilities, and Analytics College of Liberal Arts and Sciences Grand Valley State University

B-4-229 Mackinac Hall, (616) 331-2506 Chemistry site http://www.gvsu.edu/chem/ CLAS site http://www.gvsu.edu/clas/ individual site http://faculty.gvsu.edu/mcbaneg/

From: Pierre de Buyl notifications@github.com Sent: Wednesday, November 27, 2019 4:32 AM To: ReScience/submissions submissions@noreply.github.com Cc: George McBane mcbaneg@gvsu.edu; Mention mention@noreply.github.com Subject: Re: [ReScience/submissions] 10 year reproducibility challenge: reproduction of J. Chem. Phys. 112, 4417 (2000) (#11)

@mcbaneghttps://github.com/mcbaneg Are you familiar with git? At some point, I think that we will upload the old code and likely modifications to the old code that were needed to get it running. I can assist if necessary, by either uploading myself your contributions to github or by providing specific instructions to do so yourself.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/ReScience/submissions/issues/11?email_source=notifications&email_token=ANZTG5WSXZQ6GHSBMKYXLR3QVY477A5CNFSM4JQ32KW2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEFI4ATI#issuecomment-559005773, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ANZTG5Q5QDXYAC3YKTWX4PDQVY477ANCNFSM4JQ32KWQ.

pdebuyl commented 4 years ago

Thank you @mcbaneg for the update.

I will follow mostly the regular review process that you can read at https://rescience.github.io/write/ and https://rescience.github.io/edit/ , adapting of course to the specifics of the challenge.

I provide a first check list here (the first four items come from ReScience's guidelines).

I'll provide a first report this week so that we can move forward with the review.

Regarding BLAS, I installed it via the Debian package system and added -lblas to the gfortran link line :-)

mcbaneg commented 4 years ago

Pierre,

Sounds good.

One of the points on your list seems inconsistent with the Challenge guidelines:

I hope you find my code comprehensible and clear and well documented. I also hope that in the past 20 years I have learned some things about how to write clear and comprehensible programs, so that if I were to redo the project the result would be better than it was the first time. But for this Challenge I don’t feel like going in and fixing things before submitting would be appropriate. Do you agree?

-G.

pdebuyl commented 4 years ago

@mcbaneg I also believe that the requirements for the ten year challenge should be somehow relaxed.

Please find below my first report. This is a first case as it is a success and my comments revolve mostly around our publishing principles: licensing and environment information.

Replication: The scope of the results is well presented: the second virial coefficient for a mixture of two gases. Even though it might seem redundant, I suggest to add the table with the numerical values. In all ReScience papers, we include the final data in some form, table or figure.

Clarity of the code and instructions for running it: The code is written in a procedural style, which is expected for Fortran 77. The code is commented sufficiently to follow the logic.

I could run the code rather easily. I ran gfortran -c file.f for all Fortran files except for the common block file vecstorh2co.f, then gfortran -o virial6 *.o -lblas and ran the code as indicated in the README file: virial6 < az009f.inp. The instructions should of the README file are sufficient for Fortran programmers.

gfortran reports a warning when passing a character variable of the inappropriate length. This only occurs for the label of the error and does not influence any of the numerical routines.

Past source code: The source code is available. Licensing, as far as I could see from web searches, seems ok for the external code: molscat has been released under the GPL since the 90's and the IQPACK routines from netlib should be, unless otherwise stated, in the public domain.

The license for the main program is not defined, this should be fixed.

I believe that you can also comment on the choice of technology, here Fortran. This is consistent with the aims of the ten year challenge.

An extra request I have concerns the platform definition. Please state, for both the old environment (if possible) and the new: CPU type, operating system, compiler (including the version).

There was an example for Python on the previous website https://github.com/ReScience/rescience.github.io/blob/sources/07-platform-instruction and some papers provide the information https://github.com/ReScience-Archives/Senden-Schuecker-Hahne-Diesmann-Goebel-2018/tree/master/code https://github.com/ReScience-Archives/Detorakis-2017/tree/master/code

khinsen commented 4 years ago

Thanks @pdebuyl! Given your experience with this first review, could you please check if the author guidelines draft covers everything you consider important?

mcbaneg commented 4 years ago

@pdebuyl: thank you for the review. I am working on addressing your comments.

I have one question: you said

I believe that you can also comment on the choice of technology, here Fortran. This is consistent with the aims of the ten year challenge.

I had tried to do that at the end of Section 2, where I wrote

In molecular physics it was (and is) common practice for groups developing new PESs to provide Fortran subroutines implementing their surfaces as functions of the coordinates. Such a routine was available for the Jankowski and Szalewicz surface, deposited as supplementary information with their original publication. A revised version of this routine, more efficient for rapid multiple evaluations, was used for the project. The main Virial6 program was written in Fortran for compatibility with such PES routines. The IQPACK library was then selected as a convenient and well-established Fortran source of accurate numerical quadrature data.

Is that paragraph inadequately clear, or is some other information about the choice of Fortran missing?

Thank you! G.

pdebuyl commented 4 years ago

@mcbaneg the paragraph quoted informs the reader that Fortran is a common choice in your field of research. There are other criteria for picking a language:

It does not have to be a long explanation and you can pick what you wish as a motivation. As we look at "old" software, I believe that it is important to add context about our technological choices.

Regards,

Pierre

mcbaneg commented 4 years ago

@pdebuyl and @khinsen: I have modified the submission in response to the comments:

Even though it might seem redundant, I suggest to add the table with the numerical values. In all ReScience papers, we include the final data in some form, table or figure.

Done, now Table 1 in section 4.

Licensing, as far as I could see from web searches, seems ok for the external code: molscat has been released under the GPL since the 90's and the IQPACK routines from netlib should be, unless otherwise stated, in the public domain. The license for the main program is not defined, this should be fixed.

I have added a LICENSE.txt file. I did not want to assign new licenses to code produced by others so I have linked to the relevant licenses as best I could. The LICENSE.txt also includes a note explaining that the CO_H2 potential surface is the work of Jankowski and Szalewicz, and use of it in published work requires appropriate citation independent of the license that applies to the software used to implement it.

I believe that you can also comment on the choice of technology, here Fortran. This is consistent with the aims of the ten year challenge.

I have added a modest amount to the material at the end of section 2 in response.

An extra request I have concerns the platform definition. Please state, for both the old environment (if possible) and the new: CPU type, operating system, compiler (including the version).

I have added this information to the README.md file.

pdebuyl commented 4 years ago

@mcbaneg Thank you for the update, I am happy with the modifications.

Before finalizing the review, I only have two remarks:

  1. The comment about mandatory citation of the 1998 paper is, as far as I can tell, not compatible with the MIT license.
  2. You can still decide to archive your code either on zenodo or on software heritage.

@khinsen @rougier can you comment on the license issue here? Also, once the article is accepted, do you take the process over or do I need to do something?

mcbaneg commented 4 years ago

@khinsen, @pdebuyl, @rougier ,

The comment about mandatory citation of the 1998 paper is, as far as I can tell, not compatible with the MIT license.

I'd like advice on how to do this correctly. I want to distinguish between the code I wrote and the original H2-CO potential surface of Jankowski and Szalewicz that they constructed and described in their paper. Ordinary scientific ethics requires that anyone who uses the J&S surface describing H2-CO and publishes results built with it must acknowledge J&S with a citation. This is true whether they use the routine that J&S originally supplied, my program derived from it, or any other evaluation mechanism including a new one written for the purpose.

My code provides a particular implementation of their surface. A permissive license applied to my code does not relieve the user of the obligation to acknowledge J&S in publications describing use of their work. I was trying to convey that idea with the statement

The files vectorh2co.f, vecsetuph2co.f, and vecevalh2co.f implement the H2-CO potential energy surface of P. Jankowski and K. Szalewicz, Journal of Chemical Physics 108, 3554 (1998). Any published work that uses their surface must cite that paper.

at the top of LICENSE.txt.

How do I do this right?

Thank you!

khinsen commented 4 years ago

My point of view on the licensing question is that licenses and citation requirement live in different universes. LIcenses are legally binding and apply to software as a tool. Citation requirements are moral (I haven't heard of any case going to court) and apply more to methods than to implementations. The idea of citing software itself is a recent invention and (as I see it!) just a workaround to make a broken system of credit a bit less negative for software developers.

I agree with @pdebuyl that a requirement to cite anything is incompatible with the MIT license, for two reasons: 1) the MIT license has no provision for such restrictions, 2) the MIT license is about using and copying software, whereas a citation request is about publishing results obtained with the software.

So I'd keep citation out of the license but add a separate CITATION file (see https://www.software.ac.uk/blog/2017-12-12-standard-format-citation-files) to establish the moral requirements for academic use.

mcbaneg commented 4 years ago

@khinsen, @pdebuyl, @rougier ,

I have now modified my submission to try to address all the comments. In addition to the changes described in https://github.com/ReScience/submissions/issues/11#issuecomment-567787725,

  1. I have removed comments about citing Jankowski and Szalewicz from the LICENSE file, and added them as a note in the README.md. My study of the CITATION.cff format indicated that it was specifically aimed at software/code citations, not to the ordinary publication/citation mechanism.
  2. I did add a CITATION.cff file applying to Virial6 itself.
  3. I added a PLATFORM information section to README.md.
  4. I have tried to extend the description of the choice of Fortran for the project.

I have not submitted the code explicitly to either Software Heritage or Zenodo. I did not want to duplicate the automatic acquisition of the GitHub project by Software Heritage, or any further processes done by ReScience C. Once again I will welcome recommendations about what action might be best.

Thank you!

khinsen commented 4 years ago

Submitting your GitHub repository to Software Heritage will only make it archived earlier, so it can do no harm. Software Heritage does automatic deduplication in the same way that git does: via content-addressable storage. In fact, we added the recommendation to submit to Software Heritage at the suggestion of the Software Heritage team!

mcbaneg commented 4 years ago

Okay, I've now made a "save code now" request to Software Heritage.

-G.

pdebuyl commented 4 years ago

@mcbaneg @khinsen @rougier

I have no further comments or questions on the manuscript and I recommend publication.

@khinsen @rougier What's next in the process?

rougier commented 4 years ago

@khinsen will need now to actually publish the paper which is not totally straightforward yet. @mcbaneg will need to update the metadata with the information provided by @khinsen and we may need to transform the template a bit to have software heritage identifier on code (instead of DOI).

khinsen commented 4 years ago

I'll try to do all of that tomorrow. Thanks @rougier for pointing out the SWH identifier, that's indeed a new feature but one that should also be useful for future regular articles.

khinsen commented 4 years ago

The article is published: https://zenodo.org/record/3630224

Congratulations to @mcbaneg and thanks to @pdebuyl for the review!

@mcbaneg: Please accept pull request https://github.com/mcbaneg/tenyears-submission/pull/1 to your code/article repository to update the metadata and the final article PDF.