codeisscience / manifesto

Code is Science - a manifesto for anyone who deals with code in a scientific scenario
https://codeisscience.github.io/manifesto/
Creative Commons Zero v1.0 Universal

All code is not equal #17

khinsen opened this issue 6 years ago (status: Open)

khinsen commented 6 years ago

I very much like the spirit of this manifesto! But as so often, the devil is in the details.

My main problem with the current version is that different types of code are involved in doing science, and the principles cited in the manifesto cannot be applied straightforwardly to all of them.

Here is an illustration of how I see the typical scientific software stack: software-stack.pdf

I'd say that the top three layers are in the domain of the manifesto, so let's go through the principles one by one and see how they fit:

  1. Open over closed

This applies to all layers, but "released by the time of publication" makes sense only for the top one, assuming that "publication" refers to a scientific paper. The lower layers evolve independently of any specific paper.

  2. Code for the future

As a principle this is fine for everything, but "testing, writing documentation, instructions on how to run and maintain your code" is not always reasonable or practical for the top layer. Nobody maintains project-specific workflows or notebooks today, and it isn't clear that this is a practice one could reasonably move towards unless scientific papers become less numerous and more substantial. Testing is also of very limited interest for code that computes things that have never been computed before.
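(To make the testing point concrete: what novel computations can still support is invariant checks rather than comparisons to known answers. The following sketch is purely illustrative and not from the thread; the toy simulation is hypothetical.)

```python
# Sketch: testing a "novel" computation via an invariant rather than a
# reference result. The model below (force-free particles) is a stand-in
# for any computation whose output has never been computed before.

def step(positions, velocities, dt):
    """One step of force-free particle motion: positions advance, velocities don't."""
    new_positions = [x + v * dt for x, v in zip(positions, velocities)]
    return new_positions, velocities

# Invariant check: with no forces, total momentum (here, the sum of the
# velocities) must be conserved, even though no reference trajectory exists.
pos, vel = [0.0, 1.0, 2.0], [0.5, -0.25, 0.75]
p0 = sum(vel)
for _ in range(1000):
    pos, vel = step(pos, vel, dt=0.01)
assert abs(sum(vel) - p0) < 1e-12
```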

  3. Incorrect code results in incorrect science

"Code published in journals should be peer reviewed." I'd say that all scientific code should be peer reviewed. For the top layer, this should be part of reviewing the scientific paper, because the review must also check if the code actually does what the paper says. But this requires changes in the review process that are not obvious to implement. For example, the experience of ReScience suggests that effective code review requires rapid interaction between authors and reviewers referring to a common codebase.

For the bottom two layers, code review needs to be continuous as the code evolves, meaning that it must be separate from any journal publication. No infrastructure for this exists today. Is it reasonable for a manifesto to call for something that is impossible in the immediate future? Honest question, I don't know how pragmatic manifestos should be.

  4. Availability over perfection

This mostly applies to the top layer. The further down the stack you move, the more professionalism can and should be expected.

  5. Code deserves credit

Certainly, but how far down the stack should one cite? For the top two layers, the obligation seems obvious. But should you cite NumPy? BLAS? Python? zlib? gcc? Linux?

npscience commented 6 years ago

I found this review really useful, in particular:

> the experience of ReScience suggests that effective code review requires rapid interaction between authors and reviewers referring to a common codebase.

This is valuable insight, thank you @khinsen.