Ocean-Dynamics-Group / GroupExpectations

A group expectations document.

Reproducible research section is out-of-date #9

Open wenegrat opened 2 years ago

wenegrat commented 2 years ago

At the very least we should reference this group, but more broadly this is perhaps an opportunity to think about what our group expectations ought to be around reproducible research.

In some sense a minimum set of standards is set by the journals we publish in, which for the most part just require that code be publicly available upon publication. Some journals still accept GitHub repos, but others are requiring a DOI for code.

We have talked in the past about how prescriptive we ought to be here (probably not terribly), but at the same time I think it is reasonable to expect we will all follow some (TBD) minimum set of best practices (in the interest of both reproducibility and good science).

Interested in thoughts from @tomchor @rwegener2 @reint-fischer

tomchor commented 2 years ago

That's definitely a hard question to answer. I feel we'd be better positioned to answer it after we try out this GitHub approach for a little while, since the answer will most likely use some of its tools in one way or another.

That said, I agree that we should mention this github organization for now at least. But other than that I don't have any insights at the moment.

reint-fischer commented 2 years ago

I think @rwegener2 had some good categories to think about in terms of reproducibility, maybe we could use some of those as subheadings? There are three of those that directly come to mind:

  1. Code availability. At the moment the document already says a GitHub repository is a minimum, which I think is reasonable. Best practices which I think would be helpful are how to structure such a repository, and what journals typically require. For my first paper, a statement indicating the GitHub repo was enough, but if there are tips on how to make a DOI for your code that would be helpful.
  2. Data availability. I just googled the UMD data policies and services and found this library service DRUM. I think they should be able to store and share the data you need to run the code you use for a paper. I see they also offer to store the code. Has anyone used this before? It would be good to discuss whether/when this is expected. Are there projects, for example, where the funding agency expects you to store the data somewhere else? Maybe we could mention having data on HALO in this section, or even encourage cloud storage?
  3. Code readability. I think this is the most challenging and as Tomas said, maybe we need some time to see how the code reviews go.
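On making a DOI for code (item 1): one common route is archiving a GitHub release through Zenodo's GitHub integration, which mints a DOI for each release. Separately, adding a `CITATION.cff` file to the repo lets GitHub show a "Cite this repository" button. A minimal sketch of such a file, with every field value a placeholder to be replaced for a real paper:

```yaml
# CITATION.cff -- minimal citation metadata for a code repository
# (all values below are placeholders, not from an actual project)
cff-version: 1.2.0
message: "If you use this code, please cite it as below."
title: "Analysis code for <paper title>"
authors:
  - family-names: "Doe"        # placeholder author
    given-names: "Jane"
version: "1.0.0"
doi: "10.5281/zenodo.0000000"  # placeholder; filled in after Zenodo archives a release
date-released: "2024-01-01"
```

The usual workflow is: enable the repository on Zenodo, tag and publish a GitHub release, then copy the minted DOI back into `CITATION.cff` and the paper's availability statement.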
rwegener2 commented 2 years ago

Sorry for the late response here. I agree with the previous comments.

Looking at the current text, if the core focus is on reproducibility then I think the text seems alright the way it is. It is a bit general and not highly prescriptive, but as @tomchor pointed out we don't have strong specifics there yet. It could perhaps be made stronger by adding some of the categories @reint-fischer mentioned, or offering up some resources (example) for reproducibility. I'm happy to type that up into a quick PR if we want.

One of my takeaways from reading it is that it is framed around reproducibility rather than openness. In my own work, my motivation for reproducibility is driven by a belief in open science. Reproducibility is a consequence of that belief, so I would frame this section around open science and then list subheadings under that. That is a group discussion, though.

I'd propose revisiting this at the end of the semester and, while I'm sure we'll still be developing our coding practices, making it a goal to put in whatever we have at the end of May as a more concrete starting point.