UBC-MDS / opinionated-practices-for-teaching-reproducibility

https://arxiv.org/abs/2109.13656
Consider adding personal stories from the trenches #12

Closed joelostblom closed 2 years ago

joelostblom commented 3 years ago

Recording this comment so we can potentially address it during review or in a second version of the MS.

I think we should add a couple of our own stories from the trenches. Some examples I can think of:

  • Changing the code without having it under version control, and then no longer being able to regenerate a figure
  • Figures changing when the code is run on different versions of R, because the behavior of the random number generator changed between versions (this was with version 3.6)

We could use both of these or just one of them, and then you could add one of your own?

joelostblom commented 2 years ago

@ttimbers Do you already have examples of this written down from the recent talk/lecture you gave, when you asked us on Slack for our stories?

ttimbers commented 2 years ago

For sure, I've pasted them below:

As a Master's student, I started using R for my statistical analysis. I obtained the results I needed by running my code in the R console and copying the output into the Word document that was my manuscript. Six months later, while we were working on revisions requested by the reviewers, I could not remember which version of the code I had run to get my results. I eventually figured it out through much trial and error, but the process was inefficient and very stressful.

-- Tiffany Timbers

As a Master's student, I spent many hours in Adobe Illustrator customizing the colors of a figure that had many points and lines on it. The next week, in a meeting with my supervisor, she very correctly pointed out that the color scheme I had chosen would be problematic for color-blind people. I then had to spend many more hours repeating the same work to change the colors of all the lines and points. It was very frustrating to repeat such tedious and time-consuming work, and I missed other deadlines because I had to redo it.

-- Tiffany Timbers

I was involved in a project that I needed to return to after a long absence, but I dreaded doing so because I had not set up a way to track changes in the analyzed data alongside changes in the code. This happened because the data was too big for a simple solution like GitHub, and because I was under time pressure to produce results and didn't prioritize looking into an appropriate solution. So although we were using version control for the code in this project (and this was still helpful), we didn't keep track of which inputs were analyzed and which outputs were produced with which version of the code. When I returned to the project, I could not build on what I had done previously, and instead had to re-analyze all the data with the latest version of the code to reduce the chance of issues caused by conflicting code bases. As is often the case in projects where code is seen only as a means to an end and not as part of the final product, no time was dedicated to writing tests for the code either, so there was no guarantee that new changes had not introduced unintended side effects.

-- Joel Ostblom