UBC-MDS / opinionated-practices-for-teaching-reproducibility

https://arxiv.org/abs/2109.13656

Improve flow, update intro, add habits sections, fix typos #7

Closed joelostblom closed 2 years ago

joelostblom commented 2 years ago

Close #3, close #5, close #6, close #8

joelostblom commented 2 years ago

@ttimbers Here are some suggestions, including fixes for three of the issues you opened, some high-level tweaks to the flow, and a new section on habits (I have refs but did not add all of them yet). Let me know if you think the habits section fits or if it is too speculative/unusual in this context.

ttimbers commented 2 years ago

I like the introduction to the habit section! Great job!

I still think the detailed examples should go at the end of each section (for example, you moved "Detailed example lesson of letting them fail (in a controlled manner)", I think).

Happy for you to do just a little bit more work on this pass, and we can merge this PR, and then I'll take another stab at editing the writing.

ttimbers commented 2 years ago

Or if you think it makes more sense for you to address the other issues first, before I dig into the writing a bit, please let me know!

joelostblom commented 2 years ago

@ttimbers I finally added the section on case studies, sorry for the delay, it is all yours to review now. These are the papers I used and I will add proper references after you have reviewed in more detail.

ttimbers commented 2 years ago

There's a typo in this figure, we need to fix it:

[Screenshot: Screen Shot 2021-09-10 at 10 22 14 AM]

We should also keep the source for this in this repo in case we need to tweak it further...

ttimbers commented 2 years ago

I think we should add a couple stories of ours from the trenches. Some examples I can think of are:

We could use these or one of them, and then you could add one of your own?

ttimbers commented 2 years ago

I removed this paragraph because I think it doesn't quite fit where it currently was:

While few researchers and analysts willingly manipulate their data, there are currently few direct incentives connected to the additional work of making a workflow reproducible after the fact, and it is easier to keep the analysis in an inaccessible format so that others can't easily find errors in it. This inclination is dangerous, and we should in fact try to make our work as accessible as possible so that any potential errors can be found early on by our colleagues, when they have the lowest possible downstream consequences. We believe the most effective way of making reproducible practices the norm is to teach them early so that students do the right things by default rather than as an afterthought before archiving important work. Practicing these skills will also normalize the discovery and discussion of mistakes in our own and others' work and increase the empathy we show ourselves and others when such errors are revealed, so that we can rejoice that their early discovery prevented large-scale downstream consequences.

Maybe we can find a place for it somewhere else? I like the case studies that you added, but I think the first draft of the section was a bit too detailed. I have tried to make it more concise and to focus it on how case studies are good for highlighting significant real-world consequences (and how stories from the trenches make reproducibility errors more relatable).

ttimbers commented 2 years ago

I find this sentence hard to understand. Can it be made more concrete, or followed by something more concrete?

"We also intentionally choose to use authentic data science reproducibility tools to provide opportunities for practicing and gaining confidence with these tools in "sharp" authentic scenarios rather than just in constructed exercises."

(and sorry if I wrote that - if so, let's just delete it as I no longer can unpack it in my own brain)

ttimbers commented 2 years ago

OK @joelostblom - I have read through the manuscript and provided all the edits I have at this point. Happy for you to finish off your bits and we can merge this.

joelostblom commented 2 years ago

@ttimbers Done reviewing your changes (looks great!), fixing some typos, and adding refs. There was only one minor rewrite, mentioning active learning at the beginning of guided instruction, so maybe review the latest two commits.

"We also intentionally choose to use authentic data science reproducibility tools to provide opportunities for practicing and gaining confidence with these tools in "sharp" authentic scenarios rather than just in constructed exercises."

I don't remember who wrote that paragraph originally either, haha! But I tried rewriting that sentence and the next one as follows, to indicate that we use "real-world" tools. Feel free to keep or delete as you see fit:

The tools we chose to teach are data science reproducibility software that students are likely to employ in workflows in their future work. This sustained practice not only reinforces students' habits, but also increases their proficiency with "real-world" reproducibility software, and it lets them run into problems in an environment where they can easily reach out for help without feeling intimidated to ask.

I will go through your other PR comments later tonight or tomorrow morning. Off the top of my head I don't have a story from the trenches (but I should be able to think of one; the figure creation you mentioned is similar to the most common scenarios for me, I think). I'm also not sure where to put that paragraph, but I will look more closely.

joelostblom commented 2 years ago

I am going to merge this and do the formatting PR separately