Reproducible-Science-Curriculum / RR-Jupyter-Hackathon-Jan-2017

Curriculum Development Hackathon on Reproducible Research using Jupyter Notebooks, to be held Jan 9-11 at BIDS in Berkeley, CA
Creative Commons Zero v1.0 Universal

Topics to teach #15

Open (opened by tracykteal 7 years ago)

tracykteal commented 7 years ago

We'll discuss what topics to teach on the first day of the hackathon, but this is to give some more context for that discussion.

Data Carpentry workshops follow a narrative approach of how someone would go from start (getting their data back and setting up their project) through to the final output. In a regular Data Carpentry workshop, that would be a plot or figure, but here we're looking all the way through to publication of the code/notebook and data.

For instance, this is the overview of the R Reproducible Research curriculum https://github.com/datacarpentry/rr-workshop/blob/gh-pages/workshopOverview.md

Therefore, the narrative components of working reproducibly with data in the Jupyter notebook could be:

The last few include things like:

What core topics are missing from this? What do you do in your workflow that's not included here? Are there any that shouldn't be included as core topics?

Since we only have two days in a workshop, we have to identify the core concepts and skills to teach. We can also identify good references or other lessons to link to for things we don't have time to discuss, though.

bridgethass commented 7 years ago

It might be useful to include a component on:

This could potentially be nested under organization and/or version control. I have sometimes found it challenging to stay organized while developing code, and I tend to make separate "scratch" scripts while writing functions, which I like to save so I remember everything I tried.

kellieotto commented 7 years ago

+1 for debugging and troubleshooting.

Unit testing might also be nested under version control or automation.

choldgraf commented 7 years ago

I think it'd also be useful to include some component on pushing your work out into the world. There are a lot of repositories with neat analyses, data, etc., but they aren't that useful unless other people find and use them. Maybe talking about the different avenues we have for sharing work would be useful, though it might be something without a clear answer that works better as a general discussion.

choldgraf commented 7 years ago

Also, I'm +1 for data cleaning because I think it interacts nicely with data organization, etc. Maybe a quick intro to the concepts behind "tidy" data or something like that.

dsoto commented 7 years ago

I like this list and the narrative format. As I look at it, I see some themes emerging that we can return to for the participants.

I'm sure there are others. I think these meta-topics could help provide coherence to a list of topics that could seem disconnected to some novices.

mpacer commented 7 years ago

I said a lot of things in my comment in #8 that seem like they would be sub-points to these topics. I like this structure a lot so that makes me think that I'm thinking along similar lines, which is reassuring.

One thing that seems to be missing from here is methods for collaborating with others who don't want to or won't use notebooks (e.g., advisors). An example would be something like mybinder, where there is little to no setup cost for the other person to at least see what the code and results look like directly.

Related to collaboration is integrating with extant code specific to your lab's prior work and with software systems that don't easily integrate with notebooks. I'm not brimming with solutions, but these are definitely problems that arise, especially in interdisciplinary work.

butterflyology commented 7 years ago

Backing up a bit, should there be an Introduction to Reproducible Research lesson?

Here is the Intro lesson for 'R' and the formatted 'gh-pages'

For reference, here are the Reproducible Research with R lessons:

ErinBecker commented 7 years ago

To follow up on my verbal comment, I'd like to think about integrating discussion of learner mindset, particularly the way learners may feel threatened or judged when making their code or analyses available to the public at large. This could include normalizing error, building a computational identity, and imposter syndrome. This wouldn't be its own half-day module, but I'd like to see how we can integrate these topics into each of the lessons and into how we interact with the learners throughout this curriculum.