tracykteal opened 7 years ago
It might be useful to include a component on:
This could potentially be nested under organization and/or version control. I have sometimes found it challenging to stay organized while developing code, and I tend to keep separate "scratch" scripts as I write functions so that I can remember everything I tried.
+1 for debugging and troubleshooting.
Unit testing might also be nested under version control or automation.
I think it'd also be useful to include some component on pushing your work out into the world. There are a lot of repositories with neat analyses, data, etc., but they're still not that useful unless other people find and use them. Talking about the different avenues we have for sharing work would help, though it might not have a clear answer and may work better as a general discussion.
I'm also +1 for data cleaning because I think it interacts nicely with data organization. Maybe a quick intro to the concepts behind "tidy" data or something like this.
I like this list and the narrative format. As I look at it, I see some themes emerging that we can return to for the participants.
I'm sure there are others. I think these meta-topics could help provide coherence to a list of topics that could seem disconnected to some novices.
I said a lot of things in my comment in #8 that seem like they would be sub-points to these topics. I like this structure a lot so that makes me think that I'm thinking along similar lines, which is reassuring.
One thing that seems to be missing here is methods for collaborating with others who don't want to or won't use notebooks (e.g., advisors). An example would be something like mybinder, where there is little to no setup cost for the other person to at least see what the code and results look like directly.
Related to collaboration is integrating with extant code specific to your lab's prior work & software systems that don't easily integrate with notebooks. I'm not brimming with solutions but these are definitely problems that arise, especially in interdisciplinary work.
Backing up a bit, should there be an Introduction to Reproducible Research lesson?
Here is the Intro lesson for 'R' and the formatted 'gh-pages'
For reference, here are the Reproducible Research with R lessons:
To follow up on my verbal comment: I'd like to think about integrating discussion of learner mindset, particularly learners potentially feeling threatened or judged when making their code or analyses available to the public at large. This could include normalizing error, building a computational identity, and addressing imposter syndrome. This wouldn't be its own half-day module, but I'd like to see how we can integrate these topics into each of the lessons and into how we interact with the learners throughout this curriculum.
We'll discuss what topics to teach on the first day of the hackathon, but this is to give some more context for that discussion.
Data Carpentry workshops follow a narrative approach of how someone would go from start (getting their data back and setting up their project) through to the final output. In a regular Data Carpentry workshop, that would be a plot or figure, but here we're looking all the way through to publication of the code/notebook and data.
For instance, this is the overview of the R Reproducible Research curriculum: https://github.com/datacarpentry/rr-workshop/blob/gh-pages/workshopOverview.md
Therefore the narrative components to working reproducibly with data in the Jupyter notebook could be
The last few including things like
What core topics are missing from this? What do you do in your workflow that's not included here? Any that shouldn't be included as core topics?
Since we only have two days in a workshop, we have to identify the core concepts and skills to teach. For things we don't have time to discuss, we can also identify good references or other lessons to link to.