datacarpentry / wrangling-genomics

Data Wrangling and Processing for Genomics
https://datacarpentry.org/wrangling-genomics/

Shell scripting not effectively handled #174

Open sstevens2 opened 5 years ago

sstevens2 commented 5 years ago

AZ bug BBQ: We think the shell scripting in this lesson should be reworked.

Recommendation: Cover one short shell script thoroughly, combining fastqc and trimmomatic, and then use a faded example for the variant calling script instead of having learners build it up themselves.

NadineBestard commented 2 years ago

I was going to open an issue about a similar problem: very often we do not have time to finish the lesson, so we never reach the shell scripting part, which I consider very important. What I ended up doing in my workshops is writing the FastQC script right after running fastqc. This ensures we properly cover the scripting part (and how important it is to keep track of everything we do). The other script is only discussed if we have enough time left, and often I simply show it directly from the lesson because we are running out of time.

In one of the Carpentries discussion meetings, we brought up that lessons should be designed so that you can cut from the end without worrying about it if you run out of time. Therefore, we should bring the scripting forward in the lesson (not all of it, but at least part of it, to introduce the concept).

crazyhottommy commented 2 years ago

Hi, this is Tommy, and I am serving as chair of the genomics curriculum advisory committee. The committee met, and our suggestion is to write a full script, or simply hand students a written script to run, because instructors run out of time at the end. This also gets back to defining the minimal path and seeing what can be cut. "Black box" scripts would be provided; if there is more time, instructors can get into them with faded examples and build as they go.

Thanks to all the maintainers for your feedback and suggestions!

sstevens2 commented 2 years ago

The suggestion I listed above was put together by a group of instructors at the AZ bug BBQ several years ago and I've been thinking about it a bit over the past 3 years of teaching this workshop.

I taught this workshop this week and want to propose an alternate solution. I think we should consider moving scripting earlier in the data wrangling lesson and integrating it into the early episodes instead of having it as an episode at the end. Learners have been introduced to scripts in the Unix shell lesson, so they are already a little familiar with them. I think that as they build up the variant calling pipeline, they should build the script as they go. It might take some thoughtful commenting out of sections so you don't have to rerun previous ones, but it would help emphasize reproducibility through scripts earlier on.
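A minimal sketch of what commenting out finished sections could look like while the script grows. The `run_step` helper and the step names are hypothetical placeholders, not commands from the lesson; the real pipeline commands would take their place.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for a real pipeline command:
# it only prints the step name and records that the step ran.
completed=""
run_step() {
    echo "running: $1"
    completed="$completed $1"
}

# Steps finished in earlier episodes are commented out so they are not rerun.
# run_step "quality-control"
# run_step "trimming"

# The step currently being taught stays active.
run_step "alignment"
```

Run as written, only the active step executes; uncommenting the earlier lines reruns the whole sequence from the top.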

sstevens2 commented 1 year ago

It has been a year, and I taught this again and had the same issue. I think the suggestion @NadineBestard made is more or less what I was suggesting in my last comment, and it is what I did this time. I outlined the read_qc.sh script before we started running fastqc and trimming, and once we had run them, I copied the commands back into the script. I did the same for run_variant_calling.sh. I didn't get all my comments perfect, but that was a chance to talk about it, so it seemed to work well. I even got some specific feedback in my minute cards that learners liked seeing the script outlined with comments beforehand. This also emphasizes the best practice of documenting as you go.
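The outline-first approach could look roughly like this. The step wording is an illustrative assumption rather than the lesson's actual script, and the outline is written to `read_qc_outline.sh` (a hypothetical file name) so it cannot overwrite a real `read_qc.sh`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write the outline as comments only, before any command has been run.
# The three steps below are assumed for illustration.
cat > read_qc_outline.sh <<'EOF'
#!/usr/bin/env bash
# Quality control for the workshop reads (outline only)

# 1. Run FastQC on the raw reads

# 2. Save the FastQC reports with the other results

# 3. Trim the reads
EOF

# Show the outline; each working command is pasted under its comment
# once it has been run interactively.
cat read_qc_outline.sh
```

Each numbered comment then serves as both a placeholder and the documentation for the command that ends up beneath it.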

Note: one related issue I had with the variant calling lesson is that all of the commands are written to run from the dc_workshop folder, but the final script has learners cd into the results folder first. I'll search for or file a separate issue about this, because I think it should be updated regardless of whether we move scripting up as this issue suggests.
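A small sketch of why the starting directory matters. The directory tree and the file name `sample.fastq` are made up for illustration and built in a temporary location; only the folder names `dc_workshop` and `results` come from the discussion above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a throwaway tree; the data/ folder and sample.fastq are assumptions.
workdir=$(mktemp -d)
mkdir -p "$workdir/dc_workshop/data" "$workdir/dc_workshop/results"
touch "$workdir/dc_workshop/data/sample.fastq"

# Report whether the relative path data/sample.fastq resolves
# when the working directory is the one given as $1.
check_from() {
    (cd "$1" && if [ -e data/sample.fastq ]; then echo found; else echo missing; fi)
}

from_top=$(check_from "$workdir/dc_workshop")
from_results=$(check_from "$workdir/dc_workshop/results")

echo "from dc_workshop: $from_top"
echo "from dc_workshop/results: $from_results"

rm -rf "$workdir"
```

The same relative path resolves from `dc_workshop` but not after changing into `results`, which is why commands tested interactively in one folder can fail when a script changes directory first.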