carpentries-incubator / snakemake-novice-bioinformatics

Introduction to Snakemake for Bioinformatics
https://carpentries-incubator.github.io/snakemake-novice-bioinformatics

Several concerns on Ep 11 (new assembly workflow) #71

Open tbooth opened 2 months ago

tbooth commented 2 months ago

From review by @cmeesters:

Chapter 11 - Designing a new workflow. This chapter needs a major revision:

tbooth commented 2 months ago

Regarding the last three points, I completely agree that workflow re-use and maintainability are vital topics, and I'm currently preparing (and being funded to prepare!) additional material on them. The whole idea of this episode is to test the skills that the learners have acquired in the previous chapters by presenting a new challenge, so I don't want to introduce any new technical concepts here.

So I'm in agreement with the reviewer that all these things are important, but I don't think they can be added to this chapter.

tbooth commented 2 months ago

Genome assembly is an intricate challenge; recommending a relatively outdated tool like Velvet is dangerous, as there are numerous follow-up implementations tailored for various genome types.

I totally disagree with the comment that my course is "dangerous". There is no recommendation here to use Velvet in real research. Rather, Velvet is chosen as a simple tool to illustrate an assembly-centric workflow. Likewise, "finding the longest contig" is no way to judge the quality of your assembly, but serves as a useful and easy-to-understand proxy for this exercise.

I will modify the text of the episode to make this extra clear.

tbooth commented 2 months ago

I have added clarification to the text regarding what the workflow is doing and the choice of tools. I have also added a section "Biology and bioinformatics" to instructor/prereqs.html. The main intended audience for this course will be familiar with the terms "de-novo assembly", "contigs" and "adapters", but for those not coming from a biology background this new section provides some pointers to background reading.

the assembly part comes out of the blue and is unrelated to everything before. If you want it, you need additional material, describing the background. Best put it into a separate chapter (or several), then.

This is the entire point: to present a "whole new workflow", so that the learners can practise applying what they now know about Snakemake to a fresh challenge. Secondarily, but no less importantly, we want learners to practise debugging (TTT notes how neglected this is in general). It's hard to teach debugging by presenting learners with pre-written "deliberate mistakes", because they can't grasp what the intention of the broken code was in the first place, or why they would make that mistake themselves. Having this chapter allows the learners to make their own mistakes and debug them.

In my practical experience of teaching this, the time taken from presenting the original script to learners having their working Snakemake code is much longer than a Carpentries episode should be. Two hours is a reasonable time estimate. But I don't see how to break it down, as most of that time is spent on the extended exercise. In practice, the tutor can break up the session by sharing debugging sessions with the class as and when learners ask for help. Not only does this help people who are stuck on similar problems, but the class will also engage with the process of trying to spot and rectify the problems when they see their fellow learners are stuck.

One could argue that presenting an extended exercise is "not the Carpentries way". Everything should be bite-sized. Everything should be done in lockstep. But I just don't think the process of conceptualising, writing out, and debugging a Snakemake workflow (or learning programming in general) can be reduced like this. Learners need to gain the confidence that all the tools they have been given up to this point really can be applied to a fresh problem.