carpentries-incubator / snakemake-novice-bioinformatics

Introduction to Snakemake for Bioinformatics
https://carpentries-incubator.github.io/snakemake-novice-bioinformatics

Bioinformatics concepts and best practice should be more thoroughly addressed #55

Open tbooth opened 4 months ago

tbooth commented 4 months ago

Comments from @cmeesters on this topic:

tbooth commented 4 months ago

The choice of fastx and velvet is deliberate. These tools are simple and stable, and they serve the purpose of the tutorial, which is to show how to orchestrate commands with Snakemake.

I will, as suggested, add notes that these are not the recommended tools for real analysis work. I don't propose to comment on what is the state of the art, as this is beyond the scope of the lesson and introduces a further maintenance burden on the lesson maintainer (i.e., me).

For the genome assembly, the only thing we need to know is that Velvet is a program that will take a bunch of short reads (in paired FASTQ files) and try to build them into long contigs (output as a FASTA file), and we are aiming to make the longest possible contig by tuning a parameter called "K". Everything else is a distraction from this defined task!
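
To make that defined task concrete, here is a minimal sketch in Python of the "pick the K that gives the longest contig" step. It is not taken from the lesson: the function names `longest_contig_length` and `best_k` are hypothetical, and the FASTA strings are toy stand-ins for the contigs file an assembler would write at each K, not real Velvet output.

```python
def longest_contig_length(fasta_text: str) -> int:
    """Return the length of the longest sequence in FASTA-formatted text."""
    longest = 0
    current = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A header line starts a new contig, so close off the previous one.
            longest = max(longest, current)
            current = 0
        else:
            # Sequence lines may be wrapped, so lengths accumulate per contig.
            current += len(line)
    return max(longest, current)


def best_k(assemblies: dict[int, str]) -> int:
    """Pick the K whose assembly contains the longest contig."""
    return max(assemblies, key=lambda k: longest_contig_length(assemblies[k]))


# Toy data (an assumption for illustration): K value -> contigs in FASTA format.
example = {
    19: ">c1\nACGT\n>c2\nACGTAC\n",
    21: ">c1\nACGTACGTAC\nGG\n",
    25: ">c1\nACG\n",
}
```

In a workflow, a summary step along these lines would run once all the per-K assemblies exist, which is the kind of orchestration the lesson is about.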

When actually teaching the course, several learners have made the same points given above and asked to go into more detail on the assembly process, or on other bioinformatics topics. But this is not the place to learn the "intricate challenge" of actual genome assembly, and if any learner starts thinking it is, then we are in trouble. I will add instructor notes saying that this should be clearly emphasised.

tbooth commented 4 months ago

Also:

Kallisto performs (wording according to docs) a "pseudoalignment" - it is not a classical aligner and should not be mentioned as such.

Indeed - I'll correct this.