Wording of aligner choice in lesson 04-variant_calling

Hi,

thanks for putting together a great resource for novices to genomics data analysis!

I would suggest changing the wording in the subsection "Align reads to reference genome". Currently it reads: "We will use the BWA-MEM algorithm, which is the latest and is generally recommended for high-quality queries as it is faster and more accurate."

I would shift the focus to helping learners make an informed choice based on their sequencing read type and use case, rather than stating that BWA-MEM is the latest, faster and more accurate (than what?). For example, minimap2 is considerably faster than BWA-MEM, and its authors report comparable or better accuracy. However, minimap2's spliced alignment mode is aimed at long reads, so it is not the usual choice for short-read RNA-seq data. BWA-MEM is not splice-aware either.

I would say something like this instead: "We will use the BWA-MEM algorithm, which is well suited to aligning accurate short-read genomic Illumina data to a reference genome. Alternatively, aligners such as minimap2 are well suited to aligning noisy long-read data and can also align short-read genomic Illumina data. Choosing an aligner that matches the sequencing read type is crucial for high-quality downstream analysis, so it is worth spending some time choosing the best tool for the job."
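
If it helps, the lesson could also make the choice concrete by showing how the command line changes with the read type. Below is a rough sketch only, not something taken from the lesson: the helper `aligner_command`, the read-type labels and the file names are all made up for illustration. The `bwa mem` subcommand and the minimap2 presets `sr`, `map-ont` and `map-pb` do exist in those tools, but which one is appropriate should be checked against each tool's documentation for the data at hand.

```python
# Hypothetical helper for illustration; not part of the lesson or of any tool.
# It only builds the command line, it does not run the aligner.

# Read-type labels are invented for this sketch.
PRESETS = {
    "illumina-short": ["bwa", "mem"],                   # reference must be indexed with `bwa index` first
    "illumina-short-minimap2": ["minimap2", "-ax", "sr"],
    "nanopore": ["minimap2", "-ax", "map-ont"],
    "pacbio-clr": ["minimap2", "-ax", "map-pb"],
}


def aligner_command(read_type, reference, reads):
    """Return the aligner command (as a list of arguments) for a genomic read type."""
    if read_type not in PRESETS:
        raise ValueError(f"no aligner preset for read type: {read_type!r}")
    return PRESETS[read_type] + [reference] + list(reads)


if __name__ == "__main__":
    # Placeholder file names; SAM output goes to stdout for both tools.
    cmd = aligner_command("illumina-short", "ref.fasta", ["reads_1.fastq", "reads_2.fastq"])
    print(" ".join(cmd))
```

The point of the table is just that the aligner and preset follow from the read type, which is the decision I think learners should be prompted to make.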

Thanks, Jana

datacarpentry / wrangling-genomics, issue #189