datacarpentry / wrangling-genomics

Data Wrangling and Processing for Genomics
https://datacarpentry.org/wrangling-genomics/

Wording of aligner choice in lesson 04-variant_calling #189

Open JanaSperschneider opened 4 years ago

JanaSperschneider commented 4 years ago

Hi,

thanks for putting together a great resource for novices to genomics data analysis!

I would suggest changing the wording in the subsection "Align reads to reference genome". Currently it reads: "We will use the BWA-MEM algorithm, which is the latest and is generally recommended for high-quality queries as it is faster and more accurate."

I would change the focus to enabling learners to make an informed choice based on their sequencing read types and use case, instead of saying BWA-MEM is the latest, faster and more accurate (than what?). For example, minimap2 is much faster than BWA-MEM and claims to be more accurate. However, minimap2 is not suited to spliced alignment.

I would say something like this instead: "We will use the BWA-MEM algorithm, which is well suited to aligning accurate short-read transcriptomic Illumina data to genomic sequences. Alternatively, aligners such as minimap2 are well suited to aligning noisy long-read data or short-read genomic Illumina data. Choosing an aligner appropriate to the sequencing read type is crucial for high-quality downstream genomic data analysis, and some time should be spent choosing the best tool for the job."

Thanks, Jana

fpsom commented 4 years ago

Hi @JanaSperschneider! Thank you very much for this comment!

I quite like the wording you propose; would you be willing to put in a PR with these changes?