ErnakovichLab / dada2_ernakovichlab

Filter out PolyG tails that come from NovaSeq #6

Closed by hhollandmoritz 1 year ago

hhollandmoritz commented 2 years ago

Example of how: https://cutadapt.readthedocs.io/en/stable/recipes.html#trim-poly-a-tails

Explanation of the problem (applies to NextSeq and NovaSeq): https://www.dna-ghost.com/single-post/2018/01/23/be-careful-the-poly-g-sequence-from-nextseq-run

hhollandmoritz commented 2 years ago

Here's an example of what these look like in quality plots.

[Image: fwd_qual_plotsfeb03_annot]

hhollandmoritz commented 2 years ago

Solved and merged into development branch.

hhollandmoritz commented 2 years ago

@science-chump I've made updates to filter out polyGs, but haven't yet merged them into the main branch. See this code for the change.

hhollandmoritz commented 1 year ago

The default NextSeq-trim quality cutoff should not be 50; no base can reach that quality, so the value makes no sense as a threshold.

From the documentation (https://cutadapt.readthedocs.io/en/stable/guide.html?highlight=nextseq#nextseq-trim): "This works like regular quality trimming (where one would use -q 20 instead), except that the qualities of G bases are ignored."

Since quality scores can't be above 37, the cutoff needs to come down to something more reasonable, like 25.
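
To illustrate why the cutoff matters, here is a minimal Python sketch of 3' quality trimming in which G qualities are ignored, as the quoted documentation describes. It is not cutadapt's actual code: the function name `nextseq_trim_index`, the running-sum approach, and the choice to score every G as just below the cutoff are assumptions made for illustration. The toy read and its flat quality of 37 are made up.

```python
def nextseq_trim_index(bases, quals, cutoff):
    """Return how many leading bases to keep after 3' trimming.

    Sketch only (hypothetical helper, not cutadapt's implementation):
    walk in from the 3' end, summing (cutoff - quality), and cut where
    that sum peaks. Every G is scored as cutoff - 1, so its real
    quality is ignored and it always counts as slightly "bad".
    """
    running = 0
    best = 0
    keep = len(bases)
    for i in range(len(bases) - 1, -1, -1):
        q = cutoff - 1 if bases[i] == "G" else quals[i]
        running += cutoff - q
        if running < 0:
            # Enough good bases seen; nothing further in is worth trimming.
            break
        if running > best:
            best = running
            keep = i
    return keep


# Toy read: 8 real bases followed by a 6-base polyG tail.
# All qualities are 37, the maximum assumed in the comment above.
read = "ACGTACGT" + "G" * 6
quals = [37] * len(read)

print(nextseq_trim_index(read, quals, 25))  # 8: only the polyG tail is removed
print(nextseq_trim_index(read, quals, 50))  # 0: the whole read is trimmed away
```

With a cutoff of 25 the high-quality polyG tail is removed and the rest of the read survives. With a cutoff of 50 every base falls below the threshold, so under this sketch the entire read is discarded, which is why a value above the maximum quality score is not usable.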