Closed CharlotteAnne closed 2 years ago
I suppose very generally, a sequencing error is defined as a base that is called incorrectly. This may be a different base entirely (if, say, the fluorescent signal for a different base was stronger than the 'correct' one), or it may indeed come back as N.
In terms of adapter trimming, the error is simply defined as a non-matching base. The default tolerated error rate is 0.1, so up to 10% of the adapter sequence may be different in the actual read. Here is an example:
If we use the default Illumina adapter sequence used by Trim Galore, AGATCGGAAGAGC, the length of the sequence is 13 bp, so a 10% error rate would allow 1 mismatch in that sequence (rounded down from 1.3).
So if you had sequences like these (the first with a perfect adapter match, the second with 1 mismatch):
GATCGTATAGCTAGCATAGCTAGC**AGATCGGAAGAGC**
GATCGTATAGCTAGCATAGCTAGC**AGGTCGGAAGAGC**
both would get trimmed to:
GATCGTATAGCTAGCATAGCTAGC
If there was an additional mismatch in the adapter sequence, like so:
GATCGTATAGCTAGCATAGCTAGC**GGGTCGGAAGAGC**
The sequence would not be trimmed at all (the sequence in bold now has 2 mismatches to the adapter sequence, exceeding the 0.1 error rate). Makes sense?
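To make the arithmetic concrete, here is a minimal Python sketch of the mismatch rule described above. It is only an illustration, not the code Trim Galore actually runs: it assumes the full 13 bp adapter sits at the 3' end of the read and counts plain substitutions, whereas the real trimming step also has to deal with partial adapter matches and insertions/deletions. The function names (`allowed_mismatches`, `count_mismatches`, `trim_full_adapter`) are made up for this example.

```python
import math

ADAPTER = "AGATCGGAAGAGC"  # default Illumina adapter quoted above
ERROR_RATE = 0.1           # default tolerated error rate quoted above


def allowed_mismatches(adapter, error_rate=ERROR_RATE):
    """Number of mismatches tolerated: adapter length x error rate, rounded down."""
    return math.floor(len(adapter) * error_rate)


def count_mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def trim_full_adapter(read, adapter=ADAPTER, error_rate=ERROR_RATE):
    """Simplified assumption: only a full-length adapter at the 3' end is checked."""
    if len(read) < len(adapter):
        return read
    tail = read[-len(adapter):]
    if count_mismatches(tail, adapter) <= allowed_mismatches(adapter, error_rate):
        return read[:-len(adapter)]
    return read  # too many mismatches: leave the read untouched


if __name__ == "__main__":
    reads = [
        "GATCGTATAGCTAGCATAGCTAGCAGATCGGAAGAGC",  # 0 mismatches
        "GATCGTATAGCTAGCATAGCTAGCAGGTCGGAAGAGC",  # 1 mismatch
        "GATCGTATAGCTAGCATAGCTAGCGGGTCGGAAGAGC",  # 2 mismatches
    ]
    for read in reads:
        print(trim_full_adapter(read))
```

With the three reads from the example, the first two come back as the 24 bp insert and the third comes back unchanged, because 2 mismatches exceeds the 1 allowed for a 13 bp adapter.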
Marvellous, makes perfect sense! Thank you very much.
Hi! I have a very basic question - how are you defining a sequencing error? Do you mean an N nucleotide in the sequence? Thank you for your help!