Closed GoogleCodeExporter closed 9 years ago
Hi there,
When MOSAIK imports files using MosaikBuild, it converts all of the qualities into the standard phred score definition. For most data sets, this means that the qualities are taken as is.
Illumina decided to be smart and created a new base quality definition. Above base quality 10 the two definitions are approximately equivalent. However, whereas the phred scale offers poor resolution at base qualities under 10, Illumina offers higher resolution by allowing negative base qualities, sort of a log-odds approach.
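For reference, here are the commonly cited definitions of the two scales (standard definitions, not taken from the MOSAIK source), where p is the probability that a base call is wrong. Because 1 - p is close to 1 for good bases, the log-odds (Solexa/Illumina) score converges to the phred score above roughly 10:

```latex
\begin{aligned}
Q_{\text{phred}}  &= -10\,\log_{10} p \\
Q_{\text{solexa}} &= -10\,\log_{10}\frac{p}{1-p} \\
Q_{\text{phred}}  &= 10\,\log_{10}\!\left(10^{Q_{\text{solexa}}/10} + 1\right)
\end{aligned}
```

For example, a Solexa score of -5 corresponds to a phred score of about 1.19, and a Solexa score of 10 to about 10.41.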
In essence, all bases that have a base quality of less than 10 are crap anyway, so it's a bit of an academic discussion.
The unaligned reads you see in the MosaikAligner fastq output use the fastq specification developed at the Sanger, which means that BQ + 33 = ASCII code for the base quality. For example, to parse these fastq files all you have to do is subtract 33 from each ASCII code in the base quality line to get the base qualities for that read.
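A minimal Python sketch of that rule, assuming plain four-line fastq records with no line-wrapped sequences. The function names `sanger_qualities` and `parse_fastq` are hypothetical helpers for illustration, not part of MOSAIK. It also checks that the sequence and quality lengths agree, which is the first thing to look at when a record seems malformed:

```python
from typing import Iterable, Iterator, List, Tuple


def sanger_qualities(qual_line: str) -> List[int]:
    """Base qualities from a Sanger-style quality line (ASCII code = BQ + 33)."""
    return [ord(ch) - 33 for ch in qual_line.rstrip("\r\n")]


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (name, sequence, qualities) from 4-line fastq records.

    Hypothetical helper: assumes one sequence line and one quality line
    per record (no wrapping).
    """
    it = iter(lines)
    for header in it:
        try:
            seq, _plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError("truncated fastq record") from None
        seq = seq.rstrip("\r\n")
        quals = sanger_qualities(qual)
        if len(quals) != len(seq):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:].rstrip("\r\n"), seq, quals


if __name__ == "__main__":
    record = ["@read1\n", "ACGTA\n", "+\n", "II5+#\n"]
    for name, seq, quals in parse_fastq(record):
        print(name, seq, quals)
```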
For some extra trivia: when Illumina creates fastq files in the Gerald directory, they use BQ + 64 = ASCII code, since their base qualities can be negative.
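If you need to feed such Gerald output into a tool that expects the Sanger encoding, a minimal sketch (assuming every character is an Illumina/Solexa log-odds score + 64) is to decode with offset 64, convert the score to phred, and re-encode with offset 33. The name `solexa64_to_sanger33` is hypothetical. Rounding to the nearest integer is a design choice here, since fastq stores integer scores:

```python
import math


def solexa64_to_sanger33(qual_line: str) -> str:
    """Re-encode a quality line of Solexa scores + 64 as phred scores + 33."""
    out = []
    for ch in qual_line.rstrip("\r\n"):
        q_solexa = ord(ch) - 64  # can be negative, e.g. ';' (59) -> -5
        # Solexa log-odds score -> phred score
        q_phred = 10.0 * math.log10(10.0 ** (q_solexa / 10.0) + 1.0)
        out.append(chr(round(q_phred) + 33))
    return "".join(out)


if __name__ == "__main__":
    # ';', '@', 'J', 'h' are Solexa scores -5, 0, 10, 40
    print(solexa64_to_sanger33(";@Jh"))
```

Note that the low end gets compressed: Solexa -5 and 0 become phred 1 and 3, while scores of 10 and above pass through essentially unchanged.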
Hope this helps,
// Michael
Original comment by snowneb...@gmail.com
on 16 Jan 2010 at 2:04
Thanks,
I understand those negative scores are crap, but I would like to understand why the unaligned reads cannot be mapped to the reference genome. Is it a quality issue or a sequence problem? What are they?
For the example that I posted, the positive quality scores can be transformed by simply subtracting 33 from the ASCII codes, but the negative quality scores cannot be recovered by subtracting 33 from each ASCII code, because it is not a one-character-to-one-number transformation (the lengths of the sequence and quality strings do not match). I think I can write a parser based on the rule I observed, but the unaligned reads output from MosaikAligner are not in fastq format anymore.
ASCII => quality score
^\ => -5
^] => -4
^^ => -3
^_ => -2
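The four pairs in this table are consistent with caret notation, which terminals and tools such as `cat -v` use to display ASCII control characters: `^\` is byte 28, `^]` is 29, `^^` is 30 and `^_` is 31, and subtracting 33 gives exactly -5, -4, -3 and -2. So a plausible explanation (an assumption, not confirmed in this thread) is that the output still uses BQ + 33, and each negative quality is a single control-character byte that merely displays as two characters. If the file is read as raw bytes, plain "subtract 33" should then already work. If the quality line was copied from such a display, a sketch like the following undoes the notation first; `uncaret` is a hypothetical helper, and it assumes a literal `^` (quality 61) never occurs, since that would make the notation ambiguous:

```python
from typing import List


def uncaret(text: str) -> str:
    """Undo caret notation: ^@ .. ^_ -> bytes 0-31, ^? -> byte 127."""
    out = []
    i = 0
    while i < len(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if text[i] == "^" and nxt and ("@" <= nxt <= "_" or nxt == "?"):
            # ^X stands for the control character with code ord(X) XOR 0x40
            out.append(chr(ord(nxt) ^ 0x40))
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def sanger_qualities(qual_line: str) -> List[int]:
    """Base qualities from a quality line where ASCII code = BQ + 33."""
    return [ord(ch) - 33 for ch in qual_line]


if __name__ == "__main__":
    shown = "^\\^]^^^_I"  # as displayed: five qualities, nine characters
    raw = uncaret(shown)
    print(len(raw), sanger_qualities(raw))
```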
Original comment by Lo.chien...@gmail.com
on 16 Jan 2010 at 4:54
Original issue reported on code.google.com by
Lo.chien...@gmail.com
on 12 Jan 2010 at 9:01