dstreett / Super-Deduper

An application to remove PCR duplicates from high throughput sequencing runs.

Quality score range isn't correct. (Develop branch) #26

Closed (samhunter closed 8 years ago)

samhunter commented 8 years ago

Super-Deduper accepts too narrow a range of quality scores. It fails with:

Error: Quality score is not between ascii [33,72], or [",H] Bad quality string = 3>ABAFFFFFCFGGGGGGGGGGHGHHHHHHHGHGGGGGGGHHHHHHHHFFHHHHHHHIHHHHHHHHHGGHGHFHHGGGHHHHGHHHHHHHHHGGGHHHHHHHHHHHHHHHHHHHHHHHGGHHGHHHHHGGG Bad character='I'

SuperD should support quality scores from 0 to 93 (ASCII 33 to 126) in order to be compatible with the Sanger FASTQ format.

Also note that, according to https://en.wikipedia.org/wiki/FASTQ_format, quality scores can apparently go above 40 even for Illumina reads: "For raw reads, the range of scores will depend on the technology and the base caller used, but will typically be up to 41 for recent Illumina chemistry. Since the maximum observed quality score was previously only 40, various scripts and tools break when they encounter data with quality values larger than 40. For processed reads, scores may be even higher. For example, quality values of 45 are observed in reads from Illumina's Long Read Sequencing Service (previously Moleculo)."
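To make the requested range concrete, here is a minimal Python sketch of a Phred+33 check that accepts ASCII 33 to 126. It is only an illustration, not Super-Deduper's actual code (which is not shown in this thread); the names `first_bad_char` and `phred_scores` are hypothetical.

```python
PHRED_OFFSET = 33                # Sanger / Phred+33 encoding (assumed here)
MIN_ASCII, MAX_ASCII = 33, 126   # '!' .. '~', i.e. Q0 .. Q93


def first_bad_char(quality):
    """Return the first character outside ASCII [33, 126], or None if all are valid."""
    for ch in quality:
        if not MIN_ASCII <= ord(ch) <= MAX_ASCII:
            return ch
    return None


def phred_scores(quality):
    """Decode a Phred+33 quality string into a list of integer scores."""
    bad = first_bad_char(quality)
    if bad is not None:
        raise ValueError(f"quality character {bad!r} is outside ASCII [33, 126]")
    return [ord(ch) - PHRED_OFFSET for ch in quality]


# 'H' is ASCII 72 (Q39) and 'I' is ASCII 73 (Q40), so an upper bound of 72
# rejects the 'I' reported in the error message.
print(phred_scores("HI"))   # [39, 40]
print(phred_scores("!~"))   # [0, 93], the two ends of the Sanger range
```

With this offset, the scores of 41 and 45 mentioned in the quote are encoded as `chr(41 + 33)` = 'J' and `chr(45 + 33)` = 'N', both well inside the wider range.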

dstreett commented 8 years ago

31 fixes this. Thank you! I will also need to fix it in the others.