AstraZeneca-NGS / VarDict

VarDict
MIT License

VCF quality measurements: Difference between MQ and QUAL #12

Closed: mpschr closed this issue 9 years ago

mpschr commented 9 years ago

Hi

I am not sure what the difference is between the two fields MQ and QUAL in the VCF output of VarDict, as their descriptions are very similar:

##FORMAT=<ID=MQ,Number=1,Type=Float,Description="Mean Mapping Quality">
##FORMAT=<ID=QUAL,Number=1,Type=Float,Description="Mean quality score in reads">

If QUAL does not refer to mapping quality, what does it refer to?

Thanks for any explanations!

Update: Also, other tools (VarScan, FreeBayes) use the MQ field to report the RMS mapping quality. Is this also the case for VarDict?
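For reference, the per-sample FORMAT values sit in the 9th (FORMAT keys) and 10th (first sample) columns of each VCF data line, and the FORMAT QUAL field is separate from the fixed 6th QUAL column that every VCF record has. A minimal Python sketch of pulling MQ and QUAL out of one record; the helper name `first_sample_format` and the record are hypothetical, and the FORMAT keys shown are a subset chosen for illustration, not VarDict's full FORMAT list.

```python
def first_sample_format(vcf_line: str) -> dict:
    """Map each FORMAT key to the value in the first sample column of a VCF data line."""
    cols = vcf_line.rstrip("\n").split("\t")
    keys = cols[8].split(":")    # column 9: FORMAT keys, e.g. "GT:DP:VD:QUAL:MQ"
    values = cols[9].split(":")  # column 10: values for the first sample
    return dict(zip(keys, values))


# Hypothetical VarDict-style record (values made up for illustration).
record = "chr1\t12345\t.\tA\tG\t80\tPASS\t.\tGT:DP:VD:QUAL:MQ\t0/1:120:35:34.2:58.7"
fields = first_sample_format(record)
mq = float(fields["MQ"])           # FORMAT MQ: "Mean Mapping Quality"
qual = float(fields["QUAL"])       # FORMAT QUAL: "Mean quality score in reads"
site_qual = record.split("\t")[5]  # fixed VCF QUAL column: a different field
print(mq, qual, site_qual)
```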

mjafin commented 9 years ago

Hi Michael, thanks for the continued feedback and bug reports. The documentation is still quite sparse and we definitely need to work on it. I'll wait for Zhongwu to chime in here, but in the meantime I'll take a look at the code to see if I can figure it out myself for you.

mjafin commented 9 years ago

I think that MQ is the mean of the mapping qualities of the reads supporting the variant and QUAL is the mean of the base (Phred) qualities of the bases for the variant.

Or something like that!
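Under this reading, MQ and QUAL are two separate averages over the same set of variant-supporting reads. A minimal sketch, assuming each supporting read contributes exactly one mapping quality and one base quality at the variant position; the read list is made-up data, and VarDict's own bookkeeping (e.g. for indels or low-quality bases) may differ.

```python
from statistics import mean

# Hypothetical evidence: one (mapping quality, base quality at the variant position)
# pair per read that supports the variant.
supporting_reads = [(60, 37), (60, 32), (29, 35), (60, 30)]

mq = mean(mapq for mapq, _ in supporting_reads)  # "Mean Mapping Quality"
qual = mean(bq for _, bq in supporting_reads)    # "Mean quality score in reads"
print(f"MQ={mq:.2f} QUAL={qual:.2f}")            # MQ=52.25 QUAL=33.50
```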

mpschr commented 9 years ago

Hi @mjafin

Thanks for your answer; your explanation certainly makes sense. Also, in this case MQ would be different from the RMS (root mean square) mapping quality as described here: https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_annotator_RMSMappingQuality.php, right?
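The difference between the two summaries shows up quickly in numbers: RMS squares each value before averaging, so for non-negative values it is never below the arithmetic mean, and the gap widens when MAPQ values are spread out (e.g. uniquely mapped reads mixed with MAPQ-0 multi-mappers). A small sketch with made-up MAPQ values, assuming both statistics are taken over the same reads; the helper `rms` is written here for illustration.

```python
import math
from statistics import mean


def rms(values):
    """Root mean square: square root of the mean of the squared values."""
    return math.sqrt(mean(v * v for v in values))


# Hypothetical MAPQs: three uniquely mapped reads and one multi-mapped read (MAPQ 0).
mapqs = [60, 60, 60, 0]
print(f"mean={mean(mapqs):.2f} rms={rms(mapqs):.2f}")  # mean=45.00 rms=51.96
```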

mjafin commented 9 years ago

Right, it looks like there is no standard deviation for mapping quality. There is one for quality scores, I think.

zhongwulai commented 9 years ago

Hi Michael, Miika is right. MQ is the mapping quality, which is the fifth column of a SAM record. QUAL, meanwhile, is the base quality score, which is derived from the 11th column of the SAM record. MQ typically indicates how unique the region's sequence is: the higher the MQ, the more unique the sequence. QUAL is the sequencing quality, which can be platform-biased; e.g., Ion seemed to have lower QUAL than Illumina.

In VarDict, you can use "-O" (mapping quality) and "-q" (base quality) to control the thresholds. By default, there is no mapping-quality filtering with "-O". The default for "-q" is 25, which is more suitable for Illumina sequencing. For Ion, you probably need to set it lower and turn off local realignment with "-k 0".

Hope it helps!
Zhongwu
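The explanation above can be sketched end to end in Python: read MAPQ (column 5) and the Phred+33-encoded base qualities (column 11) from a SAM line, then average them over supporting reads with thresholds loosely modelled on "-q" and "-O". This is a hypothetical illustration, not VarDict's implementation; the helper names `parse_sam_qualities` and `summarize` and the example reads are made up, and exactly where VarDict applies each threshold is an assumption.

```python
from statistics import mean
from typing import List, Optional, Tuple


def parse_sam_qualities(sam_line: str) -> Tuple[int, List[int]]:
    """Return (MAPQ, per-base Phred qualities) for one SAM alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    mapq = int(fields[4])  # column 5: MAPQ (255 means "unavailable" in the SAM spec)
    qual = fields[10]      # column 11: base qualities, ASCII-encoded Phred+33
    base_quals = [] if qual == "*" else [ord(c) - 33 for c in qual]
    return mapq, base_quals


def summarize(reads: List[Tuple[int, int]], min_bq: int = 25,
              min_mean_mq: Optional[float] = None) -> Optional[Tuple[float, float]]:
    """Hypothetical site summary loosely modelled on VarDict's -q / -O options.

    reads holds (MAPQ, base quality at the variant) for each supporting read.
    Reads whose base quality is below min_bq are dropped (cf. "-q 25"); the site
    is rejected if the remaining reads' mean MAPQ is below min_mean_mq, and
    None disables that check (cf. no "-O" filtering by default).
    Returns (mean MQ, mean QUAL), or None if the site is rejected.
    """
    kept = [(mq, bq) for mq, bq in reads if bq >= min_bq]
    if not kept:
        return None
    mean_mq = float(mean(mq for mq, _ in kept))
    mean_bq = float(mean(bq for _, bq in kept))
    if min_mean_mq is not None and mean_mq < min_mean_mq:
        return None
    return mean_mq, mean_bq


line = "read1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIII#"
mapq, quals = parse_sam_qualities(line)
print(mapq, quals)  # 60 [40, 40, 40, 40, 2]

# Suppose the variant is the last base of read1 and is also seen in three other reads.
reads = [(mapq, quals[4]), (60, 36), (40, 30), (20, 33)]
print(summarize(reads))                  # (40.0, 33.0): read1's Q2 base is dropped
print(summarize(reads, min_bq=0))        # (45.0, 25.25): lower threshold, e.g. for Ion
print(summarize(reads, min_mean_mq=50))  # None: mean MAPQ 40 < 50
```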

On Monday, August 3, 2015 6:36 AM, Miika Ahdesmaki wrote:

I think that MQ is the mean of the mapping qualities of the reads supporting the variant and QUAL is the mean of the base (Phred) qualities of the bases for the variant. Or something like that!

mpschr commented 9 years ago

Hi Zhongwu,

Thanks for this explanation - that helped a lot! I will mark this issue as solved.

Best, Michael