genetronhealth / uvc

UVC, a very accurate small-variant caller (https://doi.org/10.1093/bib/bbab458)
BSD 3-Clause "New" or "Revised" License

how to understand these different values between SRR7757440_SRR7757439_TN.vcf.gz and SRR7757440_uvc1.vcf.gz #7

Closed cmguodong closed 2 years ago

cmguodong commented 2 years ago

Hi Zhao,

I have some questions about the result.

How should I understand the different values between SRR7757440_SRR7757439_TN.vcf.gz [a] and SRR7757440_uvc1.vcf.gz [b], such as 'QUAL', 'FILTER', and especially 'SomaticQ'?

[a] 3 16306504 . C T 60 PASS SOMATIC;SomaticQ=60;TLODQ=78;NLODQ=60;NLODV=A;TNBQF=251,26,0,19;TNCQF=177,48,0,62;tDP=50250;tADR=50127,78;nDP=58463;nADR=58416,6;

[b] 3 16306504 . C T 49 Q50 ANY_VAR;SomaticQ=49;TLODQ=49;NLODQ=57;NLODV=;TNBQF=0,-75,0,4;TNCQF=0,-75,0,49;tDP=50250;tADR=50127,78;nDP=0;nADR=0,0;

(##INFO=<ID=SomaticQ,Number=A,Type=Float,Description="Somatic quality of the variant, the PHRED-scale probability that this variant is not somatic.">) As I read it, the SomaticQ value of [a] (60, i.e. 1 - 1e-6) means there is a high probability that this variant is not somatic, but the allele depths ([a] tADR=50127,78) suggest that this variant may be a low-frequency mutation. Am I misunderstanding this?

thanks

genetronhealth commented 2 years ago

Hi @cmguodong,

What you understood is not quite correct. Higher QUAL and SomaticQ mean higher likelihood of being true somatic variants.

The SomaticQ value of [a] (which is 60) means this variant is highly likely to be somatic (background noise has only a 1e-6 probability of generating this variant), while tADR=50127,78 (REF has 50127 deduplicated reads, ALT has 78 deduplicated reads) shows that it is a variant with a low allele fraction.
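To make the arithmetic concrete, here is a minimal sketch in plain Python of the two numbers discussed above. It uses only the standard PHRED definition (Q = -10 * log10(p)) and the tADR counts quoted in this thread. The helper names `phred_to_error_prob` and `allele_fraction` are made up for illustration and are not part of UVC.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a PHRED-scaled quality Q into an error probability p = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads supporting ALT among REF + ALT reads."""
    total = ref_reads + alt_reads
    return alt_reads / total if total else 0.0


# Values taken from record [a] in this thread.
somatic_q = 60
t_adr = (50127, 78)  # (REF deduplicated reads, ALT deduplicated reads)

p = phred_to_error_prob(somatic_q)
af = allele_fraction(*t_adr)

print(f"SomaticQ={somatic_q} -> error probability {p:.1e}")
print(f"tADR={t_adr} -> ALT allele fraction {af:.4%}")
```

A high quality and a low allele fraction are therefore independent statements: the first says how unlikely it is that noise produced the call, the second says how few of the molecules carry the variant.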

We can be so confident about this mutation because 1) we have UMI (molecular barcode) information and 2) we have its matched normal, which serves as a technical control.

QUAL and FILTER are defined in the VCF file format specifications at https://samtools.github.io/hts-specs/VCFv4.2.pdf. In brief, QUAL is the quality of the variant, defined in terms of the calling error probability on the PHRED scale, and FILTER is a string containing information that can be used to filter out false positive variants. SomaticQ is the somatic variant quality, which is identical to QUAL if the calling mode is tumor-normal paired (as indicated by the SOMATIC flag in the VCF INFO).

Please note that QUAL has a different meaning depending on whether germline-vs-somatic origin has to be determined. If the origin has to be determined, then QUAL is the same as SomaticQ. Otherwise, QUAL is the PHRED-scaled error probability that the called variant is neither a germline variant nor a somatic variant.
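As a rough illustration of how to read these fields, the sketch below parses the two records pasted earlier in this thread and reports QUAL, FILTER, SomaticQ, and whether the SOMATIC flag is present. It is plain Python with no VCF library. `parse_info` and `summarize` are hypothetical helpers, not UVC functions, and the records are split on whitespace only because that is how they were pasted here (a real VCF is tab-separated).

```python
def parse_info(info: str) -> dict:
    """Turn a VCF INFO string into a dict; flags without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue  # skip the empty piece left by a trailing ';'
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def summarize(record: str) -> dict:
    """Extract the fields discussed in this thread from one record line."""
    # Columns: CHROM POS ID REF ALT QUAL FILTER INFO
    cols = record.split()
    info = parse_info(cols[7])
    return {
        "qual": float(cols[5]),
        "filter": cols[6],
        "paired": "SOMATIC" in info,  # flag described above for tumor-normal mode
        "somatic_q": float(info["SomaticQ"]),
    }


# Records [a] and [b] exactly as quoted in the question.
record_a = (
    "3 16306504 . C T 60 PASS SOMATIC;SomaticQ=60;TLODQ=78;NLODQ=60;NLODV=A;"
    "TNBQF=251,26,0,19;TNCQF=177,48,0,62;tDP=50250;tADR=50127,78;"
    "nDP=58463;nADR=58416,6;"
)
record_b = (
    "3 16306504 . C T 49 Q50 ANY_VAR;SomaticQ=49;TLODQ=49;NLODQ=57;NLODV=;"
    "TNBQF=0,-75,0,4;TNCQF=0,-75,0,49;tDP=50250;tADR=50127,78;nDP=0;nADR=0,0;"
)

a = summarize(record_a)
b = summarize(record_b)
for label, s in (("a", a), ("b", b)):
    print(label, s)
```

For record [a] the SOMATIC flag is set and QUAL equals SomaticQ, which matches the tumor-normal paired case described above. Record [b] carries no SOMATIC flag and has nDP=0, i.e. no normal-sample reads.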

cmguodong commented 2 years ago

thank you for your detailed reply, I get a lot from your info. thanks again ^_^

genetronhealth commented 2 years ago

You are welcome!

genetronhealth commented 2 years ago

Hi @cmguodong It looks like this issue is resolved, right? If it is resolved, then I will close this issue.

genetronhealth commented 2 years ago

Looks like this issue is resolved. I will close this issue.