Illumina / strelka

Strelka2 germline and somatic small variant caller
GNU General Public License v3.0

Parsing read depth and allele frequency from Strelka VCF #205

Open Wangxin555 opened 2 years ago

Wangxin555 commented 2 years ago

Hi,

I am using Strelka 2.9.10 for tumor variant calling (both indels and SNVs) and am trying to obtain tumorSampleReadDepth, normalSampleReadDepth, tumorSampleAltCount, normalSampleAltCount, tumorSampleRefCount, normalSampleRefCount, and VAF.

I found how to calculate VAF here, but I am still confused about how to parse fields such as DP and AD.
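For SNVs, the same tier1 idea is usually applied to the per-base count fields AU, CU, GU and TU (each a `tier1,tier2` pair, as described in the Strelka user guide). The reference count is read from the `{REF}U` field, the alternate count from the `{ALT}U` field, and VAF = alt / (alt + ref). The sketch below assumes a single-base REF and a single ALT allele. The function name `snv_counts` and the sample values are hypothetical, made up for illustration; they are not taken from real Strelka output.

```python
# Sketch: tier1 ref/alt counts and VAF for one sample of a Strelka somatic SNV record.
# Assumes AU/CU/GU/TU fields formatted as "tier1,tier2" and a single-base REF/ALT.

def snv_counts(ref: str, alt: str, format_col: str, sample_col: str) -> dict:
    if len(ref) != 1 or len(alt) != 1:
        raise ValueError("expected a single-base REF and a single ALT allele")
    # Map each FORMAT key (e.g. "CU") to this sample's raw value (e.g. "40,41").
    fields = dict(zip(format_col.split(":"), sample_col.split(":")))
    ref_count = int(fields[ref + "U"].split(",")[0])  # tier1 count for the REF base
    alt_count = int(fields[alt + "U"].split(",")[0])  # tier1 count for the ALT base
    depth = ref_count + alt_count                      # VAF denominator
    vaf = alt_count / depth if depth else 0.0
    return {"refCount": ref_count, "altCount": alt_count, "depth": depth, "VAF": vaf}

# HYPOTHETICAL example values (REF=C, ALT=T), for illustration only.
FORMAT = "DP:FDP:SDP:SUBDP:AU:CU:GU:TU"
tumor = "60:0:0:0:0,0:40,41:0,0:20,22"
print(snv_counts("C", "T", FORMAT, tumor))
```

With these made-up values the tumor sample has 40 reference reads, 20 alternate reads and a VAF of 1/3.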

For example, these are the FORMAT column followed by the NORMAL and TUMOR sample columns from one line of the indel VCF:

DP:DP2:TAR:TIR:TOR:DP50:FDP50:SUBDP50:BCN50
228:228:100,103:82,93:49,36:271.67:1.09:0.00:0.00
167:167:1,1:121,127:51,46:212.24:1.35:0.00:0.00

The link above says to measure allele frequency from the tier1 counts in TAR and TIR. Does that mean the actual tumor read depth is that VAF denominator, i.e. the first comma-delimited value of FORMAT/TAR plus the first comma-delimited value of FORMAT/TIR in the TUMOR column? Or is the value in the DP field the actual read depth?
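A minimal Python sketch of the tier1 calculation applied to the line above, assuming the sample columns are ordered NORMAL then TUMOR. `parse_sample`, `tier1` and `indel_counts` are hypothetical helper names. For the TUMOR column it gives refCount = 1, altCount = 121, a tier1 depth of 122 and VAF = 121/122 ≈ 0.992, while DP is 167, so the two depth definitions do not agree for this record.

```python
# Sketch: tier1 depth, ref/alt counts and VAF from a Strelka somatic indel record.
# Assumes the sample columns are ordered NORMAL, TUMOR and that TAR/TIR are "tier1,tier2".

FORMAT = "DP:DP2:TAR:TIR:TOR:DP50:FDP50:SUBDP50:BCN50"
SAMPLES = {
    "NORMAL": "228:228:100,103:82,93:49,36:271.67:1.09:0.00:0.00",
    "TUMOR": "167:167:1,1:121,127:51,46:212.24:1.35:0.00:0.00",
}

def parse_sample(format_col: str, sample_col: str) -> dict:
    """Map each FORMAT key to the raw string value in one sample column."""
    return dict(zip(format_col.split(":"), sample_col.split(":")))

def tier1(value: str) -> int:
    """Return the first (tier1) entry of a 'tier1,tier2' pair."""
    return int(value.split(",")[0])

def indel_counts(format_col: str, sample_col: str) -> dict:
    fields = parse_sample(format_col, sample_col)
    ref = tier1(fields["TAR"])  # tier1 reads supporting the reference allele
    alt = tier1(fields["TIR"])  # tier1 reads supporting the indel allele
    depth = ref + alt           # denominator used for the VAF
    return {
        "DP": int(fields["DP"]),  # reported DP, kept for comparison
        "refCount": ref,
        "altCount": alt,
        "depth": depth,
        "VAF": alt / depth if depth else 0.0,
    }

for name, col in SAMPLES.items():
    print(name, indel_counts(FORMAT, col))
```

The sketch reports both DP and the TAR+TIR tier1 sum side by side, so either can be used as the read depth once it is clear which one is intended.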

Thanks!