Illumina / strelka

Strelka2 germline and somatic small variant caller
GNU General Public License v3.0

Variant Allele Frequency with Strelka v2.9.10 #198

Open YanaVassileva opened 3 years ago

YanaVassileva commented 3 years ago

Hey,

I know that there are already a lot of questions related to this topic, but I could not find an answer that completely clarifies my confusion. I want to determine the tumor heterogeneity of a sample. The output VCF file was generated by Strelka v2.9.10, and a subset looks like the following:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR
chr1 931087 . C T . LowEVS SOMATIC;QSS=2;TQSS=2;NT=ref;QSS_NT=2;TQSS_NT=2;SGT=CC->CC;DP=82;MQ=60.00;MQ0=0;ReadPosRankSum=-0.36;SNVSB=0.00;SomaticEVS=0.63 DP:FDP:SDP:SUBDP:AU:CU:GU:TU 59:0:0:0:0,0:58,58:1,1:0,0 23:2:0:0:0,0:19,21:0,0:2,2

There is a recommendation for how to calculate the somatic allele frequency:

refCounts = Value of FORMAT column $REF + “U” (e.g. if REF="A" then use the value in FORMAT/AU)
altCounts = Value of FORMAT column $ALT + “U” (e.g. if ALT="T" then use the value in FORMAT/TU)
tier1RefCounts = First comma-delimited value from $refCounts
tier1AltCounts = First comma-delimited value from $altCounts
Somatic allele frequency is $tier1AltCounts / ($tier1AltCounts + $tier1RefCounts)

In this case REF is C and ALT is T, so:

tier1RefCounts (first comma-delimited value from $refCounts) = 58:1
tier1AltCounts (first comma-delimited value from $altCounts) = 2

Would the somatic allele frequency then be 2/(2+60)? Is that correct? I think I am a bit confused about how to interpret the values that are separated by ":" multiple times, such as 59:0:0:0:0.
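To make the question concrete, here is a minimal Python sketch of how the quoted recommendation could be applied to the record above. It assumes that the colon-separated values in a sample column line up position by position with the keys in the FORMAT column (so the sixth value is CU and the eighth is TU), and that the counts should be taken from the TUMOR column. The function name `tier1_vaf` is hypothetical, and the whitespace-separated row is rebuilt with tabs purely for illustration. The sketch only covers SNV records, which carry the AU/CU/GU/TU keys.

```python
def tier1_vaf(ref, alt, fmt, sample):
    """Tier-1 allele frequency for one sample column of a somatic SNV record."""
    # Pair each FORMAT key with the value at the same position in the sample column.
    fields = dict(zip(fmt.split(":"), sample.split(":")))
    # Each *U field holds "tier1,tier2" counts; the recommendation uses tier 1.
    ref_t1 = int(fields[ref + "U"].split(",")[0])
    alt_t1 = int(fields[alt + "U"].split(",")[0])
    total = ref_t1 + alt_t1
    return alt_t1 / total if total else float("nan")


# The record from the post, re-joined with tabs (assumed column layout:
# CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR).
record = "\t".join([
    "chr1", "931087", ".", "C", "T", ".", "LowEVS",
    "SOMATIC;QSS=2;TQSS=2;NT=ref;QSS_NT=2;TQSS_NT=2;SGT=CC->CC;DP=82;"
    "MQ=60.00;MQ0=0;ReadPosRankSum=-0.36;SNVSB=0.00;SomaticEVS=0.63",
    "DP:FDP:SDP:SUBDP:AU:CU:GU:TU",
    "59:0:0:0:0,0:58,58:1,1:0,0",
    "23:2:0:0:0,0:19,21:0,0:2,2",
])

cols = record.split("\t")
ref, alt, fmt, normal, tumor = cols[3], cols[4], cols[8], cols[9], cols[10]

tumor_vaf = tier1_vaf(ref, alt, fmt, tumor)    # CU=19,21 and TU=2,2 -> 2 / (2 + 19)
normal_vaf = tier1_vaf(ref, alt, fmt, normal)  # CU=58,58 and TU=0,0 -> 0 / (0 + 58)

print(f"tumor tier-1 VAF  = {tumor_vaf:.4f}")   # 0.0952
print(f"normal tier-1 VAF = {normal_vaf:.4f}")  # 0.0000
```

Under these assumptions the tumor value comes out as 2/(2+19), roughly 0.095, rather than 2/(2+60): 58 is the reference count in the NORMAL column, while the tumor reference count is the first number of CU=19,21.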

With that method I will be able to calculate the VAF per mutation. How can I estimate the overall tumor heterogeneity from the resulting VAFs?

Thank you very much for your help. I am aware that this is a newbie question, and I would be grateful for any help.

Best regards,
Yana Vassileva
Master Student "Computational Modeling and Simulation"