AstraZeneca-NGS / VarDict

MIT License

How to interpret VarDict output in VCF format #156

Open Nairobi-2020 opened 3 years ago

Nairobi-2020 commented 3 years ago

We ran VarDict on our panel sequencing data and have a question about interpreting part of the VCF output.

When a variant has 'TYPE=DUP', the 8th column (INFO) contains two additional values: 'SVTYPE=DUP;SVLEN=287'. What does 'SVLEN' mean? Does it mean that the 287-bp sequence starting at chr13:28033908 is duplicated in tandem?

         V1    V2       V3 V4 V5 V6  V7
    3389 chr13 28033908 .  C     184 PASS

         V8
    3389 SAMPLE=Horizon-CMP001;TYPE=DUP;DP=882;VD=47;AF=0.0533;BIAS=0:2;REFBIAS=0:0;VARBIAS=29:18;PMEAN=27.9;PSTD=1;QUAL=33.3;QSTD=1;SBF=1;ODDRATIO=0;MQ=60;SN=94;HIAF=1.0000;ADJAF=0.0533;SHIFT3=0;MSI=0;MSILEN=0;NM=0.1;HICNT=47;HICOV=47;LSEQ=TTTCAGCATTTTGACGGCAA;RSEQ=GGCTTTCATACCTAAATTGC;DUPRATE=0;SVTYPE=DUP;SVLEN=287;SPLITREAD=29;SPANPAIR=18

         V9                    V10
    3389 GT:DP:VD:AD:AF:RD:ALD 0/1:882:47:0,47:0.0533:0,0:29,18
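For anyone who wants to pull these keys out programmatically, here is a minimal Python sketch (not part of VarDict) that splits the INFO column of the record above into a dictionary. It assumes INFO is a plain semicolon-separated list of `KEY=VALUE` pairs or bare flags; `parse_info` is a hypothetical helper. For real files, a VCF library such as pysam or cyvcf2 would be the more robust choice.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string into a dict.

    KEY=VALUE pairs map to their (string) value; bare flags map to True.
    Empty items (e.g. from a trailing ';') are skipped.
    """
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


# INFO column (V8) copied from the record in this issue.
info = (
    "SAMPLE=Horizon-CMP001;TYPE=DUP;DP=882;VD=47;AF=0.0533;BIAS=0:2;"
    "REFBIAS=0:0;VARBIAS=29:18;PMEAN=27.9;PSTD=1;QUAL=33.3;QSTD=1;SBF=1;"
    "ODDRATIO=0;MQ=60;SN=94;HIAF=1.0000;ADJAF=0.0533;SHIFT3=0;MSI=0;"
    "MSILEN=0;NM=0.1;HICNT=47;HICOV=47;LSEQ=TTTCAGCATTTTGACGGCAA;"
    "RSEQ=GGCTTTCATACCTAAATTGC;DUPRATE=0;SVTYPE=DUP;SVLEN=287;"
    "SPLITREAD=29;SPANPAIR=18"
)

fields = parse_info(info)
# Values stay strings; convert numeric fields explicitly where needed.
print(fields["SVTYPE"], int(fields["SVLEN"]), int(fields["SPLITREAD"]), int(fields["SPANPAIR"]))
# prints: DUP 287 29 18
```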

Could anyone please explain the meanings of DUPRATE, SPLITREAD and SPANPAIR?

PolinaBevad commented 3 years ago

Hello @Smurf-2020,

That is right: SVLEN is the length of the structural segment in base pairs; we simply calculate it as the end position minus the start position. As for the other fields:

- DUPRATE: the duplication rate as a fraction, calculated as the number of duplicated reads divided by the total number of reads. It is shown only with the -t/--dedup option.
- SPLITREAD: the number of soft-clipped/split reads supporting the SV.
- SPANPAIR: the number of discordant read pairs supporting the SV.
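The definitions in this reply can be sketched in a few lines of Python. This is an illustration, not VarDict's code: `sv_end`, `dup_rate` and `tandem_dup` are hypothetical helpers, the read counts passed to `dup_rate` are made up, and `tandem_dup` uses a toy 0-based, half-open coordinate convention that is an assumption and not necessarily how VarDict places POS relative to the duplicated segment.

```python
def sv_end(pos: int, svlen: int) -> int:
    """End position implied by the reply's definition SVLEN = end - start."""
    return pos + svlen


def dup_rate(duplicated_reads: int, total_reads: int) -> float:
    """DUPRATE as described in the reply: duplicated reads / total reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return duplicated_reads / total_reads


def tandem_dup(seq: str, start: int, end: int) -> str:
    """Toy tandem duplication on a 0-based, half-open [start, end) interval:
    the segment seq[start:end] is repeated immediately after itself.
    This coordinate convention is an assumption, not VarDict's."""
    return seq[:end] + seq[start:end] + seq[end:]


# POS and SVLEN from the record in this issue.
print(sv_end(28033908, 287))        # 28034195
# Hypothetical read counts, for illustration only.
print(dup_rate(120, 1000))          # 0.12
print(tandem_dup("AACGTTT", 2, 5))  # AACGTCGTTT
```

In the toy example the 3-bp segment `CGT` appears twice in a row, which is what "duplicated in tandem" means for the 287-bp segment in the question.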