SciLifeLab / TIDDIT

TIDDIT - structural variant calling

VCF output: mismatch between header and FORMAT fields #97

Closed VladimirRoudko closed 2 years ago

VladimirRoudko commented 2 years ago

Hello,

I am using TIDDIT as part of the nf-core sarek pipeline (version 3.0.1). I noticed a discrepancy between the commented header description and the actual FORMAT fields. Here is the commented header:

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

Here is the actual FORMAT line for each event: GT:CN:COV:DR:SR:LQ:RR:RD

As you can see, the DV and RD fields are absent from the description. Also, according to the description, DR should have 2 integers; however, each SV event has only one:

1/1:4:83,148,244:0:3:0.0,0.0:39,223:56,285
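To make the value counts easier to see, here is a minimal Python sketch that pairs each FORMAT key with its sample values. The helper name `parse_sample` is hypothetical and not part of TIDDIT; the two strings are copied from the lines above.

```python
def parse_sample(format_column, sample_column):
    """Pair each FORMAT key with its comma-separated sample values."""
    keys = format_column.split(":")
    values = sample_column.split(":")
    if len(keys) != len(values):
        raise ValueError("FORMAT and sample columns have different field counts")
    # Keep values as strings; only the number of values per key matters here.
    return {key: value.split(",") for key, value in zip(keys, values)}


fields = parse_sample(
    "GT:CN:COV:DR:SR:LQ:RR:RD",
    "1/1:4:83,148,244:0:3:0.0,0.0:39,223:56,285",
)

# Print each key with the number of values it carries.
for key, values in fields.items():
    print(key, len(values), values)
```

For this record, DR carries a single value while RD carries two.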

Thank you, Vladimir

VladimirRoudko commented 2 years ago

Corrected description of issues:

  1. DV is absent from FORMAT line but present in comments
  2. RD is present in FORMAT line but absent in comments
  3. DR should have 2 integers but has only one
  4. COV sometimes contains a non-integer (float) value that is smaller than the number of reads supporting the reference allele (RR). I assumed that RR (at A or B) always has to be smaller than COV (at A or B).
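Points 1 and 2 can be checked mechanically. The sketch below compares the IDs declared in `##FORMAT` header lines with the keys used in a record's FORMAT column. The function name `format_mismatch` is hypothetical, and the two header lines in the example are illustrative placeholders written in standard VCF syntax, not TIDDIT's actual header.

```python
import re

# Matches the ID of a VCF meta-information line such as ##FORMAT=<ID=GT,...>
FORMAT_ID = re.compile(r"^##FORMAT=<ID=([^,>]+)")


def format_mismatch(header_lines, format_column):
    """Return (declared but unused, used but undeclared) FORMAT keys."""
    declared = set()
    for line in header_lines:
        match = FORMAT_ID.match(line)
        if match:
            declared.add(match.group(1))
    used = set(format_column.split(":"))
    return sorted(declared - used), sorted(used - declared)


# Illustrative header lines only (assumption, not copied from TIDDIT output).
example_header = [
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    '##FORMAT=<ID=DV,Number=1,Type=Integer,Description="placeholder">',
]

unused, undeclared = format_mismatch(example_header, "GT:RD")
print("declared but unused:", unused)
print("used but undeclared:", undeclared)
```

Run against a real VCF, the header lines would be those starting with `##FORMAT=`, and the FORMAT column is the ninth tab-separated column of any record.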
VladimirRoudko commented 2 years ago

Just FYI: the TIDDIT version is 3.1.0, and the command line is: tiddit --sv --bam $input --ref $fasta -o $prefix

J35P312 commented 2 years ago

Hello! And thanks for posting this issue! The RD/DR should be fixed already (https://github.com/SciLifeLab/TIDDIT/issues/91, version 3.3.1); I will take a look at the rest next week.

Have a nice weekend! //Jesper

J35P312 commented 2 years ago

Hello! Sorry for the delay! I reran the latest TIDDIT to have a look. In the latest version, all of those issues are solved:

GT:CN:COV:DV:RV:LQ:RR:DR
0/1:1:28,10.879166655087223,35:5:0:0.0,0.047619047619047616:10,20:46,45

There is no longer any mismatch, and both RR and DR have two values.

COV and RR may differ: RR is the number of reads supporting the reference allele, while COV indicates the coverage within the region where the split reads and discordant pairs are located. However, most of the time those numbers should be similar.

Feel free to comment if you run into any other issues!

VladimirRoudko commented 2 years ago

Thank you, that's great! Do you know if it's possible to push the updated version of TIDDIT to the nf-core sarek pipeline? The current version still uses TIDDIT 3.1.0.

thank you, Vladimir