AstraZeneca-NGS / VarDict

VarDict
MIT License
186 stars 60 forks source link

Wrong depth reported in vcf file #39

Closed Khillo81 closed 5 years ago

Khillo81 commented 7 years ago

Hi!

I'm interested in using your variant caller in my pipeline and have already done a test run on targeted re sequencing data using the amplicon-based Illumina TruSight Myeloid Panel. I've compared the results with those of FreeBayes and noticed something strange: for the same variant call at a certain position, FreeBayes reports a much higher depth (DP) as compared to VarDict (e.g. 6000 compared to 40). I've checked the per-base coverage report and FreeBayes is accurate. Also, the QUAL for all bases is in a very narrow range with ~95% of the variants falling between 30 and 37. These same variants have a high quality score with FreeBayes. Are there some filtering steps that are carried out by VarDict that cause me to lose some reads?

My command line call for VarDict: vardict -G ref.fa -N sample_name -f 0.01 -b sample.bam -c 1 -S 2 -E 3 -g 4 -Q 20 -a -F 0 targets.bed | teststrandbias.R | var2vcf_valid.pl -N sample_name -E -f 0.01

I'm using BWA for mapping and then performing a base quality recalibration and local realignment using GATK before doing the variant calling. The sequencing was done on an Illumina NextSeq in paired-end mode 2x150 bp.

Thanks for your help.

mjafin commented 7 years ago

Hi @Khillo81, thanks for trying out VarDict. Can you try the same test without the -a argument? Further, what format is your bed file in? Can you paste a few lines here please

Khillo81 commented 7 years ago

@mjafin Thank you very much for your rapid response. Here are a few lines from the BED file:

1   36931667    36931937    CSF3R.exon.17.line.1.chr1.36931697.36932509_tile_1  1000    -   36931693    36931909
1   36931883    36932143    CSF3R.exon.17.line.1.chr1.36931697.36932509_tile_2  1000    +   36931910    36932117
1   36932089    36932361    CSF3R.exon.17.line.1.chr1.36931697.36932509_tile_3  1000    -   36932115    36932334
1   36932307    36932582    CSF3R.exon.17.line.1.chr1.36931697.36932509_tile_4  1000    +   36932334    36932555
1   36932800    36933050    CSF3R.exon.16.line.2.merged_with.CSF3R.exon.15.line.3.merged_with.CSF3R.exon.14.line.4.chr1.36932830.36933563_tile_1    1000    -   36932824    36933023
1   36932998    36933235    CSF3R.exon.16.line.2.merged_with.CSF3R.exon.15.line.3.merged_with.CSF3R.exon.14.line.4.chr1.36932830.36933563_tile_2    1000    +   36933025    36933208

I'll run the command without the -a and get back to you.

mjafin commented 7 years ago

The bed file looks OK to me. Can you try without the -a argument?

mjafin commented 7 years ago

Further, VarDict assigns a locus to a single amplicon. Thus if you have overlapping amplicons, it tries to choose the optimal one and therefore you will only see a fraction of the depth. It shouldn't be that drastically different though. But let us know what you see without specifying -a.

Is your problem present for most variants or just a handful? Freebayes wouldn't use information about amplicon locations so would happily count any primer sequences into the depth.

PolinaBevad commented 5 years ago

@Khillo81, hello!

I think I can close this problem because it is outdated. Please let us know if you need help with VarDict. Thank you!