Closed tbmorrison closed 5 years ago
Sorry about the issues here. I'm not totally sure what is going wrong and why you're seeing issues with VarDict relative to mutect2 and plain old pileup. A couple of questions/suggestions:
-f P
to bwa. Could you expand a bit on how you're mapping?conda install -c conda-forge -c bioconda vardict-java
. It would be useful to know if this helps resolve the issue compared to the perl version.Thanks for helping debug the problem.
I'm not sure what you mean by -f P to bwa. Could you expand a bit on how you're mapping? -f P ensures all output alignments are paired and mapped. I only added this when troubleshooting VarDict. I reprocessed without this tag and got same empty file.
It would be worth trying vardict-java and using bioconda I installed vardict-java using bioconda. This did not help, I still get empty file.
Share minimum input I've pasted URLs to the files below. I'm using hg38 as reference genome (if you want the reference genome files, let me know which ones in particular). https://s3.amazonaws.com/accugenomicsbucket/MIX4a.bed https://s3.amazonaws.com/accugenomicsbucket/sample4rep2_S8.sorted.corr.bam https://s3.amazonaws.com/accugenomicsbucket/sample4rep2_S8.sorted.corr.bam.bai
Thanks much for the examples and apologies for the delay in getting back with you on this. It ended up taking a bit of detective work to figure out what why VarDict didn't like calling in this region. The issue is that the reads in your variants have fairly low quality (phred20ish and lower) so end up getting screened by VarDict's read quality filter -q
, which defaults to 25. From looking at the variants, they are all clustered so my guess is that recalibration is downgrading these. If you set -q 15
to VarDict, it will output the calls you're looking at:
$ vardict-java -q 15 -b sample4rep2_S8.sorted.corr.bam -G $REF -N sample4rep2_S8 -f 0.01 -c 1 -S 2 -E 3 -g 4 MIX4a.bed
sample4rep2_S8 ccds_gn=EGFR;ccds_id=CCDS5514.1 chr7 55174723 55174724 GA TC 9 6 12 1 5 GA/TC 0.6667 2;2 34.3 1 18.6 1 60.0 12.000 0.6667 0.1111 0 1.000 1 3.8 6 9 CCTTCTCTCTCTGTCATAGG CTCTGGATCCCAGAAGGTGA chr7:55174412-55175162 Complex
sample4rep2_S8 ccds_gn=EGFR;ccds_id=CCDS5514.1 chr7 55174778 55174779 AA CT 20 7 112 0 7 AA/CT 0.3500 1;0 35.9 1 19.3 1 60.0 14.000 0.3500 0 0 2.000 1 3.0 7 20 CCCGTCGCTATCAAGGAATT GAGAAGCAACATCTCCGAAA chr7:55174412-55175162 Complex
sample4rep2_S8 ccds_gn=EGFR;ccds_id=CCDS5514.1 chr7 55174828 55174829 TC AG 15 3 012 0 3 TC/AG 0.2000 0;0 41.7 1 18.5 1 60.0 6.000 0.2000 0 0 1.000 1 3.0 3 15 GGAAATCCTCGATGTGAGTT TGCTTTGCTGTGTGGGGGTC chr7:55174412-55175162 Complex
sample4rep2_S8 ccds_gn=EGFR;ccds_id=CCDS5514.1 chr7 55181362 55181363 AC TG 4 3 01 0 3 AC/TG 0.7500 0;0 39.7 1 18.3 1 60.0 6.000 0.7500 0 0 2.000 1 3.3 3 4 TGGGCATCTGCCTCACCTCC CGTGCAGCTCATCACGCAGC chr7:55180923-55181762 Complex
sample4rep2_S8 ccds_gn=EGFR;ccds_id=CCDS5514.1 chr7 55181419 55181420 GA CT 4 2 02 0 2 GA/CT 0.5000 0;0 43.5 1 18.3 1 60.0 4.000 0.5000 0 0 2.000 1 3.0 2 4 GCCTCCTGGACTATGTCCGG ACACAAAGACAATATTGGCT chr7:55180923-55181762 Complex
sample4rep2_S8 ccds_gn=EGFR;ccds_id=CCDS5514.1 chr7 55191759 55191760 AC TG 6 3 12 1 2 AC/TG 0.5000 2;2 36.3 1 19.0 0 60.0 6.000 0.5000 0 0 2.000 1 2.3 3 6 CCGTCGCTTGGTGCACCGCG CTGGCAGCCAGGAACGTACT chr7:55191485-55192202 Complex
sample4rep2_S8 ccds_gn=EGFR;ccds_id=CCDS5514.1 chr7 55191817 55191818 TG AC 5 3 02 0 3 TG/AC 0.6000 0;0 23.7 1 16.2 1 60.0 6.000 0.6000 0 0 3.000 1 3.7 3 5 CATGTCAAGATCACAGATTT GGCTGGCCAAACTGCTGGGT chr7:55191485-55192202 Complex
sample4rep2_S8 ccds_gn=EGFR;ccds_id=CCDS5514.1 chr7 55191869 55191870 GG CT 3 2 01 0 2 GG/CT 0.6667 0;0 51.5 1 19.0 0 60.0 4.000 0.6667 0 0 2.000 1 3.5 2 3 AAGAATACCATGCAGAAGGA CAAAGTAAGGAGGTGGCTTT chr7:55191485-55192202 Complex
I also investigated this with other callers and although MuTect2 will call these reads, it filters them all due to the clustering and low base quality so on the science side, it might be worth investigating if these are real calls worth validating on. Here are the calls I got from MuTect2:
chr7 55174676 . C T . clustered_events;t_lod DP=1;ECNT=8;FS=0;MQ=60;MQ0=0;POP_AF=5e-08;P_CONTAM=0;P_GERMLINE=nan;TLOD=4.2 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP_AF:SA_POST_PROB 0/1:0,
1:1:1:0,0:0,1:20:0,169:60:20:0|1:55174676_C_T:0,0.99,1:0.028,0.025,0.947
chr7 55174677 . A G . base_quality;clustered_events;t_lod DP=2;ECNT=8;FS=0;MQ=60;MQ0=0;POP_AF=5e-08;P_CONTAM=0;P_GERMLINE=-1.204;TLOD=4.13 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP
_AF:SA_POST_PROB 0/1:0,1:0.75:1:0,0:0,1:19:0,169:60:21:0|1:55174676_C_T:0,0.99,1:0.028,0.025,0.947
chr7 55174723 . G T . base_quality;clustered_events ClippingRankSum=0;DP=5;ECNT=8;FS=0;MQ=60;MQ0=0;MQRankSum=0;POP_AF=5e-08;P_CONTAM=0;P_GERMLINE=-1.435;ReadPosRankSum=0.319;TLOD=14.39 GT:AD:AF:DP:F1
R2:F2R1:MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP_AF:SA_POST_PROB 0/1:1,4:0.799:5:1,1:0,3:19:203,166:60:26:0|1:55174723_G_T:0.778,0.788,0.8:0.026,0.027,0.947
chr7 55174724 . A C . base_quality;clustered_events ClippingRankSum=0;DP=7;ECNT=8;FS=0;MQ=60;MQ0=0;MQRankSum=0;POP_AF=5e-08;P_CONTAM=0;P_GERMLINE=-1.394;ReadPosRankSum=0;TLOD=14.33 GT:AD:AF:DP:F1
R2:F2R1:MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP_AF:SA_POST_PROB 0/1:1,4:0.714:5:1,1:0,3:19:203,166:60:15:0|1:55174723_G_T:0.778,0.788,0.8:0.026,0.027,0.947
chr7 55174778 . A C . base_quality;clustered_events ClippingRankSum=0;DP=13;ECNT=8;FS=6.929;MQ=60;MQ0=0;MQRankSum=0;POP_AF=5e-08;P_CONTAM=0;P_GERMLINE=-1.444;ReadPosRankSum=0.293;TLOD=14 GT:AD:AF:DP:F1
R2:F2R1:MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP_AF:SA_POST_PROB 0/1:9,4:0.308:13:5,1:4,3:19:156,166:60:38:0|1:55174778_A_C:0.283,0.263,0.308:0.015,0.04,0.945
chr7 55174779 . A T . clustered_events ClippingRankSum=0;DP=13;ECNT=8;FS=6.929;MQ=60;MQ0=0;MQRankSum=0;POP_AF=5e-08;P_CONTAM=0;P_GERMLINE=-1.444;ReadPosRankSum=0.293;TLOD=14 GT:AD:AF:DP:F1R2:F2R1:
MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP_AF:SA_POST_PROB 0/1:9,4:0.308:13:5,1:4,3:20:156,166:60:39:0|1:55174778_A_C:0.283,0.263,0.308:0.015,0.04,0.945
chr7 55174828 . T A . base_quality;clustered_events ClippingRankSum=0;DP=12;ECNT=8;FS=6.264;MQ=60;MQ0=0;MQRankSum=0;POP_AF=5e-08;P_CONTAM=0;P_GERMLINE=-1.691;ReadPosRankSum=-0.379;TLOD=7.59 GT:AD:
AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP_AF:SA_POST_PROB 0/1:9,3:0.25:12:5,1:4,2:18:156,163:60:21:0|1:55174828_T_A:0.232,0.192,0.25:0.014,0.04,0.946
chr7 55174829 . C G . base_quality;clustered_events ClippingRankSum=0;DP=13;ECNT=8;FS=6.264;MQ=60;MQ0=0;MQRankSum=0;POP_AF=5e-08;P_CONTAM=0;P_GERMLINE=-1.686;ReadPosRankSum=-0.379;TLOD=7.57 GT:AD:
AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP_AF:SA_POST_PROB 0/1:9,3:0.269:12:5,1:4,2:18:156,163:60:20:0|1:55174828_T_A:0.232,0.192,0.25:0.014,0.04,0.946
chr7 55181304 . G T . base_quality;clustered_events;t_lod ClippingRankSum=0;DP=2;ECNT=4;FS=0;MQ=60;MQ0=0;MQRankSum=0;POP_AF=5e-08;P_CONTAM=0;P_GERMLINE=-1.041;ReadPosRankSum=0;TLOD=3.72 GT:AD:AF:DP:F1
R2:F2R1:MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP_AF:SA_POST_PROB 0/1:1,1:0.5:2:0,0:1,1:19:118,143:60:20:0|1:55181304_G_T:0.495,0,0.5:0.025,0.027,0.948
chr7 55181305 . A C . base_quality;clustered_events;t_lod ClippingRankSum=0;DP=2;ECNT=4;FS=0;MQ=60;MQ0=0;MQRankSum=0;POP_AF=5e-08;P_CONTAM=0;P_GERMLINE=-1.041;ReadPosRankSum=0;TLOD=3.72 GT:AD:AF:DP:F1
R2:F2R1:MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP_AF:SA_POST_PROB 0/1:1,1:0.5:2:0,0:1,1:19:118,143:60:21:0|1:55181304_G_T:0.495,0,0.5:0.025,0.027,0.948
chr7 55181362 . A T . base_quality;clustered_events;t_lod DP=1;ECNT=4;FS=0;MQ=60;MQ0=0;POP_AF=5e-08;P_CONTAM=0;P_GERMLINE=nan;TLOD=4.2 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP_AF:SA_POST_PROB 0/1:0,1:1:1:0,0:0,1:18:0,143:60:49:0|1:55181362_A_T:0.99,0,1:0.025,0.028,0.947
chr7 55181363 . C G . base_quality;clustered_events;t_lod DP=1;ECNT=4;FS=0;MQ=60;MQ0=0;POP_AF=5e-08;P_CONTAM=0;P_GERMLINE=nan;TLOD=4.2 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP_AF:SA_POST_PROB 0/1:0,1:1:1:0,0:0,1:18:0,143:60:48:0|1:55181362_A_T:0.99,0,1:0.025,0.028,0.947
chr7 55191759 . A T . clustered_events ClippingRankSum=0;DP=5;ECNT=6;FS=3.979;MQ=60;MQ0=0;MQRankSum=0;POP_AF=5e-08;P_CONTAM=0;P_GERMLINE=-1.081;ReadPosRankSum=0.524;TLOD=7.22 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP_AF:SA_POST_PROB 0/1:3,2:0.4:5:2,1:1,1:20:185,179:60:20:0|1:55191759_A_T:0.374,0.364,0.4:0.02,0.032,0.948
chr7 55191760 . C G . base_quality;clustered_events ClippingRankSum=0;DP=5;ECNT=6;FS=3.979;MQ=60;MQ0=0;MQRankSum=0;POP_AF=5e-08;P_CONTAM=0;P_GERMLINE=-1.081;ReadPosRankSum=0.524;TLOD=7.22 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP_AF:SA_POST_PROB 0/1:3,2:0.4:5:2,1:1,1:19:185,179:60:21:0|1:55191759_A_T:0.374,0.364,0.4:0.02,0.032,0.948
chr7 55191817 . T A . base_quality;clustered_events ClippingRankSum=0;DP=3;ECNT=6;FS=0;MQ=60;MQ0=0;MQRankSum=0;POP_AF=5e-08;P_CONTAM=0;P_GERMLINE=-1.109;ReadPosRankSum=0;TLOD=5.36 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP_AF:SA_POST_PROB 0/1:1,2:0.666:3:1,0:0,2:18:196,187:60:21:0|1:55191817_T_A:0.646,0.646,0.667:0.028,0.024,0.947
chr7 55191818 . G C . base_quality;clustered_events ClippingRankSum=0;DP=3;ECNT=6;FS=0;MQ=60;MQ0=0;MQRankSum=0;POP_AF=5e-08;P_CONTAM=0;P_GERMLINE=-1.109;ReadPosRankSum=0;TLOD=5.36 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP_AF:SA_POST_PROB 0/1:1,2:0.666:3:1,0:0,2:16:196,187:60:22:0|1:55191817_T_A:0.646,0.646,0.667:0.028,0.024,0.947
chr7 55191869 . G C . base_quality;clustered_events ClippingRankSum=0;DP=3;ECNT=6;FS=0;MQ=60;MQ0=0;MQRankSum=0;POP_AF=5e-08;P_CONTAM=0;P_GERMLINE=-1.109;ReadPosRankSum=0;TLOD=7.92 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP_AF:SA_POST_PROB 0/1:1,2:0.667:3:1,0:0,2:19:196,187:60:21:0|1:55191869_G_C:0.646,0.646,0.667:0.028,0.024,0.947
chr7 55191870 . G T . clustered_events ClippingRankSum=0;DP=3;ECNT=6;FS=0;MQ=60;MQ0=0;MQRankSum=0;POP_AF=5e-08;P_CONTAM=0;P_GERMLINE=-1.109;ReadPosRankSum=0;TLOD=7.92 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP_AF:SA_POST_PROB 0/1:1,2:0.667:3:1,0:0,2:20:196,187:60:20:0|1:55191869_G_C:0.646,0.646,0.667:0.028,0.024,0.947
Thanks again for the examples and hope this helps with your use of VarDict.
Very helpful! The pipeline I'm implementing uses gatk4 BaseRecalibrator to adjust the Q scores and I did not realize how significantly the scores were adjusted.
@tbmorrison, hello! I will close this issue because it was solved by changing quality filter. Thanks for posting and thanking Brad Chapman for finding the cause of the problem!
I'm attempting to implement Vardict as part of an analysis plan for the FDA's SEQ QC targeted sequencing work group. I'm getting an empty output from PE hybrid capture sequences when using VarDict-1.5.1. I used bwa for alignment with the -f P flag to ensure outputs are paired and mapped. I used BaseRecalibrator to tune the Q scores and then sort and index with samtools. The resulting files give good VCF results from mpileup, and mutect2 VCF.
The command line below runs for a second and exists with no error or data in the output file. VarDict -b sample4rep2_S8.sorted.corr.bam -G /NGS/REFS/genome.fa -N sample4rep2_S8 -f 0.01 -v -c 1 -S 2 -E 3 -th 8 /NGS/MIX2.bed > sample4rep2_S8.vardict.txt
I can't get the Java VarDict version to install to see if an alternative version works (gradle error involving an issue with slf4j logger).
As per one suggestion in this forum, I reduced the bed file down to 3 small regions that have known VAF of >20%. Still empty output.
I've also tried removing -th 8 flag, adding -g 4 flag. I'm running Ubuntu AWS server using 16 processors x 64 Gb mem.