fritzsedlazeck / SURVIVOR

Toolset for SV simulation, comparison and filtering
MIT License

SURVIVOR filtering variants during merge as being supported by zero callers #112

Open oneillkza opened 4 years ago

oneillkza commented 4 years ago

Hi there,

I'm trying to use SURVIVOR to merge matched tumour and normal VCFs generated by Sniffles from PromethION data. However, the merge seems to be erroneously dropping variants. I've isolated one particular variant, which has solid support in the tumour (7 supporting reads; RE=7 and DV=7 in the record below), and which we know from previous work is a real variant in this cell line.

Minimal tumour.vcf (minus most of the header):

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  20190816_COLO829.fastq.mm2.sorted.bam
1   207981231   1293    N   <DEL>   .   PASS    PRECISE;SVMETHOD=Snifflesv1.0.11;CHR2=1;END=208014820;STD_quant_start=2.420153;STD_quant_stop=3.000000;Kurtosis_quant_start=-1.580012;Kurtosis_quant_stop=-0.370370;SVTYPE=DEL;RNAMES=3463dfad-992e-4068-891c-22215f043d06,5cbaae2b-62dc-45ab-a993-daaacf350847,a30fb0cc-bca6-4239-a926-9d3c954e4cc2,afdfd7be-0651-4099-880c-4385ceacd6af,cffe5e9c-5783-427c-a3ad-32728023be77,ecc28152-3598-4f18-8085-3bd146b919d7,f4c7b59b-2898-401b-8b5e-4a4924cb7bcd;SUPTYPE=SR;SVLEN=-33589;STRANDS=+-;RE=7;REF_strand=9,8;AF=0.291667 GT:DR:DV    0/0:17:7

Minimal normal.vcf:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  20190816_COLO829_BL.fastq.mm2.sorted.bam
12  95244024    13927_0 GATCTTATAACTAGAAAAACCTAAAGACTCCACCAAAAAACTCTTAGATCTGATAAATAAATTCAGTAAAGCTTCAGGTACAAAATCAACACACAAAAATCGGTAGCATTTCTATACACCAATAATGAACTTGCTGAGAAAGAAATCAAGAAGGCAATCCCATTTACAATAGCTATAAAAAATAGAATATCTAACAATAAATTTAACCAAGGAGGTTGTCTTAGTCCATTTGTGTAGCTACATCTGAGGCTGGGTAATCTATAAAGAAAAGAGGTTTATTTGGCTAATGGTTCTACAGGCTGTACAAGAAGCACAGCACCAATATCTGCTACTGGAGAGGGCTTCCCGGCTGCTTCTACTCATGGCAGAAGGAGAACGGGAGCTGTTGTATGCAGAGATCATATGGTGAGAGAGAGGAAGCAA N   .   PASS    PRECISE;SVMETHOD=Snifflesv1.0.11;CHR2=12;END=95244447;STD_quant_start=0.447214;STD_quant_stop=0.000000;Kurtosis_quant_start=5.871518;Kurtosis_quant_stop=6.574891;SVTYPE=DEL;RNAMES=1b631bb1-ac12-40c8-b950-34368382ed47,1c541b4f-cd09-4d46-b226-5701711f0bbc,20bca30e-af02-40cc-a723-3e0bff5bc395,21faee21-34b9-4d58-b23c-924199c39a61,24ec0c78-fd77-4933-881a-4f1bc0351678,274a6148-f834-4ee9-b763-8171535a07ee,2ea2a668-4c94-4e95-aa0e-065e2b0f9076,41038240-1471-4c93-8a0b-c9569d185e30,69c904ed-e0cc-4013-8c95-bffe93ae1d31,6c1c8cc4-83d2-4a60-98f6-4a690b77dfc2,7f6d4019-3415-4ded-8089-d792ebc79ed5,a4a2b758-3b46-4a23-8719-40650570aedd,df9b5bc1-d5fb-4d76-bcb6-22bb3053a7ca,f48b0ecc-6a5f-4bb8-8fee-36cbc2eab577;SUPTYPE=AL;SVLEN=-423;STRANDS=+-;RE=14;REF_strand=2,3;AF=0.736842 GT:DR:DV    0/1:5:14

Merged VCF (survivor_test.vcf):

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  20190816_COLO829_BL.fastq.mm2.sorted.bam    20190816_COLO829.fastq.mm2.sorted.bam
1   207981231   1293    N   <DEL>   .   PASS    SUPP=0;SUPP_VEC=00;SVLEN=-33589;SVTYPE=DEL;SVMETHOD=SURVIVOR1.0.6;CHR2=1;END=208014820;CIPOS=0,0;CIEND=0,0;STRANDS=+-   GT:PSV:LN:DR:ST:QV:TY:ID:RAL:AAL:CO ./.:NaN:0:0,0:--:NaN:NaN:NaN:NAN:NAN:NAN    0/0:NA:33589:17,7:+-:.:DEL:1293:NA:NA:1_207981231-1_208014820
12  95244024    13927_0 GATCTTATAACTAGAAAAACCTAAAGACTCCACCAAAAAACTCTTAGATCTGATAAATAAATTCAGTAAAGCTTCAGGTACAAAATCAACACACAAAAATCGGTAGCATTTCTATACACCAATAATGAACTTGCTGAGAAAGAAATCAAGAAGGCAATCCCATTTACAATAGCTATAAAAAATAGAATATCTAACAATAAATTTAACCAAGGAGGTTGTCTTAGTCCATTTGTGTAGCTACATCTGAGGCTGGGTAATCTATAAAGAAAAGAGGTTTATTTGGCTAATGGTTCTACAGGCTGTACAAGAAGCACAGCACCAATATCTGCTACTGGAGAGGGCTTCCCGGCTGCTTCTACTCATGGCAGAAGGAGAACGGGAGCTGTTGTATGCAGAGATCATATGGTGAGAGAGAGGAAGCAA N   .   PASS    SUPP=1;SUPP_VEC=10;SVLEN=-423;SVTYPE=DEL;SVMETHOD=SURVIVOR1.0.6;CHR2=12;END=95244447;CIPOS=0,0;CIEND=0,0;STRANDS=+- GT:PSV:LN:DR:ST:QV:TY:ID:RAL:AAL:CO 0/1:NA:423:5,14:+-:.:DEL:13927_0:GATCTTATAACTAGAAAAACCTAAAGACTCCACCAAAAAACTCTTAGATCTGATAAATAAATTCAGTAAAGCTTCAGGTACAAAATCAACACACAAAAATCGGTAGCATTTCTATACACCAATAATGAACTTGCTGAGAAAGAAATCAAGAAGGCAATCCCATTTACAATAGCTATAAAAAATAGAATATCTAACAATAAATTTAACCAAGGAGGTTGTCTTAGTCCATTTGTGTAGCTACATCTGAGGCTGGGTAATCTATAAAGAAAAGAGGTTTATTTGGCTAATGGTTCTACAGGCTGTACAAGAAGCACAGCACCAATATCTGCTACTGGAGAGGGCTTCCCGGCTGCTTCTACTCATGGCAGAAGGAGAACGGGAGCTGTTGTATGCAGAGATCATATGGTGAGAGAGAGGAAGCAA:N:12_95244024-12_95244447  ./.:NaN:0:0,0:--:NaN:NaN:NaN:NAN:NAN:NAN

SURVIVOR command:

SURVIVOR merge survivor_mini_test.txt 1000 0 0 0 0 50 survivor_test.vcf

Note how the tumour variant has SUPP_VEC=00. I had to set the num_callers parameter to 0 to get it to be included.
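
For readers less familiar with the positional interface, the command above can be unpacked as in the sketch below. The labels are paraphrased from the usage text that `SURVIVOR merge` prints and are descriptive assumptions, not official option names; `label_merge_args` is a hypothetical helper written only for this illustration.

```python
import shlex

# Labels paraphrased from the `SURVIVOR merge` usage text. They are
# descriptive assumptions for this sketch, not official option names.
MERGE_ARG_LABELS = [
    "vcf_list_file",
    "max_breakpoint_distance",
    "min_supporting_callers",
    "use_sv_type",
    "use_strand",
    "estimate_distance",
    "min_sv_size",
    "output_vcf",
]


def label_merge_args(command: str) -> dict:
    """Map the positional arguments of a `SURVIVOR merge` command to labels."""
    tokens = shlex.split(command)
    if tokens[:2] != ["SURVIVOR", "merge"]:
        raise ValueError("expected a 'SURVIVOR merge ...' command")
    args = tokens[2:]
    if len(args) != len(MERGE_ARG_LABELS):
        raise ValueError(
            f"expected {len(MERGE_ARG_LABELS)} arguments, got {len(args)}"
        )
    return dict(zip(MERGE_ARG_LABELS, args))


labelled = label_merge_args(
    "SURVIVOR merge survivor_mini_test.txt 1000 0 0 0 0 50 survivor_test.vcf"
)
for name, value in labelled.items():
    print(f"{name:26s} {value}")
```

Read this way, the third argument (the minimum number of supporting callers, called num_callers here) is the one set to 0.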

oneillkza commented 4 years ago

And, now that I think about it, I see the sentence "NOTE ./. or 0/0 is not counted as supporting a variant." in the docs. Since this is a subclonal heterozygous tumour variant, the allele frequency is less than 0.5, and Sniffles called the genotype as 0/0.
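
The counting rule quoted from the docs can be sketched as follows. This is only an illustration of that rule applied to the sample columns of the merged records above (abbreviated to their first few subfields), not SURVIVOR's actual implementation; `supports_variant` and `support_vector` are hypothetical names.

```python
def supports_variant(sample_field: str) -> bool:
    """Return True if the GT subfield carries at least one alternate allele.

    Follows the documented rule that ./. and 0/0 are not counted as support.
    """
    genotype = sample_field.split(":", 1)[0]
    alleles = genotype.replace("|", "/").split("/")
    return any(allele not in ("0", ".") for allele in alleles)


def support_vector(sample_fields):
    """Return (SUPP, SUPP_VEC) for the sample columns of one record."""
    vec = "".join("1" if supports_variant(f) else "0" for f in sample_fields)
    return vec.count("1"), vec


# Sample columns abbreviated from the merged VCF (normal first, tumour second).
chr1_del = ["./.:NaN:0:0,0", "0/0:NA:33589:17,7"]
chr12_del = ["0/1:NA:423:5,14", "./.:NaN:0:0,0"]

print(support_vector(chr1_del))   # (0, '00')
print(support_vector(chr12_del))  # (1, '10')
```

Under this rule the tumour's 0/0 call contributes nothing, which reproduces the SUPP=0;SUPP_VEC=00 seen for the chromosome 1 deletion.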

It might be helpful to have something in the documentation noting that num_callers can be set to zero to include variants like these.

fritzsedlazeck commented 4 years ago

Thanks for reaching out. This is indeed a point where I am also not sure what the smartest approach would be. For de novo calls like yours, a 0/0 could be taken into account. For force calling (genotyping of known SVs), a 0/0 should not be taken into account.
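
One way to picture the de novo case is a post-processing check on the merged VCF that looks at read counts instead of the genotype. The sketch below is not a SURVIVOR feature: `supports_de_novo` and the `min_alt_reads` threshold are hypothetical, and it assumes the merged DR subfield holds "reference reads,variant reads", as the records in this thread suggest (DR=17, DV=7 in the Sniffles record become 17,7).

```python
FORMAT_KEYS = "GT:PSV:LN:DR:ST:QV:TY:ID:RAL:AAL:CO"


def read_support(format_keys: str, sample_field: str):
    """Return (ref_reads, alt_reads) parsed from the merged DR subfield."""
    values = dict(zip(format_keys.split(":"), sample_field.split(":")))
    ref_reads, alt_reads = (int(x) for x in values["DR"].split(","))
    return ref_reads, alt_reads


def supports_de_novo(format_keys: str, sample_field: str, min_alt_reads: int = 3) -> bool:
    """Count a sample as supporting if a call exists with enough variant reads.

    min_alt_reads is an arbitrary illustrative threshold, not a recommendation.
    """
    genotype = sample_field.split(":", 1)[0].replace("|", "/")
    if genotype == "./.":
        return False  # the caller reported nothing for this sample
    _, alt_reads = read_support(format_keys, sample_field)
    return alt_reads >= min_alt_reads


# Sample columns of the chromosome 1 deletion in the merged VCF.
normal = "./.:NaN:0:0,0:--:NaN:NaN:NaN:NAN:NAN:NAN"
tumour = "0/0:NA:33589:17,7:+-:.:DEL:1293:NA:NA:1_207981231-1_208014820"

print(supports_de_novo(FORMAT_KEYS, normal))  # False
print(supports_de_novo(FORMAT_KEYS, tumour))  # True
```

A check like this only makes sense for de novo calls; for force-called input, where every known SV gets a genotype, a 0/0 should stay unsupported.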

I will try to highlight this better. Thanks, Fritz

oneillkza commented 4 years ago

Thanks for the response! (And for providing the tool in the first place).