fritzsedlazeck / SURVIVOR

Toolset for SV simulation, comparison and filtering
MIT License

SURVIVOR vcftobed conversion start position indexing #99

Open info-97 opened 4 years ago

info-97 commented 4 years ago

Hello fritzsedlazeck,

Thank you for this wonderful tool! I noticed that SURVIVOR doesn't convert the VCF start position to BEDPE correctly: the start position remains 1-based rather than 0-based, as a BEDPE file requires. Maybe this is a minor issue to fix, but I thought it worth pointing out.

Thanks!
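
For reference, here is a minimal sketch of the conversion being asked for. It assumes the usual BED/BEDPE convention of 0-based, half-open intervals and the VCF convention of a 1-based POS. The function name `vcf_pos_to_bed_interval` is hypothetical and is not part of SURVIVOR.

```python
def vcf_pos_to_bed_interval(pos: int) -> tuple[int, int]:
    """Map a 1-based VCF POS to a 0-based, half-open (start, end) pair.

    The single base at 1-based position `pos` is the interval
    [pos - 1, pos) in BED-style coordinates.
    """
    return pos - 1, pos


if __name__ == "__main__":
    # A variant at VCF POS 100 covers BED interval [99, 100).
    print(vcf_pos_to_bed_interval(100))
```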

ManavalanG commented 4 years ago

In my testing, SURVIVOR adds one to START for some reason, instead of subtracting one. Does this have to do with strand (I don't see why it would, though)? Also, it doesn't seem to be able to handle confidence intervals (CIPOS, CIEND).
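
The size of the discrepancy can be checked with a few lines of Python. This is only an illustrative sketch: the two numbers are copied from the first record in the listing that follows (VCF POS 839442, reported BEDPE start 839443), the expected value assumes a 0-based BEDPE start, and the helper name `start_offsets` is hypothetical.

```python
def start_offsets(vcf_pos: int, reported_start: int) -> tuple[int, int]:
    """Return (offset from VCF POS, offset from the expected 0-based start)."""
    expected_start = vcf_pos - 1  # 0-based start, assuming BED-style coordinates
    return reported_start - vcf_pos, reported_start - expected_start


if __name__ == "__main__":
    # Values taken from the first MantaDEL record in this comment.
    from_pos, from_expected = start_offsets(839442, 839443)
    print(from_pos)       # reported start is POS + 1
    print(from_expected)  # so it is 2 away from the expected POS - 1
```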

##fileformat=VCFv4.1
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=CIPOS,Number=2,Type=Integer,Description="Confidence interval around POS">
##INFO=<ID=CIEND,Number=2,Type=Integer,Description="Confidence interval around END">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  sample
chr1    839442  MantaDEL:123157:0:0:0:0:0   CACCTAGACACACACTCCTGGACACACACACCTAGACACACACACCTGGACAAACACA  CT  139 PASS    END=839499;SVTYPE=DEL;SVLEN=-57;CIGAR=1M1I57D   GT:FT:GQ:PL:PR:SR   0/1:PASS:86:189,0,83:1,0:8,5
chr1    1477835 MantaDEL:123205:0:1:0:1:0   AGCTGGGATTACAGGCACGCGCCACCACGCCTGGCTAATGTTGTATTTTAGTAGAGACGGGGTTTCTCCATGTTGGTCAGGCTGGTCTCTAACTCCCGACCTCAGGTGATCCACCCGCCTCGGCCTCTCAAACT  A   27  PASS    END=1477968;SVTYPE=DEL;SVLEN=-133;CIGAR=1M133D;CIPOS=0,34;HOMLEN=34;HOMSEQ=GCTGGGATTACAGGCACGCGCCACCACGCCTGGC   GT:FT:GQ:PL:PR:SR   0/1:PASS:27:77,0,324:18,0:32,3
chr1    1565630 MantaDUP:TANDEM:123223:0:1:0:0:0    G   <DUP:TANDEM>    94  PASS    END=1565727;SVTYPE=DUP;SVLEN=97;CIPOS=0,1;CIEND=0,1;HOMLEN=1;HOMSEQ=T   GT:FT:GQ:PL:PR:SR   0/1:PASS:94:144,0,772:27,0:53,9
chr1    1595101 MantaDEL:123214:0:0:0:1:0   TGACAGAGAGAGGCAGAGAGAGAGAGAGAGAGACAGACACAGAGAGAGCAGAACAGGGAGAAACAGAGAGACAGAGAGCGAGA T   238 MaxDepth    END=1595183;SVTYPE=DEL;SVLEN=-82;CIGAR=1M82D;CIPOS=0,2;HOMLEN=2;HOMSEQ=GA   GT:FT:GQ:PL:PR:SR   0/1:PASS:238:288,0,341:10,0:41,10

Output from SURVIVOR vcftobed:
chr1    839443  839443  chr1    839500  839500  MantaDEL:123157:0:0:0:0:0   ,   +   -   DEL
chr1    1477836 1477836 chr1    1477969 1477969 MantaDEL:123205:0:1:0:1:0   ,   +   -   DEL
chr1    1565631 1565631 chr1    1565728 1565728 MantaDUP:TANDEM:123223:0:1:0:0:0    ,   -   +   DUP
chr1    1595102 1595102 chr1    1595184 1595184 MantaDEL:123214:0:0:0:1:0   ,   +   -   DEL

fritzsedlazeck commented 4 years ago

Dear @info-97, sorry for the delay. Dear @ManavalanG, thanks for bringing this up again. I have now incorporated the -1 for the positions and included CIPOS and CIEND, which weren't previously used. The start and end breakpoints from the VCF file are now reported in the later columns; start1 and start2 include CIPOS, and end1 and end2 likewise include CIEND.

Please reclone the code and let me know if that works. Thanks, Fritz
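
To make the intended behaviour concrete, here is one possible way to derive BEDPE breakpoint intervals from POS, END, CIPOS and CIEND. This is a sketch, not SURVIVOR's actual code: it assumes 0-based, half-open BEDPE intervals, treats a missing CIPOS or CIEND as (0, 0), and all function names are hypothetical.

```python
def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO string such as 'END=10;SVTYPE=DEL' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def confidence_interval(fields: dict[str, str], key: str) -> tuple[int, int]:
    """Return the (low, high) offsets for CIPOS/CIEND, or (0, 0) if absent."""
    raw = fields.get(key)
    if raw is None:
        return 0, 0
    low, high = raw.split(",")
    return int(low), int(high)


def bedpe_intervals(pos: int, info: str) -> tuple[int, int, int, int]:
    """Return (start1, end1, start2, end2) as 0-based, half-open intervals."""
    fields = parse_info(info)
    end = int(fields["END"])
    pos_low, pos_high = confidence_interval(fields, "CIPOS")
    end_low, end_high = confidence_interval(fields, "CIEND")
    # Subtract 1 only from the starts: 1-based inclusive -> 0-based half-open.
    return (pos + pos_low - 1, pos + pos_high,
            end + end_low - 1, end + end_high)


if __name__ == "__main__":
    # The MantaDUP record from this thread, which carries CIPOS and CIEND.
    info = "END=1565727;SVTYPE=DUP;SVLEN=97;CIPOS=0,1;CIEND=0,1;HOMLEN=1;HOMSEQ=T"
    print(bedpe_intervals(1565630, info))
```

Under these assumptions the MantaDUP record gives start1 = 1565629 and end1 = 1565631, so the first interval spans both positions allowed by CIPOS=0,1.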

Qijie0615 commented 2 years ago

It didn't work!

SURVIVOR vcftobed samples_merged_DUP.Final.vcf -99999999 99999999 samples_merged_ALL.Final.bed

1650005087(1) 1650005039