fritzsedlazeck / SURVIVOR

Toolset for SV simulation, comparison and filtering
MIT License

SURVIVOR bedtovcf conversion start position indexing #122

Closed Caro-Ca closed 4 years ago

Caro-Ca commented 4 years ago

Hi fritzsedlazeck, this issue is similar to #99. I recloned the code approximately 6 hours ago, and this is what I got. This is the original BED file:

track name="SRR4074383.1_paired_R1_TEMP" description="SRR4074383.1_paired_R1_TEMP"
chrI    22231   22552   TY1_reference_SRR4074383.1_paired_R1_temp_nonab_1       0       +
chrI    138753  138990  TY1_reference_SRR4074383.1_paired_R1_temp_nonab_2       0       -
chrI    183135  183468  TY1_reference_SRR4074383.1_paired_R1_temp_nonab_6       0       +
chrI    189419  189754  TY2_reference_SRR4074383.1_paired_R1_temp_nonab_7       0       +

and the converted vcf file:

##fileformat=VCFv4.1
##source=SURVIVOR
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=DUP,Description="Duplication">
##ALT=<ID=INV,Description="Inversion">
##ALT=<ID=BND,Description="Translocation">
##ALT=<ID=INS,Description="Insertion">
##INFO=<ID=CHR2,Number=1,Type=String,Description="Chromosome for END coordinate in case of a translocation">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the structural variant">
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variation">
##INFO=<ID=PRECISE,Number=0,Type=Flag,Description="Precise structural variation">
##INFO=<ID=SVLEN,Number=1,Type=Float,Description="Length of the SV">
##INFO=<ID=SVMETHOD,Number=1,Type=String,Description="Vector of samples supporting the SV.">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of the SV.">
##INFO=<ID=STRANDS,Number=1,Type=String,Description="Indicating the direction of the reads with respect to the type and breakpoint.">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Sample
track name="SRR4074383.1_paired_R1_TEMP" description="SRR4074383.1_paired_R1_TEMP"      3       DEL00BED        N       <DEL>   .       PASS    IMPRECISE;SVTYPE=DEL;SVMETHOD=BEDFILE;CHR2=track name="SRR4074383.1_paired_R1_TEMP" description="SRR4074383.1_paired_R1_TEMP";END=1111804544;CIPOS=0,0;CIEND=0,0;SVLEN=1111804541       GT      ./.
chrI    22230   DEL00BED        N       <DEL>   .       PASS    IMPRECISE;SVTYPE=DEL;SVMETHOD=BEDFILE;CHR2=chrI;END=22551;CIPOS=0,0;CIEND=0,0;SVLEN=321 GT      ./.
chrI    138752  DEL00BED        N       <DEL>   .       PASS    IMPRECISE;SVTYPE=DEL;SVMETHOD=BEDFILE;CHR2=chrI;END=138989;CIPOS=0,0;CIEND=0,0;SVLEN=237        GT      ./.
chrI    183134  DEL00BED        N       <DEL>   .       PASS    IMPRECISE;SVTYPE=DEL;SVMETHOD=BEDFILE;CHR2=chrI;END=183467;CIPOS=0,0;CIEND=0,0;SVLEN=333        GT      ./.

The command used: /home/silviav/SURVIVOR/Debug/SURVIVOR bedtovcf SRR4074383.1_paired_R1_temp_nonredundant.bed DEL SRR4074383.1_paired_R1_temp_nonredundant.vcf

Thank you for your time.

fritzsedlazeck commented 4 years ago

Sorry, can you be a bit more explicit what is wrong with this? Thanks Fritz

Caro-Ca commented 4 years ago

Yes, definitely. The start positions of the bed and vcf files don't match. So basically the indexing may not be assigned correctly.

fritzsedlazeck commented 4 years ago

Ah, but I think VCF file positions start at 0 and BED files at 1. I always get confused by this. Let me know. Thanks, Fritz

ManavalanG commented 4 years ago

@fritzsedlazeck It's the reverse. VCF is one-indexed, whereas BED's start is zero-indexed and its end is one-indexed. You might find the examples presented for bedops' vcf2bed helpful.
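To make the convention described above concrete, here is a minimal Python sketch. It is not SURVIVOR's code, and the function name `bed_to_one_based` is hypothetical. It takes the first record of the BED file from this issue and shows that shifting only the start by one gives a 1-based, inclusive interval covering the same number of bases.

```python
def bed_to_one_based(start, end):
    """Convert a BED interval (0-based start, exclusive end) to a
    1-based interval where both ends are inclusive.

    Hypothetical helper for illustration only.
    """
    return start + 1, end


# First record from the BED file in this issue.
bed_start, bed_end = 22231, 22552
first, last = bed_to_one_based(bed_start, bed_end)

# Both representations describe the same number of bases.
bed_length = bed_end - bed_start      # half-open interval: end - start
closed_length = last - first + 1      # inclusive interval: last - first + 1
print(first, last, bed_length, closed_length)  # 22232 22552 321 321
```

The end coordinate does not change because a 0-based exclusive end and a 1-based inclusive end are the same number.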

fritzsedlazeck commented 4 years ago

Sorry I always confuse these things. Drives me nuts....Let me fix that real quick

fritzsedlazeck commented 4 years ago

Ah wait, these examples only look that way because they need to form a 1 bp interval.

fritzsedlazeck commented 4 years ago

Ok I just changed that to what this is suggesting and pushed the code. Can someone try and confirm? Thanks Fritz

Caro-Ca commented 4 years ago

Yes, it worked. Thank you so much!

QianghuiZhu commented 1 year ago

Hi! SURVIVOR is a really good tool for me to deal with SV datasets. But recently I have also been confused by the SURVIVOR results and by 0-based versus 1-based coordinates while using the convertAssemblytics module.

I found that someone has already summarized this (image).

It points out that SAM and VCF are 1-based, while BED is 0-based. I tested this with a SAM file and a BED file (image).

So, in my opinion, it may be that:

  1. bed to vcf: start + 1, while end=end;
  2. vcf to bed: start - 1, while end=end.

If possible, could you update this in the vcftobed, bedtovcf, and convertAssemblytics modules in a future version? With great thanks.
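The two rules proposed in this comment can be sketched in Python as a round trip. This is only an illustration under stated assumptions: the function names are hypothetical, the code is not taken from SURVIVOR, and it applies just the plain start/end arithmetic from the list above without handling header lines or any other VCF fields.

```python
def bed_line_to_vcf_interval(line):
    """Rule 1 (bed to vcf): start + 1, end unchanged.

    Hypothetical helper: reads chrom, start, end from a
    whitespace-separated BED record and ignores the other columns.
    """
    fields = line.split()
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    return chrom, start + 1, end


def vcf_interval_to_bed(chrom, pos, end):
    """Rule 2 (vcf to bed): start - 1, end unchanged."""
    return chrom, pos - 1, end


# Second record from the BED file in this issue.
record = "chrI\t138753\t138990\tTY1_reference_SRR4074383.1_paired_R1_temp_nonab_2\t0\t-"

chrom, pos, end = bed_line_to_vcf_interval(record)
print(chrom, pos, end)  # chrI 138754 138990

# Applying rule 2 restores the original BED coordinates.
print(vcf_interval_to_bed(chrom, pos, end))  # ('chrI', 138753, 138990)
```

Under these rules the two conversions are exact inverses, so converting a file back and forth should not shift any coordinates.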