cancerit / BRASS

Breakpoints via assembly - Identifies breaks and attempts to assemble rearrangements in whole genome sequencing data.
GNU Affero General Public License v3.0

bedpe to vcf conversion is not correct #78

Closed: zhujack closed this issue 5 years ago

zhujack commented 5 years ago

Dear developer,

I found that a bedpe output file does not match the converted vcf file in the BRASS output: the converted vcf file contains exactly twice as many rows as the bedpe output.

The bedpe:
# chr1  start1  end1    chr2    start2  end2    id/name brass_score strand1 strand2
chr1    152080636   152080637   chr1    152081281   152081282   26  4   +   +

The vcf:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NORMAL  TUMOUR
chr1    152080637   26_1    G   G[chr1:152081282[   4   .   SVTYPE=BND;MATEID=26_2;CNCH=1;OCC=1;TSRDS=NS500348:107:H35HFBGXY:2:22111:8376:13970,NS500348:107:H35HFBGXY:2:23208:2875:14633,NS500348:107:H35HFBGXY:3:11403:13788:13841,NS500348:107:H35HFBGXY:3:11608:19967:4017;BKDIST=644;SVCLASS=deletion  RC:PS   0:00    0:04
chr1    152081282   26_2    G   ]chr1:152080637]G   4   .   SVTYPE=BND;MATEID=26_1;CNCH=1;OCC=1;TSRDS=NS500348:107:H35HFBGXY:2:22111:8376:13970,NS500348:107:H35HFBGXY:2:23208:2875:14633,NS500348:107:H35HFBGXY:3:11403:13788:13841,NS500348:107:H35HFBGXY:3:11608:19967:4017;BKDIST=644;SVCLASS=deletion  RC:PS   0:00    0:04

Any suggestions? Thanks.

J.

AndyMenzies commented 5 years ago

This is in accordance with the VCF spec. There is one VCF record per breakpoint, with the ID field (matched by MATEID in the INFO column) being used to connect related breakpoints into a complete rearrangement. In this instance, breakpoints 26_1 and 26_2 are the left- and right-hand sides of rearrangement 26, so the single bedpe row corresponds to two VCF records.
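To check that the two files agree, the breakend records can be paired back up through ID and MATEID, which should give one pair per bedpe row. The following is only a minimal sketch, not part of BRASS: the function name `pair_breakends` is hypothetical, the input is assumed to be tab-separated VCF lines, and the INFO column in the example is abbreviated from the records quoted above.

```python
def pair_breakends(vcf_lines):
    """Group BND records into rearrangements using ID and INFO/MATEID.

    Hypothetical helper for illustration; assumes tab-separated VCF lines.
    """
    records = {}
    for line in vcf_lines:
        # Skip blank lines and header/meta lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, rec_id = fields[0], int(fields[1]), fields[2]
        # INFO is a ';'-separated list of KEY=VALUE items (flags have no '=').
        info = dict(
            item.split("=", 1) for item in fields[7].split(";") if "=" in item
        )
        records[rec_id] = (chrom, pos, info.get("MATEID"))

    pairs = []
    seen = set()
    for rec_id, (chrom, pos, mate_id) in records.items():
        # Emit each rearrangement once, and only when the mate is present.
        if rec_id in seen or mate_id not in records:
            continue
        mate_chrom, mate_pos, _ = records[mate_id]
        seen.update((rec_id, mate_id))
        pairs.append((rec_id, mate_id, chrom, pos, mate_chrom, mate_pos))
    return pairs


# The two records from this issue, with INFO shortened for readability.
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t152080637\t26_1\tG\tG[chr1:152081282[\t4\t.\tSVTYPE=BND;MATEID=26_2;SVCLASS=deletion",
    "chr1\t152081282\t26_2\tG\t]chr1:152080637]G\t4\t.\tSVTYPE=BND;MATEID=26_1;SVCLASS=deletion",
]
pairs = pair_breakends(example)
print(pairs)
```

Here the two VCF records collapse into a single pair whose positions, 152080637 and 152081282, match the end1 and end2 columns of the bedpe row for rearrangement 26.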

Andy