fritzsedlazeck / SURVIVOR

Toolset for SV simulation, comparison and filtering
MIT License

vcftobed use AVGLEN values in INFO column #37

Closed alpreyes closed 6 years ago

alpreyes commented 6 years ago

Hello,

I used the SURVIVOR vcftobed function like so (with filenames removed):

./SURVIVOR vcftobed <input.vcf> 0 -1 <output.bed>

The output.bed file reported variants as single bases, which did not account for the AVGLEN values in the INFO column of the input.vcf file. Is there a way to get vcftobed to use that length information when reporting start and end values in the output bed file?

Any help is greatly appreciated. Thank you

Alberto

fritzsedlazeck commented 6 years ago

Hi Alberto, thanks for reaching out. Just some quick questions to make sure I understand.

The output will be a bedpe file comparable to what you get from lumpy. It looks similar to this:

chr11   5200306 5200306 chr11   5208306 5208306 DUP0090SUR      ,       -       +       DUP
chr11   5200309 5200309 chr11   5200348 5200348 INV0088SUR      ,       -       -       INV

Here the first 3 columns are the first breakpoint and columns 4-6 are the stop breakpoint. Is that what you observe? Note that the SV spans from the start (2nd column) to the stop (5th column).

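If you only want a pure bed file, you can run cut -f 1,2,5 my.bedpe > my.bed. This keeps the chromosome, the start (2nd column) and the stop (5th column), so it only makes sense for SVs whose two breakpoints are on the same chromosome.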
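For anyone who wants a bit more control than cut gives, here is a minimal Python sketch of the same conversion. It assumes the column layout shown in the example above (chromosome and start in columns 1-2, second chromosome and stop in columns 4-5, SV name in column 7). The function name bedpe_to_bed is made up for illustration and is not part of SURVIVOR. Records whose breakpoints sit on different chromosomes are skipped, since they cannot be written as a single interval.

```python
def bedpe_to_bed(lines):
    """Yield (chrom, start, end, name) tuples from bedpe-style lines.

    Hypothetical helper, not part of SURVIVOR. Assumes the layout from the
    example in this thread: chrom1, start1, end1, chrom2, start2, end2, name, ...
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()  # split on any whitespace (tabs or spaces)
        if len(fields) < 7:
            continue  # not enough columns to be a record in this layout
        chrom1, pos1 = fields[0], int(fields[1])
        chrom2, pos2 = fields[3], int(fields[4])
        name = fields[6]
        if chrom1 != chrom2:
            continue  # breakpoints on different chromosomes: no single interval
        start, end = sorted((pos1, pos2))  # make sure start <= end
        yield chrom1, start, end, name


example = [
    "chr11\t5200306\t5200306\tchr11\t5208306\t5208306\tDUP0090SUR\t,\t-\t+\tDUP",
    "chr11\t5200309\t5200309\tchr11\t5200348\t5200348\tINV0088SUR\t,\t-\t-\tINV",
]

for record in bedpe_to_bed(example):
    print("\t".join(str(value) for value in record))
```

To run it on a real file, pass an open file handle instead of the example list, e.g. bedpe_to_bed(open("my.bedpe")).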

Let me know whether that resolves it. Thanks, Fritz

alpreyes commented 6 years ago

Yes, this resolved my problem. I guess the problem was that I specified ".bed" as the file format as opposed to ".bedpe". With that adjustment, it looks like the entire interval of the SV is reported in the output file. Thank you very much.

fritzsedlazeck commented 6 years ago

Perfect. Sorry for the confusion.