fritzsedlazeck / SURVIVOR

Toolset for SV simulation, comparison and filtering
MIT License

merge vcf: some entries failed to merge #30

Closed arkyl closed 6 years ago

arkyl commented 6 years ago

Hi Fritz, We sequenced an NA12878 run and used Parliament2 to generate SVs. We tried to compare the Parliament2 DEL results (>500bp) with Personalis' DELs using "SURVIVOR merge" with the parameters: 1000 2 1 1 0 500. We ran this on the Parliament2 combined vcf and also on the MANTA calls extracted from the combined vcf (simply by grep MANTA). Here is what confused us: we got more merged calls from the MANTA subset than from the combined vcf. An example:

Personalis:
chr1 59582965 DEL00BED N . LowQual IMPRECISE;SVTYPE=DEL;SVMETHOD=BEDFILE;CHR2=chr1;END=59583989;SVLEN=1024;PE=1 GT:GL:GQ:FT:RC:DR:DV:RR:RV

Combined vcf:
chr1 59583072 DEL0086690SUR N . PASS SUPP=2;AVGLEN=971.5;SVTYPE=DEL;SVMETHOD=SURVIVORv2;CHR2=1;END=59583990;STRANDS=+-;CALLERS=BREAKDANCER,LUMPY GT:SP 1/1:BREAKDANCER,LUMPY
chr1 59583302 DEL0086689SUR N . PASS SUPP=2;AVGLEN=687;SVTYPE=DEL;SVMETHOD=SURVIVORv2;CHR2=1;END=59583989;STRANDS=+-;CALLERS=MANTA GT:SP 1/1:MANTA

MANTA subset:
chr1 59583302 DEL0086689SUR N . PASS SUPP=2;AVGLEN=687;SVTYPE=DEL;SVMETHOD=SURVIVORv2;CHR2=1;END=59583989;STRANDS=+-;CALLERS=MANTA GT:SP 1/1:MANTA

In the above example, the DEL from the combined vcf failed to merge with the Personalis DEL, but the same DEL from the MANTA subset merged successfully.
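As a sanity check on the example above, the breakpoint distances can be worked out directly from the quoted POS and END values. The sketch below assumes that the first merge parameter (1000) is the maximum allowed distance between breakpoints and that the start and end are compared independently. SURVIVOR's actual merging logic may differ, and the helper names are hypothetical.

```python
# Coordinates (POS, END) copied from the records quoted in this comment.
personalis = (59582965, 59583989)
combined_first = (59583072, 59583990)  # BREAKDANCER,LUMPY call
manta = (59583302, 59583989)           # MANTA call

MAX_DIST = 1000  # assumed meaning of the first merge parameter


def breakpoint_distances(a, b):
    """Return (start distance, end distance) between two (start, end) calls."""
    return abs(a[0] - b[0]), abs(a[1] - b[1])


def within_distance(a, b, max_dist=MAX_DIST):
    """True if both breakpoints are at most max_dist apart."""
    return all(d <= max_dist for d in breakpoint_distances(a, b))


for name, call in [("combined (first)", combined_first), ("MANTA", manta)]:
    print(name, breakpoint_distances(personalis, call),
          within_distance(personalis, call))
```

Both calls are within 1000 bp of the Personalis deletion at both breakpoints (107 and 1 bp for the first combined call, 337 and 0 bp for the MANTA call), so under these assumptions distance alone would not explain the missed merge.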

Suspecting that the problem might be due to the overlapping calls in the combined vcf, we merged the combined vcf against itself. In the above example, the two DELs from the combined vcf did indeed merge into one. However, the DEL from the self-merged combined vcf still fails to merge with the Personalis DEL:

chr1 59583072 DEL0087SUR N . PASS SUPP=2;SUPP_VEC=11;AVGLEN=971;SVTYPE=DEL;SVMETHOD=SURVIVORv2;CHR2=1;END=59583990;CIPOS=0,230;CIEND=-1,0;STRANDS=+- GT:PSV:LN:DR:ST:TY:CO 1/1:2,:971:0,0:+-:DEL,DEL:chr1_59583072-1_59583990,chr1_59583302-1_59583989 1/1:2,:971:0,0:+-:DEL,DEL:chr1_59583072-1_59583990,chr1_59583302-1_59583989

Your thoughts on this are greatly appreciated. Thanks much.

fritzsedlazeck commented 6 years ago

Hey, thanks for reaching out. Please check whether the vcf files are sorted. I realized recently that there is sometimes a problem if that is not the case and multiple SV entries are close to each other or on top of each other.
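One way to run the sortedness check suggested above is a few lines of Python. This is a minimal sketch, assuming tab-separated records and that "sorted" means positions never decrease within a chromosome and each chromosome appears as one contiguous block. The function name is hypothetical, and the example lines reuse two positions from this thread in a deliberately reversed order for illustration.

```python
def is_sorted_vcf(lines):
    """Return True if positions never decrease within a chromosome and no
    chromosome reappears after a different one has started."""
    seen = set()
    current_chrom = None
    last_pos = -1
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        if chrom != current_chrom:
            if chrom in seen:
                return False  # chromosome block is split up
            seen.add(chrom)
            current_chrom = chrom
        elif pos < last_pos:
            return False  # position goes backwards
        last_pos = pos
    return True


# Illustrative input only: positions from the thread, order reversed on purpose.
example = [
    "##fileformat=VCFv4.1\n",
    "chr1\t59583302\tDEL0086689SUR\n",
    "chr1\t59583072\tDEL0086690SUR\n",
]
print(is_sorted_vcf(example))
```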

The other possibility is that you require SVs to be merged only if they agree on the strand they are reported on. The strand is not reported in the first entry, but it should be set automatically to +-, as the MANTA entries show.
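The defaulting behaviour described here can be illustrated on the INFO strings quoted earlier in the thread. The sketch below is not SURVIVOR's parser; the function names are hypothetical, and the "+-" default is taken from the description above.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag entries map to True."""
    out = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def strands_of(info, default="+-"):
    """Return the STRANDS value, or the assumed default when it is absent."""
    return parse_info(info).get("STRANDS", default)


# INFO columns copied from the Personalis and MANTA records in this thread.
personalis_info = ("IMPRECISE;SVTYPE=DEL;SVMETHOD=BEDFILE;CHR2=chr1;"
                   "END=59583989;SVLEN=1024;PE=1")
manta_info = ("SUPP=2;AVGLEN=687;SVTYPE=DEL;SVMETHOD=SURVIVORv2;CHR2=1;"
              "END=59583989;STRANDS=+-;CALLERS=MANTA")

print(strands_of(personalis_info), strands_of(manta_info))
```

With the default applied, both records report the same strand, so a strand mismatch would only arise if the default were not applied to the entry that lacks STRANDS.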

If this does not resolve the issue, could you please provide a subset of these variants so that I can have a look at it?

Thanks Fritz

arkyl commented 6 years ago

Hi Fritz, Thanks for replying so quickly. I found a workaround for the problem: I convert the vcf to bed using a simple perl script and then convert the bed back to vcf using SURVIVOR bedtovcf. This solves the problem, which indicates that the combined vcf generated by DNAnexus Parliament2 may have some features that cause it.
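The perl script itself is not included in the thread. A rough Python equivalent of the vcf-to-bed step might look like the sketch below. It assumes tab-separated records with the INFO field in the eighth column and writes only chromosome, start and end; the column layout that SURVIVOR bedtovcf expects is not stated here, so treat the output format as an assumption. The function name is hypothetical.

```python
def vcf_to_bed(lines):
    """Yield (chrom, start, end) for each VCF record that carries an END tag.

    Coordinates are copied unchanged from POS and END; a strict BED file
    would use a 0-based start, which this sketch does not adjust for.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, info = fields[0], int(fields[1]), fields[7]
        end = None
        for item in info.split(";"):
            if item.startswith("END="):
                end = int(item[len("END="):])
        if end is not None:
            yield chrom, pos, end


# MANTA record from this thread. The ALT column is not visible in the
# quoted text, so "<DEL>" is an assumed placeholder.
record = ("chr1\t59583302\tDEL0086689SUR\tN\t<DEL>\t.\tPASS\t"
          "SUPP=2;AVGLEN=687;SVTYPE=DEL;SVMETHOD=SURVIVORv2;CHR2=1;"
          "END=59583989;STRANDS=+-;CALLERS=MANTA\tGT:SP\t1/1:MANTA\n")

for chrom, start, end in vcf_to_bed([record]):
    print(f"{chrom}\t{start}\t{end}")
```

A round trip like this drops every INFO field except the coordinates, which fits the observation that something in the original INFO or header content interferes with merging.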

I attach four files. The two inputs are new_test_NA12878-grep-MANTA.example.vcf.txt and new_test_Personalis.example.vcf.txt.

We tried to merge the above two files with "1000 2 1 1 0 500". The resulting merged file, new_test.merge.txt, had only one merged SV; the two other SVs failed to merge.

After converting the vcf to bed and the bed back to vcf, the resulting merged file, new_test.convert-bed-vcf.merge.txt, had all three merged SVs.

Hope this will help. Thanks much!

Best, arkyl

new_test_NA12878-grep-MANTA.example.vcf.txt new_test_Personalis.example.vcf.txt new_test.convert-bed-vcf.merge.txt new_test.merge.txt

fritzsedlazeck commented 6 years ago

Thanks. Sorry about this. I will try to resolve this. As you know, VCF is a standard format, but the files vary a lot between callers and are thus hard to parse for each one... Thanks Fritz

arkyl commented 6 years ago

It's very nice software. Straightforward and easy to understand. Hope there will be more.

fritzsedlazeck commented 6 years ago

thx!