fritzsedlazeck / Sniffles

Structural variation caller using third generation sequencing

variant phasing with --cluster option #120

Closed JYLeeBioinfo closed 5 years ago

JYLeeBioinfo commented 5 years ago

Hello, @fritzsedlazeck

I'm having trouble using the --cluster option of Sniffles.

I can't find the variant phasing information in the VCF or BEDPE output files.

The command I used is as follows:

sniffles -t 20 -m input.bam -v output.vcf --max_distance 100000 --genotype --cluster --report_seq --report_read_strands

I compared the output VCF files produced with and without the --cluster option, and they were exactly the same.

As far as I know, slashes (/) are used for unphased genotypes and pipes (|) for phased genotypes, as in 0/1 and 0|1.

All the genotypes were in the unphased format.

[hd00ljy@master genotype-cluster.vcf]$ grep -v "^#" output.vcf | cut -f 9-10 | head
GT:DR:DV        0/1:10:12
GT:DR:DV        0/1:11:13
GT:DR:DV        0/1:6:13
GT:DR:DV        0/1:12:10
GT:DR:DV        0/1:8:10
GT:DR:DV        0/1:8:10
GT:DR:DV        0/1:9:10
GT:DR:DV        0/1:8:10
GT:DR:DV        0/1:9:10
GT:DR:DV        0/1:9:10
[hd00ljy@master 100000bp-dist.vcf]$ grep "|" output.vcf
[hd00ljy@master 100000bp-dist.vcf]$

Am I missing something? If so, it would be really helpful if you could explain where to find the variant phase information and how to interpret it.

Thank you
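
For anyone who wants to repeat the check above without grep, here is a minimal Python sketch that counts phased and unphased genotypes in the first sample column of a VCF. The function name count_phasing is hypothetical, and the sketch assumes a single-sample VCF with tab-separated columns, as in the output shown above.

```python
def count_phasing(vcf_lines):
    """Count phased ('|') and unphased ('/') GT values in the first sample column."""
    counts = {"phased": 0, "unphased": 0, "other": 0}
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns on this record
        keys = fields[8].split(":")
        values = fields[9].split(":")
        if "GT" not in keys:
            continue
        gt = values[keys.index("GT")]
        if "|" in gt:
            counts["phased"] += 1
        elif "/" in gt:
            counts["unphased"] += 1
        else:
            counts["other"] += 1  # e.g. haploid or missing genotype
    return counts


if __name__ == "__main__":
    import sys
    # Usage (assumed): python count_phasing.py output.vcf
    with open(sys.argv[1]) as handle:
        print(count_phasing(handle))
```

A "phased" count of zero corresponds to the empty grep "|" result above.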

fritzsedlazeck commented 5 years ago

Thanks for bringing this up. Back when I implemented that mode I was not really aware of the standard for indicating phasing in the GT column. Thus, I indicated it by the variant ID. So two events are on the same haplotype if they share the same variant ID. They are still unique IDs since I attach a _1 or _2 and so on.

I will reclassify this as an enhancement. Thanks, Fritz
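
One way to read the ID scheme described above is that the part before the last underscore names the cluster and the suffix (_1, _2, ...) distinguishes events within it. The Python sketch below groups VCF records on that assumption. The function name group_by_cluster is hypothetical, and treating IDs without an underscore as unclustered is also an assumption, not something confirmed in this thread.

```python
from collections import defaultdict


def group_by_cluster(vcf_lines):
    """Group records by the ID prefix before the last underscore.

    Returns {cluster_prefix: [(chrom, pos, variant_id), ...]}.
    """
    clusters = defaultdict(list)
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, var_id = fields[0], int(fields[1]), fields[2]
        prefix, sep, _suffix = var_id.rpartition("_")
        if not sep:
            continue  # assumption: no "_<n>" suffix means the SV is unclustered
        clusters[prefix].append((chrom, pos, var_id))
    return dict(clusters)
```

Each list in the result then holds the events that, under this reading, were linked by shared reads.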

JYLeeBioinfo commented 5 years ago

Thank you for your reply and sorry for my late response.

I tried again with new data with longer read lengths and got SV IDs with _0, _1, _2 and so on.

However, I found that duplicate IDs exist for ~30 phased events (but not for unphased SVs).

How should I interpret those events?

The Sniffles command I used is as follows:

sniffles --report_BND -s 8 -n -1 -t 80 -m $BAM_filt -v test.vcf --max_distance 1000 --minmapping_qual 30  --genotype --cluster  --report_seq --report_read_strands 

Here are some examples I found. In the first example, I could not find the 14647_0 event.

chr21   10692365        14647_1 CCATTCCATTCCATTCCATTCCATTCCATTCCATTC    N       .       PASS    PRECISE;SVMETHOD=Snifflesv1.0.11;CHR2=chr21;END=10692401;STD_quant_start=3.193744;STD_quant_stop=3.962323;Kurtosis_quant_start=-0.415687;Kurtosis_quant_stop=0.085988;SVTYPE=DEL;RNAMES=1bae3244-8d9b-4956-bc62-e457ed383d7b,1cd2fe57-9e04-4489-b748-6af2c1c74c55,2626cb93-345e-4b8d-8b93-794616c97dbd,42a8d08c-98eb-41c2-9c16-a9a913bfd196,50408290-4af8-4060-9b8f-a52abb1d7678,56cb85df-b2d2-42e1-b93b-db5583cbe919,611db715-60cf-4c72-afab-0aa844386f0e,801b5734-28ca-47e9-9040-61933de402fb,8df87b11-5e8b-4fec-bc33-58f0a3ff919f,91d54902-48ac-4aa1-8bfa-a3723713034f,a1caa5ff-31a0-42e7-9e39-2e0d45aa98b2,a59019dd-6114-46f6-a7d8-bad9338f2332,c49a7b74-07d8-4bd0-9b5e-b94a33a31dd0,d429fa79-e8be-405f-9968-48463d1e8b3d,dbd5e813-dc7b-48ba-9ce0-c4cfcf2d91e8;SUPTYPE=AL;SVLEN=-36;STRANDS=+-;STRANDS2=15,0,15,0;RE=15;REF_strand=0,0;AF=1        GT:DR:DV        1/1:0:15
chr21   10692670        14647_1 CCGTTCCATTCCATTCCATTCCAGTCCATTCCACTGGAGTCCATTCCATTCCATTCCATTCCATTCCATTCCATTGCATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTGCATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTGCATTCCAT       N       .       PASS    IMPRECISE;SVMETHOD=Snifflesv1.0.11;CHR2=chr21;END=10692881;STD_quant_start=5.686241;STD_quant_stop=27.010800;Kurtosis_quant_start=1.941410;Kurtosis_quant_stop=-0.302489;SVTYPE=DEL;RNAMES=1bae3244-8d9b-4956-bc62-e457ed383d7b,1cd2fe57-9e04-4489-b748-6af2c1c74c55,2626cb93-345e-4b8d-8b93-794616c97dbd,365e99f5-7409-4801-bad4-b4a6373ed3e7,42a8d08c-98eb-41c2-9c16-a9a913bfd196,452195e3-e47e-454c-a349-a4bc524a7342,4d09d086-686d-438b-85a7-223c2ff01d4e,50408290-4af8-4060-9b8f-a52abb1d7678,56cb85df-b2d2-42e1-b93b-db5583cbe919,611db715-60cf-4c72-afab-0aa844386f0e,801b5734-28ca-47e9-9040-61933de402fb,86326ef1-abc6-4643-9739-078a61279f27,91d54902-48ac-4aa1-8bfa-a3723713034f,a1caa5ff-31a0-42e7-9e39-2e0d45aa98b2,a5691e42-e7b0-4308-8c8a-425a113b2f35,a59019dd-6114-46f6-a7d8-bad9338f2332,b6bd2e74-6b09-4199-aca3-1141197c4ec1,b9736eb6-602b-4c48-ac2f-5609bf85a885,c49a7b74-07d8-4bd0-9b5e-b94a33a31dd0,d429fa79-e8be-405f-9968-48463d1e8b3d,e1598512-5f87-4d8a-99cc-0d9ddb217a87,e276f1ba-433c-4674-b547-0fe0a9ba5360,ec9de94f-ba40-480e-9dc1-373f3a58ef6a,feda2f3f-41eb-41f5-9f5a-1e2fa4197dbf;SUPTYPE=AL;SVLEN=-211;STRANDS=+-;STRANDS2=23,1,23,1;RE=24;REF_strand=0,0;AF=1 GT:DR:DV        1/1:0:24
chr21   10693543        14647_2 TGCATTCCATTCCATTCCATTCCATTCCATTC        N       .       PASS    IMPRECISE;SVMETHOD=Snifflesv1.0.11;CHR2=chr21;END=10693575;STD_quant_start=14.866069;STD_quant_stop=12.489996;Kurtosis_quant_start=0.091876;Kurtosis_quant_stop=0.790872;SVTYPE=DEL;RNAMES=452195e3-e47e-454c-a349-a4bc524a7342,56cb85df-b2d2-42e1-b93b-db5583cbe919,86326ef1-abc6-4643-9739-078a61279f27,91d54902-48ac-4aa1-8bfa-a3723713034f,a1caa5ff-31a0-42e7-9e39-2e0d45aa98b2,a5691e42-e7b0-4308-8c8a-425a113b2f35,b9736eb6-602b-4c48-ac2f-5609bf85a885,d429fa79-e8be-405f-9968-48463d1e8b3d,ec9de94f-ba40-480e-9dc1-373f3a58ef6a;SUPTYPE=AL;SVLEN=-32;STRANDS=+-;STRANDS2=9,0,9,0;RE=9;REF_strand=0,0;AF=1 GT:DR:DV        1/1:0:9
chr21   10698432        14647_3 TTCCATTCGGTTCCATTCCCTTTCATTCCATTTGAGTGCATTCCATTCCATTCCA N       .       PASS    PRECISE;SVMETHOD=Snifflesv1.0.11;CHR2=chr21;END=10698487;STD_quant_start=7.648529;STD_quant_stop=7.648529;Kurtosis_quant_start=2.384129;Kurtosis_quant_stop=2.502626;SVTYPE=DEL;RNAMES=1bae3244-8d9b-4956-bc62-e457ed383d7b,2626cb93-345e-4b8d-8b93-794616c97dbd,7301b428-1e58-4ab0-8ccd-99e4c8308538,7edb1f48-a6de-49a1-9922-4ab0f0a9aba6,86326ef1-abc6-4643-9739-078a61279f27,91d54902-48ac-4aa1-8bfa-a3723713034f,b38432c8-d49d-43fc-b473-b41d214c51fb,b4fe543d-6428-4813-a3d4-f64c9ef7924b,b9736eb6-602b-4c48-ac2f-5609bf85a885,d429fa79-e8be-405f-9968-48463d1e8b3d,efb464a0-41e7-466e-8b0b-7c16f4a6022c,f782eb3d-452c-47ad-9422-1a99307653a0;SUPTYPE=AL;SVLEN=-55;STRANDS=+-;STRANDS2=11,1,11,1;RE=12;REF_strand=6,0;AF=0.666667 GT:DR:DV        0/1:6:12
chr17_GL000205v2_random 10457   15771_0 N       GGAATCCTGAGGACAAACTTCAGAACCTCTTGGTGTTCTGGAAGTATGTGAGGACACACACTCAGACCACCCTGCATGGTGATCTGGGAATCCTATGTGAGGACAAACACTCAGAACTGGCAAGTGTTT       .       PASS    IMPRECISE;SVMETHOD=Snifflesv1.0.11;CHR2=chr17_GL000205v2_random;END=10535;STD_quant_start=25.865034;STD_quant_stop=33.962053;Kurtosis_quant_start=-1.739023;Kurtosis_quant_stop=-1.970055;SVTYPE=INS;RNAMES=05063a45-faa9-4bf4-a645-f507a6e36c3a,0a1ac840-c184-4a69-8b38-b37d7a029860,12ce8e85-285f-4403-8fe1-00cf7c0d7556,1407abdf-75f0-43e0-9f08-5a75a361b5da,162b1e0a-be13-415d-b81d-5cbfa772a2ff,2dcc4af7-cd24-48cb-b558-ada1ea8d6ac0,2ff8db9e-196c-439e-96c4-bf2473b2a7eb,38a77f09-6542-4873-af1a-c21257d4e7f1,3b0c4ff0-02e6-4b1c-a4aa-034ee8652b6e,408a09e3-b964-411e-b610-fe9f0d36ee30,47148fbe-9cf0-4790-9267-717e3d2737ca,483315fc-b766-4882-9ee0-66c20bead3cd,547602ae-a9e0-448d-974e-11bf2cef8388,550d3f89-6ee1-4a99-bf58-dea799ce6bd0,5c7f3e80-aed4-4cce-9270-b12781cb38a5,658d4e03-3722-43d2-b521-11819398d728,6a7a2b71-1686-48fa-a948-b8dfe99120b9,71f1af9d-6c80-4b93-8218-5ca29e191206,7344e588-aab9-42b0-bf9e-541a26e0e87f,7cfef975-ea51-455c-af17-26cc31457f1f,814eb71b-e57a-4d8f-9ec4-7310dc04ec85,95151ad8-3044-4b00-9105-ad7212872f13,a47174ea-2040-44d2-8ce6-d026d567ed5a,a7b579be-2d5b-4da0-bdcd-3000373e407c,a7fa006f-8ec9-4265-8a56-bcbfc089f7ef,ad31ca45-4167-4868-af75-3d4121a6d961,b2b85933-489c-4d22-bbe9-a860ed13ea87,b7c391fc-c7f6-42fe-bd5f-d85a80e4cb0d,bf3e6968-fbcb-40a7-9599-70a3bca67680,c081090a-32ee-4a62-b6f0-27a788e3cc9c,ccf1c4de-5345-4c05-b705-1af342fffd82,cf5ced83-3913-4de0-be32-760ced460282,cf9a6ad0-62df-4cb7-84cc-55443981aa6a,cfa3c1f6-1fd4-4c0d-ab4f-157065a8bf24,de265d79-2b6f-46a9-9717-ba32a43c43b2,e45480ff-9537-411f-8994-6bb486ab98bc,e4b1f83c-8c0b-423e-9429-9e38c3f68031,f21bc955-308e-4923-8a9b-7305d810e6ff,f23b109f-684a-4ba8-b67d-59cd4938e42c,fd485805-2b49-46e3-9da8-9c1d58169969;SUPTYPE=AL;SVLEN=138;STRANDS=+-;STRANDS2=20,20,20,20;RE=38;REF_strand=17,17;AF=0.527778     GT:DR:DV        
0/1:34:38
chr17_GL000205v2_random 11421   15771_1 N       GTGTGTGAGGACAAAGACCAGACCCTGGTAGAAGTGGTACCTAAATCCT       .       PASS    IMPRECISE;SVMETHOD=Snifflesv1.0.11;CHR2=chr17_GL000205v2_random;END=11465;STD_quant_start=13.419274;STD_quant_stop=13.733732;Kurtosis_quant_start=-0.640978;Kurtosis_quant_stop=-0.347896;SVTYPE=INS;RNAMES=12ce8e85-285f-4403-8fe1-00cf7c0d7556,162b1e0a-be13-415d-b81d-5cbfa772a2ff,1922d7e7-1c23-467e-bcf6-186e0a8c0cbc,1a253b33-405f-4348-8da3-12862cb5cc9c,2da442b7-275e-4c1e-b3c3-ab199539e8bb,2dcc4af7-cd24-48cb-b558-ada1ea8d6ac0,2ff8db9e-196c-439e-96c4-bf2473b2a7eb,38a77f09-6542-4873-af1a-c21257d4e7f1,3b0c4ff0-02e6-4b1c-a4aa-034ee8652b6e,408a09e3-b964-411e-b610-fe9f0d36ee30,483315fc-b766-4882-9ee0-66c20bead3cd,4c6663eb-9ec5-4c6d-b83d-d38fa8ebf1e6,550d3f89-6ee1-4a99-bf58-dea799ce6bd0,5b40466e-c2f4-4eab-bb2f-756b3ac5dec2,5c7f3e80-aed4-4cce-9270-b12781cb38a5,6a7a2b71-1686-48fa-a948-b8dfe99120b9,7344e588-aab9-42b0-bf9e-541a26e0e87f,87f64880-49a1-49cc-84c8-917d5230eba1,95151ad8-3044-4b00-9105-ad7212872f13,a7fa006f-8ec9-4265-8a56-bcbfc089f7ef,ad31ca45-4167-4868-af75-3d4121a6d961,b2b85933-489c-4d22-bbe9-a860ed13ea87,b7c391fc-c7f6-42fe-bd5f-d85a80e4cb0d,e45480ff-9537-411f-8994-6bb486ab98bc,e4b1f83c-8c0b-423e-9429-9e38c3f68031,fd485805-2b49-46e3-9da8-9c1d58169969;SUPTYPE=AL;SVLEN=43;STRANDS=+-;STRANDS2=12,14,12,14;RE=26;REF_strand=3,2;AF=0.83871        GT:DR:DV        1/1:5:26
chr17_GL000205v2_random 12429   15771_2 TATGTGAGGGAGAAACATTCAGACAATCGTATCAGTGTTCCAGAATC N       .       PASS    PRECISE;SVMETHOD=Snifflesv1.0.11;CHR2=chr17_GL000205v2_random;END=12476;STD_quant_start=3.146427;STD_quant_stop=3.114482;Kurtosis_quant_start=-0.912141;Kurtosis_quant_stop=-1.426809;SVTYPE=DEL;RNAMES=12ce8e85-285f-4403-8fe1-00cf7c0d7556,1922d7e7-1c23-467e-bcf6-186e0a8c0cbc,1a253b33-405f-4348-8da3-12862cb5cc9c,1c4ac883-fe26-4a43-bc7a-eccbc518ce1d,2da442b7-275e-4c1e-b3c3-ab199539e8bb,38a77f09-6542-4873-af1a-c21257d4e7f1,3b0c4ff0-02e6-4b1c-a4aa-034ee8652b6e,483315fc-b766-4882-9ee0-66c20bead3cd,4c6663eb-9ec5-4c6d-b83d-d38fa8ebf1e6,53a562b8-65b7-4af2-9e98-64537134a126,5c7f3e80-aed4-4cce-9270-b12781cb38a5,87f64880-49a1-49cc-84c8-917d5230eba1,95151ad8-3044-4b00-9105-ad7212872f13,a7fa006f-8ec9-4265-8a56-bcbfc089f7ef,b7c391fc-c7f6-42fe-bd5f-d85a80e4cb0d,d5f9abca-505d-44b8-bb58-08078498e0c8,e45480ff-9537-411f-8994-6bb486ab98bc,e4b1f83c-8c0b-423e-9429-9e38c3f68031,fd485805-2b49-46e3-9da8-9c1d58169969;SUPTYPE=AL;SVLEN=-47;STRANDS=+-;STRANDS2=10,9,10,9;RE=19;REF_strand=1,1;AF=0.904762  GT:DR:DV        1/1:2:19
chr17_GL000205v2_random 12955   15771_3 TTCTGTGTGAGGGACAAACTTTCAGATGCTCGTAGCAGTGTTCTGGAACTCTGTGTGAGGGACAAACTTTCAGACCCTCGTAGCAGTGTTCTGGAA        N       .       PASS    IMPRECISE;SVMETHOD=Snifflesv1.0.11;CHR2=chr17_GL000205v2_random;END=13021;STD_quant_start=13.330416;STD_quant_stop=16.941074;Kurtosis_quant_start=5.622000;Kurtosis_quant_stop=2.409351;SVTYPE=DEL;RNAMES=05063a45-faa9-4bf4-a645-f507a6e36c3a,0a1ac840-c184-4a69-8b38-b37d7a029860,1407abdf-75f0-43e0-9f08-5a75a361b5da,162b1e0a-be13-415d-b81d-5cbfa772a2ff,2dcc4af7-cd24-48cb-b558-ada1ea8d6ac0,2ff8db9e-196c-439e-96c4-bf2473b2a7eb,483315fc-b766-4882-9ee0-66c20bead3cd,550d3f89-6ee1-4a99-bf58-dea799ce6bd0,6a7a2b71-1686-48fa-a948-b8dfe99120b9,7344e588-aab9-42b0-bf9e-541a26e0e87f,7cfef975-ea51-455c-af17-26cc31457f1f,ad31ca45-4167-4868-af75-3d4121a6d961,b2b85933-489c-4d22-bbe9-a860ed13ea87,cf5ced83-3913-4de0-be32-760ced460282;SUPTYPE=AL;SVLEN=-66;STRANDS=+-;STRANDS2=7,7,7,7;RE=14;REF_strand=0,0;AF=1  GT:DR:DV        1/1:0:14
chr17_GL000205v2_random 15571   15771_3 N       <DEL>   .       PASS    IMPRECISE;SVMETHOD=Snifflesv1.0.11;CHR2=chr17_GL000205v2_random;END=36711;STD_quant_start=22.939720;STD_quant_stop=9.655608;Kurtosis_quant_start=-1.408083;Kurtosis_quant_stop=1.303410;SVTYPE=DEL;RNAMES=0a1ac840-c184-4a69-8b38-b37d7a029860,0b417bae-016c-4537-b764-209e8221f136,1019827c-3baf-4bb5-8c44-1869d8cef089,1f5c8f65-4482-4012-854a-0de0021e4a64,285c484f-7c10-4d36-95bc-ae4c3c4c2f97,2ff0109e-076b-4ec8-9164-fa4ba08f285f,3077497c-24a6-4a8b-95e6-cdbe320b3f05,42f5dba5-fb22-4d06-895d-eb714191ad45,4ae63d27-a1ca-4561-888d-0a7afe24c32c,5b40466e-c2f4-4eab-bb2f-756b3ac5dec2,5d0cf2fd-c579-4f80-9b22-38f2e85c8b35,640322d2-4c98-4a87-b00b-34f1e8f97197,7344e588-aab9-42b0-bf9e-541a26e0e87f,7c8356a9-5d69-4cc9-9643-8db39e7f0e67,7cfef975-ea51-455c-af17-26cc31457f1f,8f20eeb3-df8e-4698-8d86-d769432a686a,923d544d-e617-4d80-a01d-fe05501447af,9528c511-e164-40b0-b3fd-ade853a21023,b2b85933-489c-4d22-bbe9-a860ed13ea87,bd1da397-38b0-45db-b334-eb44e2b13f9b,d1e44c88-cfbb-41ff-ae67-9804d149a634,d2b44c7c-5265-439e-b471-c0e062a0cd7a,e52dbde6-19ba-4017-9cfe-316a65a4676a,ec5fcf48-ed61-4e29-b055-a45ad189d712,efa2c436-eb07-46c8-9c89-75b0787aaf3b,ff554813-bef9-4524-bea7-14b7c46cf312;SUPTYPE=SR;SVLEN=-21140;STRANDS=+-;STRANDS2=14,12,14,12;RE=26;REF_strand=6,16;AF=0.541667    GT:DR:DV        0/1:22:26
chr17_GL000205v2_random 15736   15771_3 N       ATGCCAAACATTTGTAACCCAGTAGCAGCGCTCTGGAATCCCAAGTGAGGA     .       PASS    IMPRECISE;SVMETHOD=Snifflesv1.0.11;CHR2=chr17_GL000205v2_random;END=15778;STD_quant_start=11.691878;STD_quant_stop=9.823441;Kurtosis_quant_start=-1.110318;Kurtosis_quant_stop=-0.924401;SVTYPE=INS;RNAMES=12ce8e85-285f-4403-8fe1-00cf7c0d7556,1922d7e7-1c23-467e-bcf6-186e0a8c0cbc,1a253b33-405f-4348-8da3-12862cb5cc9c,1c4ac883-fe26-4a43-bc7a-eccbc518ce1d,2da442b7-275e-4c1e-b3c3-ab199539e8bb,483315fc-b766-4882-9ee0-66c20bead3cd,4c6663eb-9ec5-4c6d-b83d-d38fa8ebf1e6,53a562b8-65b7-4af2-9e98-64537134a126,5c7f3e80-aed4-4cce-9270-b12781cb38a5,87f64880-49a1-49cc-84c8-917d5230eba1,a7fa006f-8ec9-4265-8a56-bcbfc089f7ef,b7c391fc-c7f6-42fe-bd5f-d85a80e4cb0d,ce039ff5-418a-47e8-a8d5-efbaa0c3057b,d5f9abca-505d-44b8-bb58-08078498e0c8,e45480ff-9537-411f-8994-6bb486ab98bc;SUPTYPE=AL;SVLEN=44;STRANDS=+-;STRANDS2=9,6,9,6;RE=15;REF_strand=0,0;AF=1     GT:DR:DV        1/1:0:15
chr17_GL000205v2_random 19669   15771_4 TCAACCATTCAGACAACAGCAGTAGTGTTCTGCAAGCCTATAAGAGGGAAAAACATTCAGACAACAGCAGGAGTGATCTGGAATCCCATCTGAGGAACAAACATTCAGACCACAGCTGTGGTGTTCTGGAATAGTATGTGAGGGCCAAACACTGAGAACCCAACAGCAGTGTTCAGGAATACTAAGTGAGGGACAAACATTCTGACCACAGCAGGAGTGTCCTGGAATCCTATGTGGGGTAGAATAATTCAGACCCTCGTAGCAGTGTTCTGGAATCCTATGAGAGGAACAAACATTCATACCCCAGTAGCAGTGTTCTAGAATCCTATTTGAGGGACAAACACTCAGACAACAGCAGAAATGTTTTGGAATCATATGTGAGGGAGAAACATTCAGACCACAGCAGGACTGTTCTGGAATCCCATGTGAGGCACAAACACCCAGACCACAGCAGGCGTGTTCTGGAATCCTATGTGAGGGTCAAACATTCAGACCACAGCAGTAGTCTTCTGGAATCCTAGATGAGGGACAAATATTCAGACCCCAGCAGTAGTGTTCAGGAATCCTACATGAGGGACAAACATTTAGAACCCAGTAGCATTGTTCTGGAATCCCATGTGAGGGACAATCATTCAGCCCACAGCTGGTGTGTTCTGGAATCCTATTTGTGGGACAAACATTCAGACCCTCGTAGCATTGTTCTGGAATCCTATGTGAGGAAAATACATTCAGAACACAGCAGGATTGTTCTTCAGTCCTATGTGTGAGGCAAACATTCATACCCTTGTAGCCGTGTTCTGGAGTCCTCTGTGGAGGGCTAACATTAAGACCCACGTAACAGTGTTCCGGCATCGTACGTGAGGATAAACACTCAGAACCCAACAGCAGTGTTCTGGAATCCTAAGTGAGGGACAAACATTAAGATGTCAATACCAGTGTTCTGGAATCCTATTTGAGGGACAAACACTCAGACACAGGAGGAATGTTTTGGAAGCCTATGCGAGGGAGAAACATTCAGACTACAGCAGGATTGTTCTGGAATACCATGTGCGGTAAAAACACACAGACCACAGCAGGATTGTTCTGGAATCCTATGTGAGGGTCAACCATTCAGACCACAGCATCAGGGTTCTGGAATCCTATATGAGTGACAAATATTCAGAACACAGCAAGAGAGTTCTGGGATCCTGTGTGTTGGACAAACTCTCAGAACCCAGCAGCAGTGTTCTGGAATCCTATTTAAGGGGGAAACACCCAGACCAGAGCAGGGATGTTCTGGAATCCTATGTGAGG    N       .       
PASS    IMPRECISE;SVMETHOD=Snifflesv1.0.11;CHR2=chr17_GL000205v2_random;END=20961;STD_quant_start=10.606602;STD_quant_stop=11.760102;Kurtosis_quant_start=1.632043;Kurtosis_quant_stop=1.397835;SVTYPE=DEL;RNAMES=1922d7e7-1c23-467e-bcf6-186e0a8c0cbc,1a253b33-405f-4348-8da3-12862cb5cc9c,1c4ac883-fe26-4a43-bc7a-eccbc518ce1d,25613d55-fc98-452a-9953-0d0becd23d48,483315fc-b766-4882-9ee0-66c20bead3cd,4c6663eb-9ec5-4c6d-b83d-d38fa8ebf1e6,53a562b8-65b7-4af2-9e98-64537134a126,5c7f3e80-aed4-4cce-9270-b12781cb38a5,87f64880-49a1-49cc-84c8-917d5230eba1,a56576bb-1705-4d12-8138-214f0ebb647e,b7c391fc-c7f6-42fe-bd5f-d85a80e4cb0d,ce039ff5-418a-47e8-a8d5-efbaa0c3057b,ce40e79f-80f4-4282-bf35-36545f425a4a,d5f9abca-505d-44b8-bb58-08078498e0c8,ddcd0ca9-eca8-4d45-ab28-71045aea3a3d,e45480ff-9537-411f-8994-6bb486ab98bc;SUPTYPE=AL,SR;SVLEN=-1292;STRANDS=+-;STRANDS2=9,7,9,7;RE=16;REF_strand=1,4;AF=0.761905    GT:DR:DV        0/1:5:16
fritzsedlazeck commented 5 years ago

These indicate events in the same cluster (i.e., likely to fall on the same haplotype), meaning that reads connect them. This is indicated by the same ID _1, _2, ... Cheers, Fritz

JYLeeBioinfo commented 5 years ago

Thank you for the reply!

But what I am having trouble interpreting is that, for SV cluster 14647, I have two 14647_1 records, and for SV cluster 15771, I have three 15771_3 records.

Is it okay for me to manually assign unique numbers? For example, renaming the two 14647_1 records to 14647_1_1 and 14647_1_2.

Or should I delete all but one of them? For example, keep the first 15771_3 and delete the following two 15771_3 records.

With regards, Jinyoung
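
The first option proposed above (appending an extra counter to duplicated IDs) can be sketched in Python as follows. This is only an illustration of the proposed workaround, which the maintainer has not endorsed in this thread. The function name make_ids_unique is hypothetical, and the sketch does not check whether a generated ID such as 14647_1_1 collides with an ID already present in the file.

```python
from collections import Counter


def make_ids_unique(ids):
    """Append _1, _2, ... to every ID that occurs more than once.

    IDs that occur exactly once are returned unchanged; input order is kept.
    """
    totals = Counter(ids)   # how often each ID occurs overall
    seen = Counter()        # how many copies of each ID we have emitted so far
    unique_ids = []
    for var_id in ids:
        if totals[var_id] > 1:
            seen[var_id] += 1
            unique_ids.append(f"{var_id}_{seen[var_id]}")
        else:
            unique_ids.append(var_id)
    return unique_ids
```

For the first example above, the input ["14647_1", "14647_1", "14647_2", "14647_3"] gives ["14647_1_1", "14647_1_2", "14647_2", "14647_3"].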

fritzsedlazeck commented 5 years ago

Oh, I see what you mean. I will check it out. Sorry I overlooked that. Cheers, Fritz

JYLeeBioinfo commented 5 years ago

I examined one of the phased events in which different SVs were assigned the same ID.

I hope this helps you improve Sniffles.

[image]

With regards, Jinyoung

fritzsedlazeck commented 5 years ago

Thanks. Sorry, I haven't found time yet to improve it. It's high up on my list. Thanks, Fritz

JYLeeBioinfo commented 5 years ago

Regarding this https://github.com/fritzsedlazeck/Sniffles/issues/120#issuecomment-449038837 ,

I found a section in the WhatsHap documentation that describes well how phased genotypes are represented: https://whatshap.readthedocs.io/en/latest/guide.html#representation-of-phasing-information-in-vcfs

It seems phased genotypes can be represented with either GT:PS or GT:HP.

GT:PS 0|1:group1 1|0:group1 0|1:group2 1|0:group2

GT:HP 0/1:group1-1,group1-2 0/1:group1-2,group1-1 0/1:group2-1,group2-2 0/1:group2-2,group2-1

I hope this information helps with the enhancement!
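
To make the GT:PS representation above concrete, here is a small Python sketch that groups phased genotypes by their phase set. The function name group_by_phase_set and the (variant_id, FORMAT, sample) tuple layout are hypothetical choices for illustration. The PS values are treated as opaque strings so that the placeholder names from the example (group1, group2) work as-is.

```python
from collections import defaultdict


def group_by_phase_set(records):
    """Group phased genotypes by their PS (phase set) value.

    records: iterable of (variant_id, format_string, sample_string),
             e.g. ("sv1", "GT:PS", "0|1:group1").
    Returns {phase_set: [(variant_id, genotype), ...]}.
    """
    phase_sets = defaultdict(list)
    for var_id, fmt, sample in records:
        values = dict(zip(fmt.split(":"), sample.split(":")))
        gt = values.get("GT", "")
        ps = values.get("PS", ".")
        # Only genotypes written with '|' and a non-missing PS count as phased.
        if "|" in gt and ps != ".":
            phase_sets[ps].append((var_id, gt))
    return dict(phase_sets)
```

Within one phase set, alleles on the same side of the pipe are on the same haplotype, so 0|1 and 1|0 in group1 would be on opposite haplotypes.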

d-cameron commented 3 years ago

FYI: genotype-based SV phasing is problematic because SVs violate the constant-ploidy assumption underpinning the GT field, and because it is completely unable to phase inter-chromosomal events.

VCFv4.4 (currently in draft form) will introduce the PSL (Phase Set List) field, which will enable (cis) phasing information to be explicitly reported, even for inter-chromosomal events.

See https://github.com/samtools/hts-specs/pull/421 for the current design. Feedback from potential implementors is welcome (and encouraged).