eldariont / svim

Structural Variant Identification Method using Long Reads
GNU General Public License v3.0
155 stars 19 forks source link

merge the two lines translocation(BND) records into one line #47

Closed charliechen912ilovbash closed 3 years ago

charliechen912ilovbash commented 3 years ago

In the output, I found out that translocation (BND) are separated into two lines, the position1 and position2 is reversed, the svim.BND.xxx sv ID is also different. I want to merge every translocation event into one line. Is there any recommended method? I am using multi SV callers, and found out that some callers do the same thing for translocations like manta and svaba some callers do not (only one line for each translocation) like sniffles and cuteSV.

chr1    53900   svim.BND.7      N       N[chr2:11656999[        1       PASS    SVTYPE=BND;SUPPORT=1;STD_POS1=.;STD_POS2=.
      GT:DP:AD        ./.:.:.,.
chr2    11656999        svim.BND.156311 N       ]chr1:53900]N   1       PASS    SVTYPE=BND;SUPPORT=1;STD_POS1=.;STD_POS2=.      GT:DP:AD        ./.:.:.,.
eldariont commented 3 years ago

Hi,

yes, SVIM separates translocations into two lines in order to follow the specification of the Variant Calling Format (see Section 5.4 on page 17).

If you need only one line per translocation you could use a simple Python script similar to the one below.

Cheers David

import sys
from cyvcf2 import VCF, Writer

vcf_file = VCF(sys.argv[1])
out_file = Writer(sys.argv[2], vcf_file)
for variant in vcf_file:
    if variant.INFO["SVTYPE"] == "BND":
        from_chrom = variant.CHROM
        from_pos = variant.POS
        alt_string = variant.ALT[0]
        #fwd direction at pos1
        if alt_string[0] == "N":
            pos_fields = alt_string[2:-1].split(":")
            assert len(pos_fields) == 2
            to_chrom = pos_fields[0]
            to_pos = int(pos_fields[1])
        #rev direction at pos1
        else:
            pos_fields = alt_string[1:-2].split(":")
            assert len(pos_fields) == 2
            to_chrom = pos_fields[0]
            to_pos = int(pos_fields[1])
        if from_chrom < to_chrom:
            out_file.write_record(variant)
        elif from_chrom == to_chrom:
            if int(from_pos) < int(to_pos):
                out_file.write_record(variant)
vcf_file.close()
out_file.close()
charliechen912ilovbash commented 3 years ago

Thanks you very much! I will try it.