DaehwanKimLab / hisat2

Graph-based alignment (Hierarchical Graph FM index)
GNU General Public License v3.0
473 stars 116 forks source link

logic error in hisat2_extract_snps_haplotypes_VCF.py #212

Open verne91 opened 5 years ago

verne91 commented 5 years ago

I am using hisat2_extract_snps_haplotypes_VCF.py to extract snps from my vcf files. I found a logic error in the script between line 92 and line 114:

def extract_vars(chr_dic, chr, pos, ref_allele, alt_alleles, varID):
    chr_seq = chr_dic[chr]
    vars = []
    assert ',' not in ref_allele
    alt_alleles = alt_alleles.split(',')    
    for a in range(len(alt_alleles)):
        alt_allele = alt_alleles[a]
        if 'N' in alt_allele:
            continue
        ref_allele2, pos2 = ref_allele, pos
        min_len = min(len(ref_allele2), len(alt_allele))
        assert min_len >= 1
        if min_len > 1:
            ref_allele2 = ref_allele2[min_len - 1:]
            alt_allele = alt_allele[min_len - 1:]
            pos2 += (min_len - 1)

        type, data = '', ''
        if len(ref_allele2) == 1 and len(alt_allele) == 1:
            type = 'S'
            data = alt_allele
            if ref_allele2 == alt_allele:
            assert ref_allele2 != alt_allele

The first five columns of the record in my example are:

chr 1 10150 chr1_10150 CT TT,*,C

The reference allele is CT, and the alternative alleles are TT,*,C.

When the script process the first alternative allele, the variables in the script are:

alt_allele = "TT"
ref_allele2 = "CT"
min_len = 2

Therefore, on line 105 and 106, the variables become:

alt_allele = "T"
ref_allele2 = "T"

So, it would cause the assertion error on line 118.

I don't know why you just extract the latter part of the two alleles. In addition, I don't think this script can deal with "*" in the alternative alleles.

verne91 commented 5 years ago

Any update for this issue?

parkchanhee commented 5 years ago

This problem has been fixed at hisat2_v2.2.0_beta branch. We are testing 2.2.0_beta to release. You can use the script file of the beta branch.

But, '*' alternate bases are not supported by the scrip. We will add the feature soon.

Thank you.