DaehwanKimLab / hisat2

Graph-based alignment (Hierarchical Graph FM index)
GNU General Public License v3.0

hisat2_extract_snps_haplotypes_VCF.py: AssertionError #103

Open: ChristianCortes opened this issue 7 years ago

ChristianCortes commented 7 years ago

Hi,

I am trying to extract SNPs with the hisat2_extract_snps_haplotypes_VCF.py script from the Ensembl VCF file (ftp://ftp.ensembl.org/pub/release-87/variation/vcf/danio_rerio/Danio_rerio.vcf.gz) in order to build a zebrafish index for the GRCz10_GCA_000002035.3 assembly (ftp://ftp.ensembl.org/pub/release-87/fasta/danio_rerio/dna/Danio_rerio.GRCz10.dna.toplevel.fa.gz).

I ran:

    hisat2_extract_snps_haplotypes_VCF.py -v Danio_rerio.GRCz10.dna.toplevel.fa Danio_rerio.vcf Danio_rerio.GRCz10.87

Issue: the script runs for a while and then stops with the following error:

    Traceback (most recent call last):
      File "/Users/XRIS/bin/hisat2_extract_snps_haplotypes_VCF.py", line 892, in <module>
        args.verbose)
      File "/Users/XRIS/bin/hisat2_extract_snps_haplotypes_VCF.py", line 730, in main
        genotypes)
      File "/Users/XRIS/bin/hisat2_extract_snps_haplotypes_VCF.py", line 688, in add_vars
        tmp_vars = extract_vars(chr_dic, chr, pos, ref_allele, alt_alleles, varID)
      File "/Users/XRIS/bin/hisat2_extract_snps_haplotypes_VCF.py", line 103, in extract_vars
        assert min_len >= 1
    AssertionError

Note: the output file only contains entries for chr1.
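To find the record that triggers the first assertion, a short scan of the VCF can help. This is a minimal sketch that assumes `min_len` in extract_vars is the shorter of the REF and ALT allele lengths (the script's source is not quoted in this thread), so the assertion would fail on a record with an empty REF or ALT allele. The helper names open_vcf and empty_allele_records are hypothetical and not part of HISAT2.

```python
import gzip
import sys


def open_vcf(path):
    """Open a plain or gzip-compressed VCF file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def empty_allele_records(lines):
    """Yield (line_no, chrom, pos, ref, alt) for data lines where REF or any
    comma-separated ALT allele is empty, i.e. min(len(REF), len(ALT)) < 1.
    This is the assumed trigger of `assert min_len >= 1`."""
    for line_no, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # too few columns to hold CHROM..ALT; not checked here
        chrom, pos, _id, ref, alt = fields[:5]
        alts = alt.split(",")  # always yields at least one element
        if min(len(ref), *(len(a) for a in alts)) < 1:
            yield line_no, chrom, pos, ref, alt


if __name__ == "__main__":
    # Hypothetical usage: python find_empty_alleles.py Danio_rerio.vcf
    for path in sys.argv[1:]:
        with open_vcf(path) as fh:
            for rec in empty_allele_records(fh):
                print(path, *rec, sep="\t")
```

Each reported row gives the VCF line number, chromosome, position, REF and ALT, which narrows down where the run stops after chr1. If nothing is reported, the assumption about `min_len` is probably wrong and the assertion is being triggered by something else.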

I also tested the dbSNP VCF files from NCBI (ftp://ftp.ncbi.nih.gov/snp/organisms/zebrafish_7955/VCF/) with the Zv9 assembly (GCA_000002035.2, ftp://ftp.ensembl.org/pub/release-79/fasta/danio_rerio/dna/) and ran into the same problem. For example, with Danio_rerio.Zv9.chromosome.1.fa.gz and vcf_chr_1-vcf.gz I got a similar error:

    Traceback (most recent call last):
      File "/Users/XRIS/bin/hisat2_extract_snps_haplotypes_VCF.py", line 892, in <module>
        args.verbose)
      File "/Users/XRIS/bin/hisat2_extract_snps_haplotypes_VCF.py", line 730, in main
        genotypes)
      File "/Users/XRIS/bin/hisat2_extract_snps_haplotypes_VCF.py", line 688, in add_vars
        tmp_vars = extract_vars(chr_dic, chr, pos, ref_allele, alt_alleles, varID)
      File "/Users/XRIS/bin/hisat2_extract_snps_haplotypes_VCF.py", line 113, in extract_vars
        assert ref_allele2 != alt_allele
    AssertionError

This error only occurs with some of the per-chromosome VCF files.
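Until this is resolved in the script, one possible workaround is to pre-filter the VCF before running hisat2_extract_snps_haplotypes_VCF.py. The sketch below is not part of HISAT2 and rests on two assumptions read off the assertion messages: `min_len >= 1` fails on an empty allele, and `ref_allele2 != alt_allele` fails when an ALT allele is identical to REF. It only catches those simplest cases; other record shapes might still trip the second assertion after the script's own allele processing. Whole records are dropped rather than individual ALT alleles, so genotype indices in the GT fields stay valid. The names is_safe_record and filter_vcf are hypothetical.

```python
import gzip
import sys


def is_safe_record(fields):
    """Return True if REF and every ALT allele are non-empty and no ALT
    allele equals REF (compared case-insensitively)."""
    ref, alts = fields[3], fields[4].split(",")
    if not ref or any(not a for a in alts):
        return False  # empty allele: assumed trigger of `min_len >= 1`
    # ALT identical to REF: assumed trigger of `ref_allele2 != alt_allele`
    return all(a.upper() != ref.upper() for a in alts)


def filter_vcf(lines):
    """Yield header lines unchanged, plus only the data lines that pass
    is_safe_record; lines with fewer than 5 columns are dropped."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 5 and is_safe_record(fields):
            yield line


if __name__ == "__main__":
    # Hypothetical usage: python filter_vcf.py Danio_rerio.vcf.gz Danio_rerio.filtered.vcf
    if len(sys.argv) == 3:
        src, dst = sys.argv[1], sys.argv[2]
        opener = gzip.open if src.endswith(".gz") else open
        with opener(src, "rt") as fin, open(dst, "w") as fout:
            fout.writelines(filter_vcf(fin))
```

The filtered file can then be passed to hisat2_extract_snps_haplotypes_VCF.py in place of the original VCF. Dropping records loses those variants from the index, so it is worth counting how many are removed before relying on the result.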

Any help would be welcome,

Thanks in advance,

Christian

tantrev commented 6 years ago

I've also run into this issue when trying to extract SNPs from the latest (v92) Ensembl VCF file for mouse variation.

The file may be downloaded from: ftp://ftp.ensembl.org/pub/release-92/variation/vcf/mus_musculus/mus_musculus.vcf.gz

a00101 commented 4 years ago

I've run into this too. Has this error been fixed?