DaehwanKimLab / hisat2

Graph-based alignment (Hierarchical Graph FM index)
GNU General Public License v3.0
473 stars 116 forks source link

hisat2_extract_snps_haplotypes_VCF.py giving empty output #218

Open chandrans opened 4 years ago

chandrans commented 4 years ago

Hi. I am trying to run this script on a VCF, but I get a .snp and .haplotype output file every time that is empty. My command is:

'./hisat2_extract_snps_haplotypes_VCF.py -v chr10.fa ALL.chr10_GRCh38.genotypes.20170504.vcf Test'

The test.snp and test.haplotype are created but have nothing in them. Does anyone know why this may be? I have tried with another fasta and VCF as well and have the same issue, Python version 2.7.5 is default.

The tool seems to run to completion, however if I "control-C" it, I find this traceback:

File "./hisat2-0f01dc6397a/hisat2_extract_snps_haplotypes_VCF.py", line 892, in args.verbose) File "./hisat2-0f01dc6397a/hisat2_extract_snps_haplotypes_VCF.py", line 591, in main for line in vcf_proc.stdout: KeyboardInterrupt

chandrans commented 4 years ago

Just realized the Ctrl-C bit does not help

parkchanhee commented 4 years ago

Can you make sure that the chromosome name is same in the genome and vcf file?

chandrans commented 4 years ago

@parkchanhee Thanks. Bad assumption on my part! This fixed it.

chandrans commented 4 years ago

Actually, this is again giving an error on "correct" files.

./hisat2_extract_snps_haplotypes_VCF.py -v resources_broad_hg38_v0_Homo_sapiens_assembly38.fasta resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf Test

is giving this error:

Traceback (most recent call last): File "./hisat2-0f01dc6397a/hisat2_extract_snps_haplotypes_VCF.py", line 892, in args.verbose) File "./hisat2-0f01dc6397a/hisat2_extract_snps_haplotypes_VCF.py", line 730, in main genotypes) File "./hisat2-0f01dc6397a/hisat2_extract_snps_haplotypes_VCF.py", line 688, in add_vars tmp_vars = extract_vars(chr_dic, chr, pos, ref_allele, alt_alleles, varID) File "./hisat2-0f01dc6397a/hisat2_extract_snps_haplotypes_VCF.py", line 113, in extract_vars assert ref_allele2 != alt_allele AssertionError

I understand at some place the reference may be the same as the alt allele, but these files are from the Broad Institute cloud. They are the dbsnp and fasta file for hg38. Do you have some files that work for sure with the script?

parkchanhee commented 4 years ago

We have modified the script to handle that case. You can download the script from hisat2_2.2.0_beta branch.