Open zhaouu opened 8 years ago
Would you like to show some example lines of the VCF file from GATK including deletions so that I can modify the HISAT2 scripts?
A sample VCF file can be downloaded in https://drive.google.com/open?id=0B8ijxd00y5RZUDJaVVF4aUtNZmM. The link ( http://gatkforums.broadinstitute.org/gatk/discussion/5904/asterisk-in-the-alt-feild-of-vcf) can find the explanation of asterisk "*" in GATK VCF file. Because the VCF file don't have ID column, I modified the hisat2_extract_snps_haplotypes_VCF.py script:
chr, pos, varID, ref_allele, alt_alleles, qual, filter, info = line.strip().split('\t')
varID = '%s_%s' %(chr,pos)
Commend line to run the script:
`~/software/hisat2-2.0.4/hisat2_extract_snps_haplotypes_VCF.py --non-rs --verbose ~/data/Brassicanapus/Brassica_napus_v4.1.chromosomes.fa /public/home/hhao/data/rape_seq/RNAseq/test/test.vcf.gz ~/data/rape_seq/hisat2_snp
Thanks for your help.
I use hisat2_extract_snps_haplotypes_VCF.py to extract SNPs and haplotypes from a VCF. When I used the VCF file which extracted from samtools, it can ran successfully. But when I used the VCF file which extracted from GATK, I get an error massge:
Traceback (most recent call last): File "/public/home/hzhao/software/hisat2-2.0.4/hisat2_extract_snps_haplotypes_VCF.py", line 894, in
args.verbose)
File "/public/home/hzhao/software/hisat2-2.0.4/hisat2_extract_snps_haplotypes_VCF.py", line 732, in main
genotypes)
File "/public/home/hzhao/software/hisat2-2.0.4/hisat2_extract_snps_haplotypes_VCF.py", line 690, in add_vars
tmp_vars = extract_vars(chr_dic, chr, pos, ref_allele, alt_alleles, varID)
File "/public/home/hzhao/software/hisat2-2.0.4/hisat2_extract_snps_haplotypes_VCF.py", line 113, in extract_vars assert ref_allele2 != alt_allele
AssertionError
And, I GATK latest version, they use an asterisk represents a deleted allele. I think it's not compatible with this script. How can I generate the snp and haplotype file for hisat2? Hope some suggests for me. Thanks very much!