bmvdgeijn / WASP

WASP: allele-specific pipeline for unbiased read mapping and molecular QTL discovery
Apache License 2.0

snp2h5 reads vcfs as empty #91

Closed: snaqvi1990 closed this issue 4 years ago

snaqvi1990 commented 4 years ago

I am trying to generate haplotype and SNP h5 files from a set of per-chromosome VCFs, but snp2h5 prints the following warnings for each chromosome:

snp2h5 --chrom hg38.chromInfo.txt --format vcf --haplotype haplotypes.h5 --snp_index snp_index.h5 --snp_tab snp_tab.h5 /oak/stanford/groups/pritch/users/naqvi/prescott15_enh/bam/hg19/WASP_remapping/H9_10x/bychr/chr*.vcf

WARNING: VCF file /oak/stanford/groups/pritch/users/naqvi/prescott15_enh/bam/hg19/WASP_remapping/H9_10x/bychr/chr10.vcf contained no data lines
WARNING: input file /oak/stanford/groups/pritch/users/naqvi/prescott15_enh/bam/hg19/WASP_remapping/H9_10x/bychr/chr10.vcf is empty

I am not sure why this is happening -- other WASP scripts such as extract_vcf_snps.sh can read these VCFs.

Thanks, Sahin

gmcvicker commented 4 years ago

Hi Sahin,

I am not sure what the issue is. Would it be possible for you to share a small version of the chr10.vcf file (e.g. first 1000 lines) so I can try to debug the problem? You can email it to me at gmcvicker@salk.edu.

Thanks,

Graham

gmcvicker commented 4 years ago

I think the problem might be that the VCF files are missing their header lines. snp2h5 expects the last header line (starting with #CHROM) to contain the names of the samples. I have pushed changes to the code to give a more informative warning message when the header is missing.
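For anyone who wants to check their files before running snp2h5, here is a minimal sketch that looks for the #CHROM header line and returns the sample names it lists. It is not part of WASP; the function name vcf_sample_names is made up for illustration, and it assumes a tab-delimited VCF (plain or gzipped) in which sample names start at the tenth column, as in the standard VCF layout.

```python
import gzip


def vcf_sample_names(path):
    """Return the sample names from the #CHROM line, or None if it is missing.

    Hypothetical helper (not a WASP function). Assumes a tab-delimited VCF.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as vcf:
        for line in vcf:
            if line.startswith("#CHROM"):
                # The first 9 columns are fixed VCF fields (CHROM ... FORMAT);
                # everything after them is a sample name.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                # Reached a data line without having seen a #CHROM header.
                return None
    # The file was empty or contained only "##" meta lines.
    return None
```

A return value of None for a by-chromosome file would suggest the header was lost when the file was split.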

snaqvi1990 commented 4 years ago

Ah ok, thanks. So if I just paste the header from the original .vcf file from which these by-chromosome files were generated, that should work, right?

snaqvi1990 commented 4 years ago

OK I tried this, and it seems like it works now. Thanks for your help!
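The fix described above (copying the header from the original VCF onto each by-chromosome file) could be scripted roughly as follows. This is only a sketch under stated assumptions: the function names and file paths are hypothetical, the files are uncompressed, and the original VCF's header is assumed to be every leading line that starts with "#".

```python
def read_header(original_vcf):
    """Collect the leading '#' lines (meta lines plus #CHROM) of a VCF."""
    header = []
    with open(original_vcf) as vcf:
        for line in vcf:
            if not line.startswith("#"):
                break  # first data line: the header is complete
            header.append(line)
    return header


def prepend_header(header_lines, headerless_vcf, out_path):
    """Write header_lines followed by the data lines of headerless_vcf."""
    with open(headerless_vcf) as src, open(out_path, "w") as dst:
        dst.writelines(header_lines)
        for line in src:
            # Skip any stray header lines so they are not duplicated.
            if not line.startswith("#"):
                dst.write(line)


# Hypothetical usage; the paths are placeholders, not the ones from this issue:
# header = read_header("all_chroms.vcf")
# prepend_header(header, "bychr/chr10.vcf", "bychr_fixed/chr10.vcf")
```

Writing to a new output path rather than editing in place keeps the original by-chromosome files intact in case something goes wrong.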

gmcvicker commented 4 years ago

Great!