Open nathanvranken opened 2 years ago
Hi @nathanvranken ,
I'm able to copy and paste that into a text file, bgzip, index then load using allel.read_vcf
without any problem.
Things to try:
I ran into the same issue and was able to resolve it by re-zipping with bcftools view $vcf -output-type z --output-file $new
. I did notice that if I did not create a new .tbi and .cbi index it gave me the same error. It'd be interesting to know when the index files are used.
Hi @nathanvranken, @mckeowr1, this may possibly be caused by having an index file (.tbi) which is older than the VCF file. In that case, tabix issues a warning to stderr, and then proceeds. However, currently scikit-allel sends stderr to stdout, so this warning gets parsed as if it was within the VCF file, and interrupts the usual parsing of headers. The offending line is here.
A workaround is to touch the index file so it's newer than the VCF.
When trying to read some data from a vcf file of simulated data using msprime, I keep running into the same error.
Here is the head of the vcf file (20220112_msprime_length10000000.0_split100000.0_gene_flow100.0_chr2.vcf.gz):
When running the following code:
allel.read_vcf(vcf, fields=['calldata/GT'], samples=None, alt_number=1, region='2:0-10000', tabix='tabix')
I consistently get the following error:
RuntimeError: VCF file is missing mandatory header line ("#CHROM...")
Does anyone have an idea what might be causing this error?