AndersenLab / VCF-kit

VCF-kit: Assorted utilities for the variant call format
http://www.andersenlab.org
MIT License

error loading tabix index for building a region specific tree #31

Open ccastane9 opened 3 years ago

ccastane9 commented 3 years ago

Hi, I am trying to build a tree based on a specific region of a chromosome, but I am receiving the error below saying that there was an issue loading the tabix index. I have my .vcf file and its corresponding tabix file (.vcf.gz.tbi) in the same working folder, and I was able to build a whole-file tree from my .vcf file.

(vcf-kit) [ccastane9@andersserver-01 FKBP6_home]$ vk phylo tree nj ECA13_260.vcf 13:11230000-11700000 > ECA13_tree_11230000_11700000_260.newick
[E::idx_find_and_load] Could not retrieve index file for 'ECA13_260.vcf'
Traceback (most recent call last):
  File "/home/ccastane9/miniconda3/envs/vcf-kit/lib/python3.7/site-packages/vcfkit/phylo.py", line 104, in <module>
    main()
  File "/home/ccastane9/miniconda3/envs/vcf-kit/lib/python3.7/site-packages/vcfkit/phylo.py", line 57, in main
    for line in variant_set:
  File "cyvcf2/cyvcf2.pyx", line 442, in __call__
AssertionError: error loading tabix index for b'ECA13_260.vcf'

danielecook commented 3 years ago

@ccastane9 you have to bgzip the VCF file and index it with bcftools for this to work.

bcftools view -O z your_vcf.vcf > out.vcf.gz
bcftools index out.vcf.gz

Then the command should work.
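Before rerunning, it can help to confirm that the file passed to `vk` really is bgzip-compressed and has an index next to it. The sketch below is a heuristic check, not part of VCF-kit: the function names are hypothetical, and it assumes the index sits beside the VCF with `.tbi` or `.csi` appended to the file name.

```python
import os

def is_bgzf(path):
    """Heuristic: True if the file starts with a BGZF (bgzip) block header."""
    with open(path, "rb") as fh:
        header = fh.read(16)
    # gzip magic + deflate + FEXTRA flag, then the 'BC' extra subfield
    # that bgzip writes at byte offset 12.
    return (
        len(header) >= 16
        and header[:4] == b"\x1f\x8b\x08\x04"
        and header[12:14] == b"BC"
    )

def find_index(path):
    """Return the path of a .tbi or .csi index beside the file, or None."""
    for ext in (".tbi", ".csi"):
        if os.path.exists(path + ext):
            return path + ext
    return None

def check_vcf(path):
    """Return a list of problems that would block region queries."""
    problems = []
    if not is_bgzf(path):
        problems.append("file is not bgzip-compressed")
    if find_index(path) is None:
        problems.append("no .tbi or .csi index found next to the file")
    return problems
```

Calling `check_vcf("out.vcf.gz")` returns an empty list when both conditions hold. A plain `.vcf`, or a file compressed with ordinary gzip rather than bgzip, fails the first check.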

ccastane9 commented 3 years ago

I believe this has worked, but now I am getting an error saying that there are no genotypes in my desired region (roughly 3.6 Mb). I know this can't be true, as the VCF file contains roughly 104,000 variants in this region. The message is below:

vk phylo tree nj ECA13_260_vcfkit.vcf.gz I:8091163-11699996 > newtree.newick
no intervals found for b'ECA13_260_vcfkit.gz' at I8091163-11699996
no genotypes

I also tried running the file without specifying coordinates, and that was unsuccessful as well. It didn't report that there were no genotypes as above; the tree output was just empty. I believe I ran the commands as described above:

bcftools view -O z [input.vcf] > [output.vcf.gz]
bcftools index [output.vcf.gz]

The second command generated a file.vcf.gz.csi index.

Thanks for all of the help!
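
The thread does not establish why the region query returns nothing. One thing worth ruling out is a mismatch between the chromosome name in the region string and the names in the VCF's CHROM column: the first command above used `13:` and the second used `I:`. The sketch below is a hypothetical helper, not part of VCF-kit. It counts records per chromosome name using only the standard library, so the names can be compared with the region being requested.

```python
import gzip

def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")

def contig_counts(path):
    """Count variant records per value of the CHROM column."""
    counts = {}
    with open_vcf(path) as fh:
        for line in fh:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            chrom = line.split("\t", 1)[0]
            counts[chrom] = counts.get(chrom, 0) + 1
    return counts

def region_chrom(region):
    """Extract the chromosome part of a 'chrom:start-end' region string."""
    return region.rsplit(":", 1)[0]
```

If `region_chrom("I:8091163-11699996")` is not a key of `contig_counts("ECA13_260_vcfkit.vcf.gz")`, the region names a chromosome that the file does not contain, which would be one possible explanation for an empty result.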