Illumina / hap.py

Haplotype VCF comparison tools

Error: "Fasta file /opt/hap.py-data/hg19.fa is not indexed" #138

Closed: valentyn-dev closed this issue 3 years ago

valentyn-dev commented 3 years ago

Hello!

I am using the pre-built Docker image from pkrusche. I mount the cloned git repo at /data and run make_hg19.sh to build the reference file hg19.fa, which I place in /opt/hap.py-data. Just in case, I also set (inside the Docker container) the environment variables HGREF and HG19 to point to this file.

However, when I try to test hap.py (using the provided example VCF/BED files):

# /opt/hap.py/bin/hap.py /data/example/happy/PG_NA12878_chr21.vcf.gz /data/example/happy/NA12878_chr21.vcf.gz -f /data/example/happy/PG_Conf_chr21.bed.gz -o /data/test/test

I get the following error:

[W] overlapping records at chr21:10993857 for sample 0
[W] Symbolic / SV ALT alleles at chr21:15847469
[W] Variants that overlap on the reference allele: 144
[W] Variants that have symbolic ALT alleles: 14
[I] Total VCF records:         65402
[I] Non-reference VCF records: 65402
2021-04-17 10:40:20,242 ERROR    Fasta file /opt/hap.py-data/hg19.fa is not indexed
2021-04-17 10:40:20,242 ERROR    Traceback (most recent call last):
2021-04-17 10:40:20,243 ERROR      File "/opt/hap.py/bin/hap.py", line 508, in <module>
2021-04-17 10:40:20,243 ERROR        main()
2021-04-17 10:40:20,243 ERROR      File "/opt/hap.py/bin/hap.py", line 296, in main
2021-04-17 10:40:20,244 ERROR        "TRUTH")
2021-04-17 10:40:20,244 ERROR      File "/opt/hap.py/bin/pre.py", line 132, in preprocess
2021-04-17 10:40:20,244 ERROR        reference_contigs = set(fastaContigLengths(reference).keys())
2021-04-17 10:40:20,244 ERROR      File "/opt/hap.py/lib/python27/Tools/fastasize.py", line 39, in fastaContigLengths
2021-04-17 10:40:20,245 ERROR        raise Exception("Fasta file %s is not indexed" % fastafile)
2021-04-17 10:40:20,245 ERROR    Exception: Fasta file /opt/hap.py-data/hg19.fa is not indexed

What does it mean that the hg19.fa is not indexed? And can I fix that somehow?

EDIT: I set up an Ubuntu 18.04 virtual machine and was able to install hap.py there using the provided install script. Running the same command leads to the exact same error message, suggesting that the problem really is with the hg19.fa file...

Any help would be greatly appreciated!

nate-d-olson commented 3 years ago

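You will want to index your reference genome first using samtools faidx /opt/hap.py-data/hg19.fa (see https://www.htslib.org/doc/samtools-faidx.html). This writes an index file, hg19.fa.fai, next to the FASTA file; "not indexed" in the error means that this file is missing.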
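To illustrate what an "is this FASTA indexed" check amounts to, here is a minimal Python 3 sketch. It is not hap.py's actual code: the function name fasta_contig_lengths is hypothetical, and the only things it assumes are that samtools faidx writes the index as <fasta>.fai and that each line of that file starts with the tab-separated columns NAME and LENGTH.

```python
import os


def fasta_contig_lengths(fasta_path):
    """Return {contig_name: length} read from the .fai index next to fasta_path.

    Hypothetical helper for illustration only; not hap.py's implementation.
    """
    fai_path = fasta_path + ".fai"
    if not os.path.exists(fai_path):
        # samtools faidx writes <fasta>.fai; without it there is no index to read
        raise FileNotFoundError(
            "Fasta file %s is not indexed (expected %s)" % (fasta_path, fai_path)
        )

    lengths = {}
    with open(fai_path) as fai:
        for line in fai:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            # .fai columns: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH
            lengths[fields[0]] = int(fields[1])
    return lengths
```

Calling fasta_contig_lengths("/opt/hap.py-data/hg19.fa") before running samtools faidx should raise the error; afterwards it should return one entry per contig in the reference.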

valentyn-dev commented 3 years ago

That did the trick! Thank you so much!