PlantandFoodResearch / MCHap

Polyploid micro-haplotype assembly using Markov chain Monte Carlo simulation.
MIT License

Improve error messages for uncompressed and non-indexed input files. #163

Closed: sjclare closed this issue 5 months ago.

sjclare commented 1 year ago

Hey Tim,

I loved seeing what your package can do at Tools for Polyploids. I think I have installed it, since I get output when I run mchap assemble -h. But when I try to run it on real data with the following code, I get an error. I'm not sure what I have done wrong; maybe it hates conda being there?

code:

mchap assemble \
    --bam NHB.bam.list \
    --targets FlexSeq.bed \
    --variants NHB.poly.vcf \
    --reference W85_Phase0.fasta \
    --ploidy 4 \
    --inbreeding 0.01 \
    --cores 12 > NHB.hap.vcf

error:

Traceback (most recent call last):
  File "/home/sjclare/miniconda3/bin/mchap", line 8, in <module>
    sys.exit(main())
  File "/home/sjclare/miniconda3/lib/python3.10/site-packages/mchap/application/cli.py", line 27, in main
    prog.cli(sys.argv).run_stdout()
  File "/home/sjclare/miniconda3/lib/python3.10/site-packages/mchap/application/baseclass.py", line 428, in run_stdout
    self._run_stdout_multi_core()
  File "/home/sjclare/miniconda3/lib/python3.10/site-packages/mchap/application/baseclass.py", line 413, in _run_stdout_multi_core
    for locus in self.loci():
  File "/home/sjclare/miniconda3/lib/python3.10/site-packages/mchap/application/assemble.py", line 79, in loci
    yield b.set_sequence(self.ref).set_variants(self.vcf)
  File "/home/sjclare/miniconda3/lib/python3.10/site-packages/mchap/io/loci.py", line 74, in set_variants
    return _set_locus_variants(self, f)
  File "/home/sjclare/miniconda3/lib/python3.10/site-packages/mchap/io/loci.py", line 365, in _set_locus_variants
    for var in variant_file.fetch(locus.contig, locus.start, locus.stop):
  File "pysam/libcbcf.pyx", line 4464, in pysam.libcbcf.VariantFile.fetch
ValueError: fetch requires an index

timothymillar commented 1 year ago

Hi @sjclare. This looks like either the input VCF file or the reference genome has not been indexed. The NHB.poly.vcf file should be compressed with bgzip and then indexed using tabix (see this example). The reference genome should be indexed using samtools faidx. These programs all come with samtools/htslib.

Let me know if that works and thanks for posting the issue, I need to make this error message friendlier!
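As a rough illustration of what a friendlier message could look like, the sketch below checks the inputs up front, before pysam ever calls fetch. It tests the requirements listed above: the variants file must start with a BGZF block header (per the SAM/BGZF specification, gzip bytes 1f 8b 08 04 followed by a "BC" extra subfield at offset 12) and have a tabix index (.tbi, or .csi as written by tabix -C or bcftools index), and the reference must have a samtools faidx index (.fai). The helper names is_bgzf, check_variants_file and check_reference_file are hypothetical and not part of MCHap, and the reference check assumes an uncompressed FASTA (a bgzipped FASTA would also need a .gzi index).

```python
import os

# Hypothetical helpers, not part of MCHap.
# A BGZF block starts with ID1=0x1f, ID2=0x8b, CM=8 (deflate), FLG=4 (FEXTRA),
# and carries an extra subfield whose identifier bytes 'B','C' sit at offset 12.
_BGZF_PREFIX = b"\x1f\x8b\x08\x04"


def is_bgzf(path):
    """Return True if the file begins with a BGZF block header."""
    with open(path, "rb") as f:
        header = f.read(18)
    return len(header) >= 14 and header[:4] == _BGZF_PREFIX and header[12:14] == b"BC"


def check_variants_file(path):
    """Raise a descriptive ValueError unless `path` is a bgzipped, indexed VCF."""
    if not is_bgzf(path):
        # Covers both plain-text VCFs and files compressed with ordinary gzip.
        raise ValueError(
            f"Variants file '{path}' is not bgzip-compressed. Compress the VCF "
            "with bgzip and then index the result with 'tabix -p vcf'."
        )
    if not (os.path.exists(path + ".tbi") or os.path.exists(path + ".csi")):
        raise ValueError(
            f"Variants file '{path}' has no index ('{path}.tbi' or '{path}.csi'). "
            f"Create one with 'tabix -p vcf {path}'."
        )


def check_reference_file(path):
    """Raise a descriptive ValueError unless `path` has a faidx index."""
    if not os.path.exists(path + ".fai"):
        raise ValueError(
            f"Reference '{path}' has no index ('{path}.fai'). "
            f"Create one with 'samtools faidx {path}'."
        )
```

Calling checks like these on the --variants and --reference paths before opening them could replace "ValueError: fetch requires an index" with a message that names the offending file and the command that fixes it.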

sjclare commented 1 year ago

Well, it looks like I forgot to zip and index my VCF, so apologies for not following your guide! I have an output!

timothymillar commented 1 year ago

No worries, it's a good reminder that I need to improve the error messages! I'll rename this issue and leave it open until they're improved.