PacificBiosciences / HiPhase

Small variant, structural variant, and short tandem repeat phasing tool for PacBio HiFi reads

thread 'main' panicked error #39

Closed GuidoGallo closed 5 months ago

GuidoGallo commented 5 months ago

Dear developers, I'm encountering an error I cannot really understand while using HiPhase to phase a multi-sample VCF (non-human genome). I've used this program in the past, and now I was trying to re-run it on a subset of the same population with the same type of command, but for some reason it is erroring out. I'm using the latest version, hiphase 1.4.2-c7e0700. Here is part of the command I used:

hiphase -b path/to/bam1 -b path/to/bam2 ..... --vcf input.vcf.gz --output-vcf output.phased.vcf.gz --reference path/to/reference.fasta -s sample_name1 -s sample_name2 .... --threads 32 --min-vcf-qual 10 --blocks-file file.tsv --summary-file file.summary.tsv

The program exits immediately with this error:

thread 'main' panicked at src/block_gen.rs:636:44:
called `Result::unwrap()` on an `Err` value: Fetch
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

To my understanding, the tool encountered an unexpected error while trying to access some resource, but I can't really tell what causes it. As I said, I downloaded the latest hiphase version, and I checked that all file paths are accessible and that I have all the necessary rights and permissions. Do you have any ideas as to what could cause this kind of error? Thanks a lot for your time, I appreciate any help.

holtjma commented 5 months ago

Hi @GuidoGallo,

So assuming you're on the latest version, it's getting an error at this point in the code: https://github.com/PacificBiosciences/HiPhase/blob/9b01d7eda42a56dc9d20d39aa8b6fe23b60f5ab1/src/block_gen.rs#L636

That line is trying to fetch a region in the BAM file. I've never actually encountered an error there before, so I'm going to do some guessing as to what might be causing it, most of which involve a deviation from "normal" processing that is leaving BAMs and/or VCFs in an inconsistent state:

  1. It looks like you're using multiple BAM files. Is it possible that one or more of the BAM files is missing one or more chromosomes from the dataset? This may be a missing entry in the header. It may also be a chromosome with no alignments to it; I'm not sure what the fetch would do in that instance (if this is the case, we may need to patch it).
  2. Were the VCFs processed in a way that their chromosome lists do not match the BAM headers? If so, it may be trying to fetch a region from the VCF that does not exist in the BAM.
  3. Are you able to tell roughly where this is happening in the processing? In other words, is it happening on a "core" chromosome, or is it throwing errors somewhere in an ALT contig or similar? That may help provide some clues if neither of the above is true.
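
The first two checks above can be done before running HiPhase by comparing contig names across headers. The following is a minimal sketch, not part of HiPhase: it assumes the header text has already been dumped to strings (for example with `samtools view -H` for each BAM and `bcftools view -h` for the VCF), and the function names and toy headers are hypothetical. It only compares header entries, so it would not detect a chromosome that is listed in a BAM header but has no alignments.

```python
import re

# Matches VCF header lines such as: ##contig=<ID=chr1,length=1000>
CONTIG_RE = re.compile(r"^##contig=<ID=([^,>]+)")


def bam_contigs(header_text):
    """Return the set of sequence names (SN) from @SQ lines of a SAM/BAM header."""
    names = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def vcf_contigs(header_text):
    """Return the set of contig IDs from ##contig lines of a VCF header."""
    names = set()
    for line in header_text.splitlines():
        match = CONTIG_RE.match(line)
        if match:
            names.add(match.group(1))
    return names


def missing_contigs(vcf_header, bam_headers):
    """Map each BAM label to the sorted VCF contigs absent from its header."""
    wanted = vcf_contigs(vcf_header)
    report = {}
    for label, header in bam_headers.items():
        missing = sorted(wanted - bam_contigs(header))
        if missing:
            report[label] = missing
    return report


# Toy headers for illustration only (hypothetical data).
vcf_header = (
    "##fileformat=VCFv4.2\n"
    "##contig=<ID=chr1,length=1000>\n"
    "##contig=<ID=chr2,length=800>\n"
)
bam_ok = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chr2\tLN:800\n"
bam_bad = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n"

report = missing_contigs(vcf_header, {"bam1": bam_ok, "bam2": bam_bad})
for label, names in report.items():
    print(f"{label}: missing {', '.join(names)}")
# prints "bam2: missing chr2"
```

Any BAM that shows up in the report lacks a contig the VCF declares, which is the kind of mismatch described in points 1 and 2. A VCF without `##contig` header lines would yield an empty report here, so in that case the contig names would need to come from the records themselves.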

Hope this helps, Matt

GuidoGallo commented 5 months ago

Dear Matt, thank you so much for the rapid answer. You are right: one of the BAM files I was using had an inconsistency with the chromosomes as defined in the VCF header, and I hadn't realized it. Thanks again, and congratulations on the great tool; it worked perfectly :)

holtjma commented 5 months ago

Great, glad it worked out! I'll make a note to improve the messaging there. Hopefully future users won't encounter a cryptic panic message and instead get something a little more insightful.