Closed linlin-coder closed 1 year ago
Similar to #7, this assertion is checking that the variant REF allele (left
here) matches the reference sequence from the FASTA file (right
here). The issue with #7 was that the VCF file had "N" for the reference allele, but here you seem to have actual bases (e.g. [71]
= "G"). This makes me think that you may have an inconsistent reference file. I recommend checking the FASTA files used for generating your VCF files and making sure it is exactly the same reference you are giving to HiPhase.
I'll make a note to clarify the error though, seems like this has popped up a couple times and the source of the assertion error is not clear for users.
From my actual analysis, it can be seen that the reference files used to generate VCF files and hiphase are consistent, both of which remove contig and retain the reference sequence files. I am very troubled by this result and look forward to your answer.
If you can point me to public data or share privately, I'm happy to take a look myself.
If not, then perhaps you can share the header information at least:
bcftools view -h {vcf}
will get the VCF header, I would need to see this for all input VCF files you havegrep ">" {reference}
will get the reference contigs from your FASTA fileIf those match up, then I can probably put together a debug binary to help point out where the discrepancy is occurring in greater detail.
Here is also a pre-view of the change in attached binary (this isn't an official release, it just has the change I mentioned earlier). If you re-run with this binary, you should get an output that indicates the discrepancy. Here's an example where I aligned and variant called with hg38 and then passed CHM13 reference to HiPhase:
...
[2023-05-31T17:47:59.269Z INFO hiphase::data_types::reference_genome] Loading "reference/human_chm13v2.0_maskedY_rCRS.fasta"...
[2023-05-31T17:48:41.687Z INFO hiphase::data_types::reference_genome] Finished loading 25 contigs.
[2023-05-31T17:48:41.919Z ERROR hiphase] Error while processing PhaseBlock { block_index: 0, coordinates: "chr1:10107-31294", num_variants: 63, sample_name: "HG001" }:
[2023-05-31T17:48:41.919Z ERROR hiphase] Reference mismatch error: variant at chr1:10108 has REF allele = "C", but reference genome has "T".
Thank you for your enthusiastic help. I think it may be due to a mismatch between the reference sequence and the VCF file. I will also try the new program later. Thank you for your help. In addition, is it possible for hiphase to consider phasing the trio family? Whatshap can support SNV and InDel phasing, but no corresponding analysis plan has been provided under structural variation and complex structural variation conditions.
Based on the snippet I got via email (looks like maybe deleted here?), there is a reference mismatch happening. The command you posted above used hg38 (hg38/Homo_sapiens_assembly38.onlychr.fasta
) whereas the snippet from the VCF was using hg19 (ucsc.hg19.fasta.onlychr.fasta
). These are definitely incompatible from a reference perspective. HiPhase needs all alignment files, variant calls file, and reference fasta to use the same reference file or you'll likely hit this error again.
As for the trio question, the general idea of pedigree-backed phasing is in our backlog of possible extensions. As of right now, HiPhase is focused on pure read-backed phasing for an individual dataset, so even if you phase a trio with one command (e.g. https://github.com/PacificBiosciences/HiPhase/blob/main/docs/user_guide.md#multi-sample-vcfs) it will not actually leverage any pedigree information while phasing. We may add this in the future, but I don't have a timeline for you.
Based on the comments and figure via email, I think this problem is resolved and we will have better error messaging around reference mismatch in the next release. Closing it for now, but feel free to re-open if the issue persists.
Thank you for your help. I will continue to follow with interest the development of this project
Dear developer, currently I have encountered similar issues in the previous issue #7 while using HiPhase. The error details are as follows: environment:
Error like, i hope the developer can check the error and help and guide me on how to handle the data in the future: