PacificBiosciences / HiPhase

Small variant, structural variant, and short tandem repeat phasing tool for PacBio HiFi reads

Error “thread '<unnamed>' panicked at 'assertion failed: `(left == right)`” occurred while running HiPhase #12

Closed linlin-coder closed 1 year ago

linlin-coder commented 1 year ago

Dear developer, I have run into an error similar to the one in issue #7 while using HiPhase. My environment and command are as follows:

# hiphase version:0.10.0
# calling snv/indel:deepvariant  version:1.4.0
# calling sv:pbsv version:pbsv 2.8.1 (commit SL-release-10.2.1-31-ge3fa446)
hiphase \
--reference hg38/Homo_sapiens_assembly38.onlychr.fasta \
--global-realignment-cputime 300 --vcf sample1.MergeVcfs.vcf.gz --output-vcf sample1.deepvariant.phased.vcf.gz \
--vcf sample1.PASS.PRECISE.vcf.gz --output-vcf sample1.pbsv.phased.vcf.gz \
--bam sample1.sort.bam --threads 10 --summary-file sample1.summary.tsv --blocks-file sample1.blocks.tsv \
--stats-file sample1.stats.csv --haplotag-file sample1.haplotag.csv

The error output is shown below. I hope the developers can look into it and advise me on how to handle this data:

[2023-05-29T07:46:01.381Z INFO  hiphase::data_types::reference_genome] Loading "hg38/Homo_sapiens_assembly38.onlychr.fasta"...
[2023-05-29T07:46:09.707Z INFO  hiphase::data_types::reference_genome] Finished loading 25 contigs.
[2023-05-29T07:46:09.707Z INFO  hiphase] Starting job pool with 10 threads...
thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `[71]`,
 right: `[67]`', src/phaser.rs:221:21
stack backtrace:
thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `[71]`,
 right: `[67]`', src/phaser.rs:221:21
thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `[84]`,
 right: `[65]`', src/phaser.rs:221:21
   0:           0x59e5f4 - std::backtrace_rs::backtrace::libunwind::trace::ha9053a9a07ca49cb
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:           0x59e5f4 - std::backtrace_rs::backtrace::trace_unsynchronized::h9c2852a457ad564e
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:           0x59e5f4 - std::sys_common::backtrace::_print_fmt::h457936fbfaa0070f
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/sys_common/backtrace.rs:65:5
   3:           0x59e5f4 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h5779d7bf7f70cb0c
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/sys_common/backtrace.rs:44:22
   4:           0x4957fe - core::fmt::write::h5a4baaff1bcd3eb5
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/fmt/mod.rs:1232:17
   5:           0x57a992 - std::io::Write::write_fmt::h4bc1f301cb9e9cce
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/io/mod.rs:1684:15
   6:           0x59fc39 - std::sys_common::backtrace::_print::h5fcdc36060f177e8
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/sys_common/backtrace.rs:47:5
   7:           0x59fc39 - std::sys_common::backtrace::print::h54ca9458b876c8bf
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/sys_common/backtrace.rs:34:9
   8:           0x59f868 - std::panicking::default_hook::{{closure}}::hbe471161c7664ed6
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:271:22
   9:           0x5a0881 - std::panicking::default_hook::ha3500da57aa4ac4f
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:290:9
  10:           0x5a0881 - std::panicking::rust_panic_with_hook::h50c09d000dc561d2
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:692:13
  11:           0x5a036e - std::panicking::begin_panic_handler::{{closure}}::h9e2b2176e00e0d9c
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:583:13
  12:           0x5a02d6 - std::sys_common::backtrace::__rust_end_short_backtrace::h5739b8e512c09d02
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/sys_common/backtrace.rs:150:18
  13:           0x5a02cd - rust_begin_unwind
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:579:5
  14:           0x40258c - core::panicking::panic_fmt::hf33a1475b4dc5c3e
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/panicking.rs:64:14
  15:           0x4027a6 - core::panicking::assert_failed_inner::haf9816227b20b6f2
  16:           0x4067f1 - core::panicking::assert_failed::h816680c3b2244efb
  17:           0x4e636e - hiphase::phaser::solve_block::h99faa68405ef9ccc
  18:           0x41331c - <F as threadpool::FnBox>::call_box::h07a333169af0e835
  19:           0x5a5d70 - std::sys_common::backtrace::__rust_begin_short_backtrace::h2244e6eb820fac9f
  20:           0x5a49e4 - core::ops::function::FnOnce::call_once{{vtable.shim}}::h2f8049bc016327da
  21:           0x5a1395 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h39990b24eedef2ab
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/alloc/src/boxed.rs:1987:9
  22:           0x5a1395 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h01a027258444143b
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/alloc/src/boxed.rs:1987:9
  23:           0x5a1395 - std::sys::unix::thread::Thread::new::thread_start::ha4f1cdd9c25884ba
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/sys/unix/thread.rs:108:17
  24:           0x6658f7 - start_thread
                               at ./nptl/./nptl/pthread_create.c:477:8
  25:           0x6e979f - __clone
  26:                0x0 - <unknown>
stack backtrace:
   0:           0x59e5f4 - std::backtrace_rs::backtrace::libunwind::trace::ha9053a9a07ca49cb
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:           0x59e5f4 - std::backtrace_rs::backtrace::trace_unsynchronized::h9c2852a457ad564e
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:           0x59e5f4 - std::sys_common::backtrace::_print_fmt::h457936fbfaa0070f
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/sys_common/backtrace.rs:65:5
   3:           0x59e5f4 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h5779d7bf7f70cb0c
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/sys_common/backtrace.rs:44:22
   4:           0x4957fe - core::fmt::write::h5a4baaff1bcd3eb5
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/fmt/mod.rs:1232:17
   5:           0x57a992 - std::io::Write::write_fmt::h4bc1f301cb9e9cce
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/io/mod.rs:1684:15
   6:           0x59fc39 - std::sys_common::backtrace::_print::h5fcdc36060f177e8
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/sys_common/backtrace.rs:47:5
   7:           0x59fc39 - std::sys_common::backtrace::print::h54ca9458b876c8bf
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/sys_common/backtrace.rs:34:9
   8:           0x59f868 - std::panicking::default_hook::{{closure}}::hbe471161c7664ed6
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:271:22
   9:           0x5a0881 - std::panicking::default_hook::ha3500da57aa4ac4f
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:290:9
  10:           0x5a0881 - std::panicking::rust_panic_with_hook::h50c09d000dc561d2
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:692:13
  11:           0x5a036e - std::panicking::begin_panic_handler::{{closure}}::h9e2b2176e00e0d9c
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:583:13
  12:           0x5a02d6 - std::sys_common::backtrace::__rust_end_short_backtrace::h5739b8e512c09d02
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/sys_common/backtrace.rs:150:18
  13:           0x5a02cd - rust_begin_unwind
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:579:5
  14:           0x40258c - core::panicking::panic_fmt::hf33a1475b4dc5c3e
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/panicking.rs:64:14
  15:           0x4027a6 - core::panicking::assert_failed_inner::haf9816227b20b6f2
  16:           0x4067f1 - core::panicking::assert_failed::h816680c3b2244efb
  17:           0x4e636e - hiphase::phaser::solve_block::h99faa68405ef9ccc
  18:           0x41331c - <F as threadpool::FnBox>::call_box::h07a333169af0e835
  19:           0x5a5d70 - std::sys_common::backtrace::__rust_begin_short_backtrace::h2244e6eb820fac9f
  20:           0x5a49e4 - core::ops::function::FnOnce::call_once{{vtable.shim}}::h2f8049bc016327da
  21:           0x5a1395 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h39990b24eedef2ab
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/alloc/src/boxed.rs:1987:9
  22:           0x5a1395 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h01a027258444143b
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/alloc/src/boxed.rs:1987:9
  23:           0x5a1395 - std::sys::unix::thread::Thread::new::thread_start::ha4f1cdd9c25884ba
                               at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/sys/unix/thread.rs:108:17
  24:           0x6658f7 - start_thread
                               at ./nptl/./nptl/pthread_create.c:477:8
  25:           0x6e979f - __clone
  26:                0x0 - <unknown>
holtjma commented 1 year ago

Similar to #7, this assertion is checking that the variant REF allele (left here) matches the reference sequence from the FASTA file (right here). The issue with #7 was that the VCF file had "N" for the reference allele, but here you seem to have actual bases (e.g. [71] = "G"). This makes me think that you may have an inconsistent reference file. I recommend checking the FASTA files used for generating your VCF files and making sure it is exactly the same reference you are giving to HiPhase.

I'll make a note to clarify the error though, seems like this has popped up a couple times and the source of the assertion error is not clear for users.
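For readers unfamiliar with the assertion output: the bracketed numbers are byte values, so they can be decoded as ASCII to see which bases disagree. A minimal sketch using the two distinct pairs from the log above (the helper name `decode_base` is only illustrative):

```python
# Decode the byte values printed in the panic message into nucleotide letters.
# Pairs are copied from the log above: (left, right) = (VCF REF, FASTA base).
mismatches = [(71, 67), (84, 65)]

def decode_base(byte_value: int) -> str:
    """Return the ASCII character for a single byte value."""
    return bytes([byte_value]).decode("ascii")

decoded = [(decode_base(left), decode_base(right)) for left, right in mismatches]
for vcf_ref, fasta_base in decoded:
    print(f"VCF REF = {vcf_ref!r}, reference FASTA = {fasta_base!r}")
```

In other words, the VCF records carry "G" and "T" where the supplied FASTA has "C" and "A" at the same positions.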

linlin-coder commented 1 year ago

As far as I can tell from my analysis, the reference file used to generate the VCF files and the one given to HiPhase are consistent: in both, the extra contigs were removed and only the chromosome sequences were kept. I am puzzled by this result and look forward to your answer.

holtjma commented 1 year ago

If you can point me to public data or share privately, I'm happy to take a look myself.

If not, then perhaps you can share the header information at least:

If those match up, then I can probably put together a debug binary to help point out where the discrepancy is occurring in greater detail.
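As one illustration of comparing header information, the sketch below checks the `##contig` lines of a VCF header against a FASTA `.fai` index. The function names and the toy inputs are assumptions made for this example, not commands from the thread. The two chr1 lengths are those of hg19 (249250621) and hg38 (248956422).

```python
import re

def vcf_contigs(header_text: str) -> dict[str, int]:
    """Collect contig name -> length from '##contig=<ID=...,length=...>' lines."""
    contigs = {}
    for line in header_text.splitlines():
        if not line.startswith("##contig="):
            continue
        name = re.search(r"ID=([^,>]+)", line)
        length = re.search(r"length=(\d+)", line)
        if name and length:
            contigs[name.group(1)] = int(length.group(1))
    return contigs

def fai_contigs(fai_text: str) -> dict[str, int]:
    """Collect contig name -> length from a .fai index (columns 1 and 2)."""
    contigs = {}
    for line in fai_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2:
            contigs[fields[0]] = int(fields[1])
    return contigs

def compare(vcf: dict[str, int], fai: dict[str, int]) -> list[str]:
    """List VCF contigs that are missing from the FASTA index or differ in length."""
    problems = []
    for name, length in vcf.items():
        if name not in fai:
            problems.append(f"{name}: missing from FASTA index")
        elif fai[name] != length:
            problems.append(f"{name}: VCF length {length} != FASTA length {fai[name]}")
    return problems

# Toy inputs (hypothetical): a VCF header built on hg19 and an hg38 .fai line.
example_vcf_header = "##fileformat=VCFv4.2\n##contig=<ID=chr1,length=249250621>\n"
example_fai = "chr1\t248956422\t112\t100\t101\n"
report = compare(vcf_contigs(example_vcf_header), fai_contigs(example_fai))
print(report)
```

Differing lengths are enough to show that two files come from different references. Matching lengths do not prove the sequences are identical, so this is only a first-pass check. The same comparison could be applied to the `@SQ` lines of a BAM header.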

holtjma commented 1 year ago

Here is also a preview of the change in the attached binary (this isn't an official release; it just has the change I mentioned earlier). If you re-run with this binary, you should get output that indicates the discrepancy. Here's an example where I aligned and variant called with hg38 and then passed the CHM13 reference to HiPhase:

...
[2023-05-31T17:47:59.269Z INFO  hiphase::data_types::reference_genome] Loading "reference/human_chm13v2.0_maskedY_rCRS.fasta"...
[2023-05-31T17:48:41.687Z INFO  hiphase::data_types::reference_genome] Finished loading 25 contigs.
[2023-05-31T17:48:41.919Z ERROR hiphase] Error while processing PhaseBlock { block_index: 0, coordinates: "chr1:10107-31294", num_variants: 63, sample_name: "HG001" }:
[2023-05-31T17:48:41.919Z ERROR hiphase]   Reference mismatch error: variant at chr1:10108 has REF allele = "C", but reference genome has "T".

hiphase-v0.10.1-x86_64-unknown-linux-gnu.tar.gz
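The kind of check behind that error message can be sketched independently of HiPhase. This is not HiPhase's implementation, only an illustration of comparing each variant's REF allele with the reference sequence at the same position. The function name and the toy data are made up for the example.

```python
def find_ref_mismatches(reference, variants):
    """Yield (chrom, pos, vcf_ref, fasta_ref) for records whose REF disagrees.

    reference: dict of contig name -> sequence string
    variants:  iterable of (chrom, 1-based position, REF allele) tuples
    """
    for chrom, pos, ref_allele in variants:
        start = pos - 1  # VCF positions are 1-based; Python slices are 0-based
        fasta_ref = reference[chrom][start:start + len(ref_allele)]
        if fasta_ref.upper() != ref_allele.upper():
            yield chrom, pos, ref_allele, fasta_ref

# Toy data, not real genome sequence.
toy_reference = {"chrT": "ACGTTGCA"}
toy_variants = [("chrT", 2, "C"), ("chrT", 4, "TT"), ("chrT", 7, "G")]

found = list(find_ref_mismatches(toy_reference, toy_variants))
for chrom, pos, vcf_ref, fasta_ref in found:
    print(f'variant at {chrom}:{pos} has REF allele = "{vcf_ref}", '
          f'but reference has "{fasta_ref}"')
```

Running a check like this over the first few records of a VCF against the FASTA intended for phasing is a quick way to catch a reference mix-up before a long job starts.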

linlin-coder commented 1 year ago

Thank you for your enthusiastic help. I think the error may be due to a mismatch between the reference sequence and the VCF file. I will also try the new binary later. In addition, could HiPhase consider supporting phasing for a trio (family)? WhatsHap supports SNV and indel phasing, but there is no corresponding approach for structural variants and complex structural variants.

holtjma commented 1 year ago

Based on the snippet I got via email (looks like maybe deleted here?), there is a reference mismatch happening. The command you posted above used hg38 (hg38/Homo_sapiens_assembly38.onlychr.fasta) whereas the snippet from the VCF was using hg19 (ucsc.hg19.fasta.onlychr.fasta). These are definitely incompatible from a reference perspective. HiPhase needs the alignment files, variant call files, and reference FASTA to all be based on the same reference genome, or you'll likely hit this error again.

As for the trio question, the general idea of pedigree-backed phasing is in our backlog of possible extensions. As of right now, HiPhase is focused on pure read-backed phasing for an individual dataset, so even if you phase a trio with one command (e.g. https://github.com/PacificBiosciences/HiPhase/blob/main/docs/user_guide.md#multi-sample-vcfs) it will not actually leverage any pedigree information while phasing. We may add this in the future, but I don't have a timeline for you.

Based on the comments and figure via email, I think this problem is resolved and we will have better error messaging around reference mismatch in the next release. Closing it for now, but feel free to re-open if the issue persists.

linlin-coder commented 1 year ago

Thank you for your help. I will continue to follow the development of this project with interest.