ban-m / jtk

JTK -- a regional diploid genome assembler
MIT License
23 stars 2 forks source link

panicked at haplotyper/src/phmm_likelihood_correction.rs:353:9 #10

Open dwbellott opened 2 weeks ago

dwbellott commented 2 weeks ago

I was able to run jtk successfully with the example reads, but I ran into an issue using my own nanopore reads.

Here is the tail of the output, running with verbose = 2

[2024-10-11T12:04:33Z DEBUG haplotyper::phmm_likelihood_correction] ARI 3774    2   1.000   1.000
[2024-10-11T12:04:33Z DEBUG haplotyper::phmm_likelihood_correction] ARI 3775    1   0.000   0.000
[2024-10-11T12:04:33Z DEBUG haplotyper::phmm_likelihood_correction] ARI 3776    2   1.000   1.000
thread '<unnamed>' panicked at haplotyper/src/phmm_likelihood_correction.rs:353:9:
index out of bounds: the len is 6 but the index is 6
[2024-10-11T12:04:33Z DEBUG haplotyper::phmm_likelihood_correction] ARI 3788    2   1.000   1.000
[2024-10-11T12:04:33Z DEBUG haplotyper::phmm_likelihood_correction] ARI 3790    2   1.000   1.000
[2024-10-11T12:04:33Z DEBUG haplotyper::phmm_likelihood_correction] ARI 3794    2   1.000   1.000
[2024-10-11T12:04:33Z DEBUG haplotyper::phmm_likelihood_correction] ARI 3797    2   1.000   1.000
[2024-10-11T12:04:33Z DEBUG haplotyper::phmm_likelihood_correction] ARI 3811    1   0.000   0.000

jtk panicked at this same point 20 times before it failed (although I am running with threads = 1) . If I just grep for 'panic' in the output I can see:

thread '<unnamed>' panicked at haplotyper/src/phmm_likelihood_correction.rs:353:9:
index out of bounds: the len is 7 but the index is 7
thread '<unnamed>' panicked at haplotyper/src/phmm_likelihood_correction.rs:353:9:
index out of bounds: the len is 7 but the index is 7
thread '<unnamed>' panicked at haplotyper/src/phmm_likelihood_correction.rs:353:9:
index out of bounds: the len is 5 but the index is 5
thread '<unnamed>' panicked at haplotyper/src/phmm_likelihood_correction.rs:353:9:
index out of bounds: the len is 6 but the index is 6
thread '<unnamed>' panicked at haplotyper/src/phmm_likelihood_correction.rs:353:9:
index out of bounds: the len is 5 but the index is 5
thread '<unnamed>' panicked at haplotyper/src/phmm_likelihood_correction.rs:353:9:
index out of bounds: the len is 7 but the index is 7
thread '<unnamed>' panicked at haplotyper/src/phmm_likelihood_correction.rs:353:9:
index out of bounds: the len is 5 but the index is 5
thread '<unnamed>' panicked at haplotyper/src/phmm_likelihood_correction.rs:353:9:
index out of bounds: the len is 5 but the index is 5
thread '<unnamed>' panicked at haplotyper/src/phmm_likelihood_correction.rs:353:9:
index out of bounds: the len is 7 but the index is 7
thread '<unnamed>' panicked at haplotyper/src/phmm_likelihood_correction.rs:353:9:
index out of bounds: the len is 7 but the index is 7
thread '<unnamed>' panicked at haplotyper/src/phmm_likelihood_correction.rs:353:9:
index out of bounds: the len is 6 but the index is 6
thread '<unnamed>' panicked at haplotyper/src/phmm_likelihood_correction.rs:353:9:
index out of bounds: the len is 5 but the index is 5
thread '<unnamed>' panicked at haplotyper/src/phmm_likelihood_correction.rs:353:9:
index out of bounds: the len is 5 but the index is 5
thread '<unnamed>' panicked at haplotyper/src/phmm_likelihood_correction.rs:353:9:
index out of bounds: the len is 6 but the index is 6
thread '<unnamed>' panicked at haplotyper/src/phmm_likelihood_correction.rs:353:9:
index out of bounds: the len is 7 but the index is 7
thread '<unnamed>' panicked at haplotyper/src/phmm_likelihood_correction.rs:353:9:
index out of bounds: the len is 6 but the index is 6
thread '<unnamed>' panicked at haplotyper/src/phmm_likelihood_correction.rs:353:9:
index out of bounds: the len is 5 but the index is 5
thread '<unnamed>' panicked at haplotyper/src/phmm_likelihood_correction.rs:353:9:
index out of bounds: the len is 5 but the index is 5
thread '<unnamed>' panicked at haplotyper/src/phmm_likelihood_correction.rs:353:9:
index out of bounds: the len is 7 but the index is 7
thread '<unnamed>' panicked at haplotyper/src/phmm_likelihood_correction.rs:353:9:
index out of bounds: the len is 6 but the index is 6

Following issue #8, I checked to see if I had any duplicate read names, but I do not.

Any idea what is causing this?

Thanks!

ban-m commented 1 week ago

Thanks for reporting bugs! It seems that the correction/polishing module went wrong (might be due to short tandem repeats -- just my guess).

For a quick fix, could you exec jtk with to_polish = false in your config.toml ?

Since the polishing step never changes the topology of the graph, if the resulting graph is too complicated/having many short nodes, I guess jtk could not handle the dataset (regardless of the polishing module panics or not).

dwbellott commented 1 week ago

Thanks for the quick response, I tried what you suggested, but ended up getting the same error:

[2024-10-16T06:20:26Z DEBUG haplotyper::phmm_likelihood_correction] ARI 3782    2   1.000   1.000
[2024-10-16T06:20:26Z DEBUG haplotyper::phmm_likelihood_correction] ARI 3786    2   1.000   1.000
[2024-10-16T06:20:26Z DEBUG haplotyper::phmm_likelihood_correction] ARI 3789    1   0.000   0.000
[2024-10-16T06:20:26Z DEBUG haplotyper::phmm_likelihood_correction] ARI 3793    2   1.000   1.000
thread '<unnamed>' panicked at haplotyper/src/phmm_likelihood_correction.rs:353:9:
index out of bounds: the len is 6 but the index is 6
thread '<unnamed>' panicked at haplotyper/src/phmm_likelihood_correction.rs:353:9:
index out of bounds: the len is 7 but the index is 7
[2024-10-16T06:20:26Z DEBUG haplotyper::phmm_likelihood_correction] ARI 3814    1   0.000   0.000
[2024-10-16T06:20:26Z DEBUG haplotyper::phmm_likelihood_correction] ARI 3815    2   1.000   1.000
ban-m commented 1 week ago

Thanks for the clarification. I will dig this issue on this weekend

dwbellott commented 1 week ago

A little update -- I did manage to get jtk to run on a smaller dataset (only spanning 2Mb instead of 10Mb), so whatever is causing the panics is probably related to dataset size. You may want to adjust your recommendations to users.

I did notice that the resulting haplotypes are not as complete as I might have hoped -- my read n50 is about 100kb, and the n50 for the assembled haplotypes is about 200kb, the longest haplotype is 459kb, my longest read in the input data is 591kb. Loading up both the reads and the assembled contigs in IGV, I can see that the resulting haplotypes are missing about half the heterozygous SNVs, and about half of the SNVs that are present in the assemblies are mistakenly called as homozygous.

Are there some settings I can tweak to increase the sensitivity to SNVs (and perhaps the contiguity as well)?

ban-m commented 5 days ago

Thanks for the information. Also sorry to hear that essentially our program did not "assemble" anything useful on your data.

On my side, I am still investigating the situation. I re-run the latest build on the Chr1:10M-15M nt region & the MHC region of the HG002 to see that jtk fully resolved these regions. The coverage or the complexity of the region might be the source of issue....

If you have time, could you use the latest build on your data with verbose = 2, and tell me the full log? You can use it either

  1. docker run --rm public.ecr.aws/r1e4j1j8/jtk:ac8d6ca2c055cfa2839258aaacde78960d51946c (recommended)
  2. git clone -b refactor https://github.com/ban-m/jtk.git && cd jtk && cargo build --release && ./target/release/jtk