dwbellott opened this issue 2 weeks ago
Thanks for reporting the bug! It seems that the correction/polishing module went wrong (possibly due to short tandem repeats -- just my guess). As a quick fix, could you run jtk with `to_polish = false` in your config.toml?
Since the polishing step never changes the topology of the graph, if the resulting graph is too complicated or has many short nodes, I suspect jtk cannot handle the dataset (regardless of whether the polishing module panics or not).
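For reference, a minimal sketch of the suggested change to config.toml (`to_polish` is the only key taken from this thread; any other keys in your existing config stay as they are):

```toml
# Sketch of the suggested quick fix; keep the rest of your config.toml unchanged.
to_polish = false   # skip the polishing step that appears to panic
```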
Thanks for the quick response, I tried what you suggested, but ended up getting the same error:
[2024-10-16T06:20:26Z DEBUG haplotyper::phmm_likelihood_correction] ARI 3782 2 1.000 1.000
[2024-10-16T06:20:26Z DEBUG haplotyper::phmm_likelihood_correction] ARI 3786 2 1.000 1.000
[2024-10-16T06:20:26Z DEBUG haplotyper::phmm_likelihood_correction] ARI 3789 1 0.000 0.000
[2024-10-16T06:20:26Z DEBUG haplotyper::phmm_likelihood_correction] ARI 3793 2 1.000 1.000
thread '<unnamed>' panicked at haplotyper/src/phmm_likelihood_correction.rs:353:9:
index out of bounds: the len is 6 but the index is 6
thread '<unnamed>' panicked at haplotyper/src/phmm_likelihood_correction.rs:353:9:
index out of bounds: the len is 7 but the index is 7
[2024-10-16T06:20:26Z DEBUG haplotyper::phmm_likelihood_correction] ARI 3814 1 0.000 0.000
[2024-10-16T06:20:26Z DEBUG haplotyper::phmm_likelihood_correction] ARI 3815 2 1.000 1.000
Thanks for the clarification. I will dig into this issue this weekend.
A little update -- I did manage to get jtk to run on a smaller dataset (spanning only 2 Mb instead of 10 Mb), so whatever is causing the panics is probably related to dataset size. You may want to adjust your recommendations to users accordingly.
I did notice that the resulting haplotypes are not as complete as I might have hoped: my read N50 is about 100 kb, the N50 of the assembled haplotypes is about 200 kb, the longest haplotype is 459 kb, and my longest input read is 591 kb. Loading both the reads and the assembled contigs into IGV, I can see that the resulting haplotypes are missing about half of the heterozygous SNVs, and about half of the SNVs that are present in the assemblies are mistakenly called as homozygous.
Are there settings I can tweak to increase the sensitivity to SNVs (and perhaps improve the contiguity as well)?
Thanks for the information. Also, sorry to hear that our program essentially did not "assemble" anything useful from your data.
On my side, I am still investigating the situation. I re-ran the latest build on the Chr1:10M-15M region and the MHC region of HG002 and confirmed that jtk fully resolved these regions. The coverage or the complexity of the region might be the source of the issue....
If you have time, could you run the latest build on your data with `verbose = 2` and send me the full log? You can get the latest build either way:
- Docker (recommended): `docker run --rm public.ecr.aws/r1e4j1j8/jtk:ac8d6ca2c055cfa2839258aaacde78960d51946c`
- Build from source: `git clone -b refactor https://github.com/ban-m/jtk.git && cd jtk && cargo build --release && ./target/release/jtk`
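The corresponding config change might look like this (only `verbose = 2` is quoted from this thread; everything else in your config.toml stays as you had it):

```toml
# Sketch: raise log verbosity for debugging, as requested above.
verbose = 2
```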
I was able to run jtk successfully with the example reads, but I ran into an issue using my own nanopore reads.
Here is the tail of the output, running with `verbose = 2`. jtk panicked at this same point 20 times before it failed (although I am running with `threads = 1`). If I just grep for 'panic' in the output, I can see messages like the ones above. Following issue #8, I checked whether I had any duplicate read names, but I do not.
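For what it's worth, a minimal sketch of the kind of grep check used here (the `jtk.log` filename and the stand-in log contents are illustrative, copied from the panic messages above):

```shell
# Build a tiny stand-in log; in practice this would be jtk's captured stderr.
printf '%s\n' \
  "thread '<unnamed>' panicked at haplotyper/src/phmm_likelihood_correction.rs:353:9:" \
  'index out of bounds: the len is 6 but the index is 6' \
  "thread '<unnamed>' panicked at haplotyper/src/phmm_likelihood_correction.rs:353:9:" \
  'index out of bounds: the len is 7 but the index is 7' > jtk.log

# Count the panic lines; for this stand-in log the count is 2.
grep -c "panicked" jtk.log
```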
Any idea what is causing this?
Thanks!