ekg / seqwish

alignment to variation graph inducer
MIT License

seqwish::compact error: found 0 overlaps for seq #92

cgroza closed this issue 2 years ago

cgroza commented 2 years ago

Hi,

I am getting an error when running seqwish (as part of the pggb pipeline) on about 73 human X chromosomes. Here are the last 20 lines of the log:

[seqwish::transclosure] 32603.254 99.91% 14955669080-15038541650 dset_compression
[seqwish::transclosure] 32603.313 99.91% 14955669080-15038541650 dset_sort
[seqwish::transclosure] 32603.403 99.91% 14955669080-15038541650 dset_invert
[seqwish::transclosure] 32603.497 99.91% 14955669080-15038541650 graph_emission
[seqwish::transclosure] 32619.716 99.97% 15038541650-15873365003 overlap_collect
[seqwish::transclosure] 32623.221 99.97% 15038541650-15873365003 rank_build
[seqwish::transclosure] 32625.404 99.97% 15038541650-15873365003 parallel_union_find
[seqwish::transclosure] 32625.434 99.97% 15038541650-15873365003 dset_write
[seqwish::transclosure] 32625.463 99.97% 15038541650-15873365003 dset_compression
[seqwish::transclosure] 32625.506 99.97% 15038541650-15873365003 dset_sort
[seqwish::transclosure] 32625.568 99.97% 15038541650-15873365003 dset_invert
[seqwish::transclosure] 32625.631 99.97% 15038541650-15873365003 graph_emission
[seqwish::transclosure] 32632.856 100.00% building node_iitree and path_iitree indexes
[seqwish::transclosure] 35002.518 100.00% done
[seqwish::transclosure] 35002.524 done with transitive closures
[seqwish::compact] 35002.524 compacting nodes
[seqwish::compact] error: found 0 overlaps for seq 5005-01#hap1#h1tg000083l idx 9 at j=75636626 of 90929130
Command exited with non-zero status 1
seqwish -t 64 -s by_chrom/chrX.fa.gz -p chrX_out/chrX.fa.gz.30796e0.wfmash.paf -k 311 -g chrX_out/chrX.fa.gz.30796e0.4030258.seqwish.gfa -B 10000000 -P
90880.05s user 12435.50s system 295% cpu 35002.84s total 85348568Kb max memory

The inputs and alignments were processed from start to finish with wfmash as part of the pggb pipeline. What could be causing this issue and what would a possible fix look like?

AndreaGuarracino commented 2 years ago

Hi @cgroza,

It seems that at least one base of the input sequences is not covered in the graph.

Any chance you could share the input FASTA and PAF files? They might be quite big, but having them would make it much easier to reproduce the bug.

cgroza commented 2 years ago

Not sure I can legally share the sequence, but would the FASTA index (describing contigs and their length) and the PAF file be useful?

AndreaGuarracino commented 2 years ago

Yes, please. That is better than nothing, and at least we can try. Thank you!
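For anyone hitting a similar error, the FASTA index and the PAF can be cross-checked against each other before the files are shared. The following is a minimal sketch, not part of seqwish or pggb: the function names `read_fai` and `check_paf` are hypothetical, and it assumes the standard tab-separated layouts (name and length in the first two `.fai` columns, and the 12 mandatory PAF columns). It only rules out inconsistent inputs, such as alignments that refer to a missing sequence or run past its end. It does not diagnose the `seqwish::compact` error itself.

```python
def read_fai(lines):
    """Map sequence name -> length from FASTA index (.fai) lines."""
    lengths = {}
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        lengths[fields[0]] = int(fields[1])
    return lengths


def check_paf(paf_lines, lengths):
    """Return (line_number, message) pairs for PAF records that
    disagree with the sequence lengths taken from the index."""
    problems = []
    for n, line in enumerate(paf_lines, start=1):
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            problems.append((n, "fewer than 12 columns"))
            continue
        # PAF columns: 0 qname, 1 qlen, 2 qstart, 3 qend, 4 strand,
        #              5 tname, 6 tlen, 7 tstart, 8 tend, ...
        sides = (
            ("query", f[0], int(f[1]), int(f[2]), int(f[3])),
            ("target", f[5], int(f[6]), int(f[7]), int(f[8])),
        )
        for role, name, length, start, end in sides:
            if name not in lengths:
                problems.append((n, f"{role} {name} is missing from the index"))
            elif lengths[name] != length:
                problems.append(
                    (n, f"{role} {name} length {length} != index length {lengths[name]}")
                )
            # PAF intervals are 0-based, half-open: [start, end)
            if not (0 <= start < end <= length):
                problems.append(
                    (n, f"{role} {name} interval {start}-{end} is outside 0-{length}")
                )
    return problems
```

To run it on real files, pass open file handles, for example the `.fai` produced by `samtools faidx` and the `.paf` written by wfmash. An empty result means the PAF coordinates are at least consistent with the index.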

AndreaGuarracino commented 2 years ago

The problem didn't pop up again.