ekg / seqwish

alignment to variation graph inducer
MIT License

Parallelize the mapping of collection of bases from the full sequence set into a dense range #88

Closed: AndreaGuarracino closed this 2 years ago

AndreaGuarracino commented 2 years ago

This parallelizes the two steps that have become the bottlenecks of the whole process, that is, the mapping of the collection of bases from the full sequence set into a dense range (which the union-find algorithm needs to work on). Parallelization makes the two steps ~2.5X faster when tested with 16 threads on small/medium graphs, which improves the runtime of the entire graph induction by ~40% with 16 threads.
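To illustrate the general idea of mapping a sparse set of positions into a dense range in parallel, here is a minimal Python sketch. It is not the code in this PR (seqwish itself is written in C++), and the function name `dense_rank_parallel`, the boolean-list input, and the chunking scheme are all assumptions made for the example. Each chunk is counted independently, a prefix sum over the counts gives every chunk its starting rank, and the chunks then assign ranks without coordinating with each other. Python threads will not give a real speedup here because of the GIL; the sketch only shows the structure of the computation.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


def dense_rank_parallel(used, n_threads=4):
    """Map each marked position in `used` to a dense 0-based rank.

    `used` is a list of booleans over the full (sparse) position space.
    Returns a dict {sparse_position: dense_rank}. Hypothetical helper,
    not part of seqwish.
    """
    n = len(used)
    if n == 0:
        return {}
    chunk = (n + n_threads - 1) // n_threads
    bounds = [(s, min(s + chunk, n)) for s in range(0, n, chunk)]

    # Pass 1: count the marked positions in each chunk independently.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        counts = list(pool.map(lambda b: sum(used[b[0]:b[1]]), bounds))

    # An exclusive prefix sum gives each chunk its first dense rank.
    offsets = [0] + list(accumulate(counts))[:-1]

    # Pass 2: each chunk assigns ranks starting from its own offset.
    def assign(args):
        (start, end), rank = args
        out = {}
        for pos in range(start, end):
            if used[pos]:
                out[pos] = rank
                rank += 1
        return out

    mapping = {}
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for part in pool.map(assign, zip(bounds, offsets)):
            mapping.update(part)
    return mapping
```

The resulting ranks are contiguous integers starting at 0, which is the kind of compact index space a union-find structure can be allocated over.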

chr16 mini dataset (12 full haplotypes from 6 humans plus 2 references):

\time -v seqwish -s chr16hg00.fa.gz -p chr16hg00.paf -g graph.gfa -t 16 -B 1M -P

master: Elapsed (wall clock) time (h:mm:ss or m:ss): 13:08.08

branch: Elapsed (wall clock) time (h:mm:ss or m:ss): 7:44.57 (~40% faster)

AndreaGuarracino commented 2 years ago

HPRC chr1 dataset (88 full haplotypes from 44 individuals plus 2 references):

\time -v seqwish -s chr1.pan.fa -p chr1.pan.paf.gz -g graph.gfa -t 48 -B 10M

master: Elapsed (wall clock) time (h:mm:ss or m:ss): 3:43:23

branch: Elapsed (wall clock) time (h:mm:ss or m:ss): 2:44:33 (~26% faster)
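For anyone who wants to check the percentages, the following small Python sketch converts the elapsed times reported by `\time -v` for the chr16 and chr1 runs into seconds and computes the reduction relative to master. The helper names `parse_elapsed` and `percent_faster` are made up for this example.

```python
def parse_elapsed(text):
    """Convert 'h:mm:ss' or 'm:ss(.ff)' into seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def percent_faster(master, branch):
    """Reduction in wall-clock time relative to master, in percent."""
    m, b = parse_elapsed(master), parse_elapsed(branch)
    return 100.0 * (m - b) / m


print(round(percent_faster("13:08.08", "7:44.57")))  # chr16: prints 41
print(round(percent_faster("3:43:23", "2:44:33")))   # chr1: prints 26
```

Here "faster" means the fraction of master's wall-clock time that was saved, which matches the ~40% and ~26% figures quoted above.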

ekg commented 2 years ago

🔥🔥🔥

AndreaGuarracino commented 2 years ago

Testing the branch on all human chromosomes gives identical graphs, and it is ~30-35% faster on average.