Closed AndreaGuarracino closed 2 years ago
HPRC chr1 dataset (88 full haplotypes from 44 individuals plus 2 references)
\time -v seqwish -s chr1.pan.fa -p chr1.pan.paf.gz -g graph.gfa -t 48 -B 10M
master Elapsed (wall clock) time (h:mm:ss or m:ss): 3:43:23
branch Elapsed (wall clock) time (h:mm:ss or m:ss): 2:44:33
--- ~26% faster
🔥🔥🔥
(Email reply to Andrea Guarracino's comment of Tue, Nov 16, 2021, 08:01, which reported the HPRC chr1 benchmark.)
Tested on all human chromosomes, the branch produces identical graphs but is ~30-35% faster on average.
This parallelizes the two steps that have become the bottlenecks of the whole process, that is, the mapping of the collection of bases from the full sequence set into a dense range (needed for the union-find algorithm to work on). Parallelization makes the two steps ~2.5X faster when tested with 16 threads on small/medium graphs, which results in an overall graph induction runtime improvement of ~40% with 16 threads.
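To illustrate why a dense range is needed, here is a minimal Python sketch of the general idea. It is not the seqwish implementation (which is C++), and every name in it (`build_dense_index`, `map_to_dense`, `UnionFind`, the toy `matches`) is hypothetical. It ranks a sparse set of base positions into `0..n-1` so that union-find can use a flat array, and splits the lookups into independent chunks, which is the property that makes this step parallelizable.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor


def build_dense_index(positions):
    """Sorted unique positions; the rank of a position is its dense id."""
    return sorted(set(positions))


def dense_id(index, pos):
    """Binary-search the rank of one position in the dense index."""
    i = bisect_left(index, pos)
    if i == len(index) or index[i] != pos:
        raise KeyError(pos)
    return i


def map_to_dense(index, positions, threads=4):
    """Map positions to dense ids; each chunk is independent of the others."""
    chunk = max(1, len(positions) // threads)
    chunks = [positions[i:i + chunk] for i in range(0, len(positions), chunk)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map preserves chunk order, so the output order matches the input
        mapped = pool.map(lambda c: [dense_id(index, p) for p in c], chunks)
        return [d for part in mapped for d in part]


class UnionFind:
    """Array-backed union-find; it requires ids in the dense range 0..n-1."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[max(ra, rb)] = min(ra, rb)


# Toy input (made up): pairs of base positions that an alignment matches.
matches = [(1000, 52000), (52000, 990000), (1001, 52001)]
positions = [p for pair in matches for p in pair]

index = build_dense_index(positions)
dense = map_to_dense(index, positions)

uf = UnionFind(len(index))
for a, b in zip(dense[0::2], dense[1::2]):
    uf.union(a, b)
```

Python threads will not speed up this CPU-bound loop because of the GIL; the thread pool is only there to show that the chunks have no dependencies between them.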
chr16 mini dataset (12 full haplotypes from 6 humans plus 2 references)
\time -v seqwish -s chr16hg00.fa.gz -p chr16hg00.paf -g graph.gfa -t 16 -B 1M -P
master Elapsed (wall clock) time (h:mm:ss or m:ss): 13:08.08
branch Elapsed (wall clock) time (h:mm:ss or m:ss): 7:44.57
--- ~40% faster
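For anyone who wants to re-check the percentages, here is a small Python sketch that parses the wall-clock strings printed by `\time -v` and computes the relative runtime reduction. The helper names `to_seconds` and `runtime_reduction` are made up for this example; "faster" is read here as the reduction in elapsed time relative to master.

```python
def to_seconds(stamp):
    """Parse a wall-clock string in 'h:mm:ss' or 'm:ss(.ff)' form into seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def runtime_reduction(before, after):
    """Percentage of wall-clock time saved going from `before` to `after`."""
    b, a = to_seconds(before), to_seconds(after)
    return 100 * (b - a) / b


# Timings copied from the benchmarks in this thread (master vs branch).
chr1 = runtime_reduction("3:43:23", "2:44:33")
chr16 = runtime_reduction("13:08.08", "7:44.57")
print(f"chr1: {chr1:.1f}% less wall-clock time")
print(f"chr16 mini: {chr16:.1f}% less wall-clock time")
```

Both results agree with the reported ~26% and ~40%. Expressed as a speedup factor (master time divided by branch time), the same runs come out to roughly 1.36x and 1.70x.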