This PR makes several changes to reduce the memory required by guide_search.GuideSearcher's memoization of designed guides (i.e., in _memoized_guides), as well as the runtime associated with it:
Changes the data structure _memoized_guides to map key -> start position -> result, rather than start position -> key -> result (4165b8a).
Avoids having to hash key in _construct_guide_memoized() when the most recent call already hashed it (c90c96c).
Compresses collections of indices, stored in _memoized_guides, that are mostly contiguous (c65a445).
Only memoizes results for guides that are constructed across a sufficiently high fraction of sequences (cff11b4).
Stops memoizing "guides" during primer design, where memoization was completely unnecessary (18d1087).
Periodically resizes the data structure _memoized_guides (9b5e3ad).
Adds an option to better take advantage of memoization on diverse inputs, improving runtime (2d4e340 and 121f0a3).
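The reordered structure (4165b8a) and the hash-avoidance shortcut (c90c96c) can plausibly work together, as the minimal sketch below shows. It assumes, without confirmation from this PR, that consecutive calls often share the same key while the start position changes. Under that assumption, a lookup can reuse the inner dict found by the previous call and skip hashing the key. `GuideMemo`, `get_or_compute`, and `outer_lookups` are hypothetical names, not part of `guide_search`.

```python
from typing import Any, Callable, Dict, Hashable

_NO_KEY = object()  # sentinel: no key has been looked up yet


class GuideMemo:
    """Hypothetical memo laid out as key -> start position -> result."""

    def __init__(self) -> None:
        self._memo: Dict[Hashable, Dict[int, Any]] = {}
        self._last_key: Any = _NO_KEY
        self._last_inner: Dict[int, Any] = {}
        self.outer_lookups = 0  # times the key was hashed into _memo

    def get_or_compute(self, key: Hashable, start: int,
                       compute: Callable[[], Any]) -> Any:
        if key is self._last_key:
            # Same key object as the previous call: reuse its inner dict
            # and skip hashing the (possibly large) key again.
            inner = self._last_inner
        else:
            self.outer_lookups += 1
            inner = self._memo.setdefault(key, {})
            self._last_key, self._last_inner = key, inner
        if start not in inner:
            inner[start] = compute()
        return inner[start]
```

The identity check (`is`) is deliberately conservative. An equal but distinct key object simply falls back to a normal hashed lookup, so results stay correct either way.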
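A common way to compress mostly contiguous indices (c65a445) is to store sorted, inclusive runs instead of individual integers. The sketch below illustrates that idea. `compress_indices`, `decompress_indices`, and `runs_contain` are hypothetical helpers, and the representation actually used in this PR may differ.

```python
import bisect
from typing import Iterable, List, Set, Tuple

Run = Tuple[int, int]  # inclusive (start, end)


def compress_indices(indices: Iterable[int]) -> List[Run]:
    """Encode integer indices as sorted, inclusive (start, end) runs."""
    runs: List[Run] = []
    for i in sorted(set(indices)):
        if runs and i == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], i)  # extend the current run
        else:
            runs.append((i, i))  # gap found: start a new run
    return runs


def decompress_indices(runs: List[Run]) -> Set[int]:
    """Expand runs back into the original set of indices."""
    return {i for start, end in runs for i in range(start, end + 1)}


def runs_contain(runs: List[Run], i: int) -> bool:
    """Check membership by binary search over the sorted runs."""
    # Index of the last run whose start is <= i.
    pos = bisect.bisect_right(runs, (i, float("inf"))) - 1
    return pos >= 0 and runs[pos][0] <= i <= runs[pos][1]
```

Under this scheme, 10,000 consecutive sequence indices collapse to one tuple: `compress_indices(range(10000))` returns `[(0, 9999)]`. Membership checks remain logarithmic in the number of runs.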
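The threshold-based memoization (cff11b4) and periodic resizing (9b5e3ad) can be sketched together as below. `ThresholdMemo`, `min_frac_to_memoize`, and `rebuild_every` are hypothetical names, not options of `guide_search.GuideSearcher`. The rebuild step assumes the motivation is that CPython dicts do not release table space when entries are deleted. The PR's own reason may differ.

```python
from typing import Any, Dict, Hashable, Optional


class ThresholdMemo:
    """Hypothetical memo that keeps only well-supported results and
    periodically rebuilds its underlying dict."""

    def __init__(self, num_seqs: int, min_frac_to_memoize: float = 0.1,
                 rebuild_every: int = 1000) -> None:
        self.num_seqs = num_seqs
        self.min_frac = min_frac_to_memoize
        self.rebuild_every = rebuild_every
        self._memo: Dict[Hashable, Any] = {}
        self._removals = 0
        self.rebuilds = 0

    def maybe_store(self, key: Hashable, result: Any,
                    num_seqs_covered: int) -> bool:
        """Memoize result only if it covers enough of the input sequences."""
        if num_seqs_covered / self.num_seqs < self.min_frac:
            return False  # too few sequences: recompute if needed instead
        self._memo[key] = result
        return True

    def get(self, key: Hashable) -> Optional[Any]:
        return self._memo.get(key)

    def remove(self, key: Hashable) -> None:
        self._memo.pop(key, None)
        self._removals += 1
        if self._removals % self.rebuild_every == 0:
            # CPython dicts keep their table size after deletions; copying
            # into a new dict lets it be sized for the remaining entries.
            self._memo = dict(self._memo)
            self.rebuilds += 1
```

Each rebuild copies every remaining entry. This is one way such a change could trade a little runtime for lower memory.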
Two of these changes will decrease runtime (c90c96c and 2d4e340), but the rest are likely to slightly increase it.
The changes decrease memory as expected and, on large, diverse inputs, also decrease runtime overall. The improvements are considerable. Below are benchmark results for design on 48,108 sequences of IAV segment 6 (42,399 after curation/clustering):
After this PR (at commit 526676c):
Memory (peak): 14.10 GB
Memory (median over time): 0.56 GB
Runtime (design only): 65,220 sec (18 hr)
Runtime (total/end-to-end): 116,666 sec (32 hr)
Before this PR (at commit 0c69527):
Memory (peak): >354 GB
Memory (median over time): >120 GB
I stopped this benchmark after 133,143 sec (37 hr), at which point memory was still rising and design was only in its early stages.